Authors: Yida Zhao, Chao Lou, Kewei Tu
Abstract: Syntactic Transformer language models aim to achieve better generalization
through simultaneously modeling syntax trees and sentences. While prior work
has focused on adding constituency-based structures to Transformers, we
introduce Dependency Transformer Grammars (DTGs), a new class of Transformer
language models with an explicit dependency-based inductive bias. DTGs simulate
dependency transition systems with constrained attention patterns by modifying
attention masks, incorporate stack information through relative positional
encoding, and augment dependency arc representations with a combination of token
embeddings and operation embeddings. When trained on a dataset of sentences
annotated with dependency trees, DTGs achieve better generalization while
maintaining perplexity comparable to Transformer language model baselines.
DTGs also outperform recent constituency-based models, showing that dependency
can better guide Transformer language models. Our code is released at
https://github.com/zhaoyd1/Dep_Transformer_Grammars.
Source: http://arxiv.org/abs/2407.17406v1
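For intuition, the sketch below shows one way a stack-constrained attention mask could be derived from an arc-standard dependency transition sequence (SHIFT / LEFT-ARC / RIGHT-ARC). It is a hypothetical illustration of the general idea only, not the authors' implementation; see the linked repository for the actual model.

# A minimal sketch, assuming an arc-standard transition system. All names
# here are illustrative and NOT taken from the DTG release.
from typing import List

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"

def stack_states(actions: List[str]) -> List[List[int]]:
    """Replay a transition sequence and record, for each action position,
    which earlier positions are still on the stack at that point."""
    stack: List[int] = []
    states: List[List[int]] = []
    for t, action in enumerate(actions):
        states.append(list(stack))       # positions visible to step t
        if action == SHIFT:
            stack.append(t)              # pushed token becomes visible
        elif action == LEFT_ARC:
            stack.pop(-2)                # second-to-top is attached and popped
        elif action == RIGHT_ARC:
            stack.pop(-1)                # top is attached and popped
    return states

def attention_mask(actions: List[str]) -> List[List[bool]]:
    """mask[t][s] is True iff step t may attend to step s (stack items plus itself)."""
    states = stack_states(actions)
    n = len(actions)
    return [[s in states[t] or s == t for s in range(n)] for t in range(n)]

if __name__ == "__main__":
    # "the dog barks": shift the words and attach each dependent leftward,
    # chosen purely for illustration.
    actions = [SHIFT, SHIFT, LEFT_ARC, SHIFT, LEFT_ARC]
    for row in attention_mask(actions):
        print("".join("x" if visible else "." for visible in row))

In this toy version, each step can only attend to action positions whose tokens remain on the stack, which is the flavor of constraint the abstract describes; the released model additionally encodes stack positions via relative positional encoding and composes arc representations from token and operation embeddings.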