Authors: Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova
Abstract: We propose a new programming language called ALTA and a compiler that maps
ALTA programs to Transformer weights. ALTA is inspired by RASP, a language
proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler
from RASP programs to Transformer weights. ALTA complements and extends this
prior work, offering the ability to express loops and to compile programs to
Universal Transformers, among other advantages. ALTA allows us to
constructively show how Transformers can represent length-invariant algorithms
for computing parity and addition, as well as a solution to the SCAN benchmark
of compositional generalization tasks, without requiring intermediate
scratchpad decoding steps. We also propose tools to analyze cases where the
expressibility of an algorithm is established, but end-to-end training on a
given training set fails to induce behavior consistent with the desired
algorithm. To this end, we explore training from ALTA execution traces as a
more fine-grained supervision signal. This enables additional experiments and
theoretical analyses relating the learnability of various algorithms to data
availability and modeling decisions, such as positional encodings. We make the
ALTA framework — language specification, symbolic interpreter, and weight
compiler — available to the community to enable further applications and
insights.
Source: http://arxiv.org/abs/2410.18077v1
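
To make the first claim concrete, the sketch below shows a length-invariant parity algorithm expressed as a simple loop. It is written in plain Python rather than ALTA syntax, and every name in it is an illustrative assumption, not the paper's actual language or API.

```python
# A minimal sketch, in plain Python (not ALTA syntax), of the kind of
# length-invariant, loop-based parity algorithm the abstract refers to.

def parity(bits: list[int]) -> int:
    """Parity of a bit sequence via a single loop with constant-size state.

    Because the per-step update does not depend on sequence length, the
    same procedure applies to inputs of any length; a Universal Transformer
    can realize such a loop by reusing one set of layer weights per step.
    """
    acc = 0
    for b in bits:   # one iteration per input position
        acc ^= b     # constant-size state update
    return acc

assert parity([1, 0, 1, 1]) == 1
assert parity([]) == 0
```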
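
The abstract also mentions training from ALTA execution traces as a finer-grained supervision signal. Below is a minimal sketch of what such supervision could look like, assuming a model that exposes per-step intermediate states and an interpreter that records target values for them; the MSE objective and all names here are hypothetical, not the paper's method.

```python
# Hypothetical sketch of trace-level supervision: in addition to the final
# output loss, each intermediate state is penalized against the value a
# symbolic interpreter recorded at the corresponding step.
import torch
import torch.nn.functional as F

def trace_loss(step_states: list[torch.Tensor],
               trace_targets: list[torch.Tensor]) -> torch.Tensor:
    """Average per-step loss between model states and interpreter targets."""
    assert len(step_states) == len(trace_targets)
    losses = [F.mse_loss(s, t) for s, t in zip(step_states, trace_targets)]
    return torch.stack(losses).mean()

# Usage sketch: combine with the ordinary output loss during training, e.g.
#   total = output_loss + trace_loss(states, targets)
```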