Authors: Pierre Colonna D’Istria, Abdulrahman Altahhan
Abstract: In this paper, we introduce TreeCoders, a novel family of transformer trees.
We move away from traditional linear transformers toward complete k-ary trees.
Transformer blocks serve as nodes, and generic classifiers learn to select the
best child and route the sequence of tokens to a specific leaf. Because the
selectors sit outside the transformer blocks, a variety of architectures can
be used without further modification. Furthermore, our proposed
architecture supports sparse node activation due to the logarithmic complexity
of a tree search. We validate our idea by testing a series of decoder-only tree
transformers, achieving competitive results across a diverse range of language
datasets. Our study demonstrates that the proposed tree transformer model
outperforms a size-equivalent linear transformer model 76% of the time over a
wide range of tree architectures. Moreover, the proposed model naturally lends
itself to a distributed implementation.
Source: http://arxiv.org/abs/2411.07218v1
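
The routing idea from the abstract can be sketched in a few lines of PyTorch. The sketch below is illustrative only, not the authors' implementation: the names (TreeNode, child_nodes), the mean-pooled linear selector, and the hard argmax routing are assumptions; a generic encoder layer stands in for the decoder-only block, and the classification loss that trains the selectors is omitted. It shows the key property claimed in the abstract: each sequence activates only the blocks on one root-to-leaf path, i.e. depth + 1 of the tree's nodes.

```python
# A minimal sketch (not the authors' code) of TreeCoder-style routing:
# each node holds a transformer block; internal nodes also hold a generic
# classifier that selects one of k children for each input sequence.
import torch
import torch.nn as nn


class TreeNode(nn.Module):
    def __init__(self, d_model: int, n_heads: int, k: int, depth: int):
        super().__init__()
        # Transformer block serving as this node (the paper uses
        # decoder-only blocks; an encoder layer keeps the sketch short).
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True, norm_first=True
        )
        self.is_leaf = depth == 0
        if not self.is_leaf:
            # Generic classifier over the k children, fed a pooled summary.
            self.selector = nn.Linear(d_model, k)
            self.child_nodes = nn.ModuleList(
                TreeNode(d_model, n_heads, k, depth - 1) for _ in range(k)
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.block(x)                      # (batch, seq, d_model)
        if self.is_leaf:
            return x
        scores = self.selector(x.mean(dim=1))  # (batch, k)
        choice = scores.argmax(dim=-1)         # hard routing per sequence
        # Sparse activation: each sequence visits exactly one child, so
        # only the blocks on its root-to-leaf path are evaluated.
        out = torch.empty_like(x)
        for c in choice.unique():
            mask = choice == c
            out[mask] = self.child_nodes[int(c)](x[mask])
        return out


# Usage: a complete binary tree (k = 2) of depth 2 over toy embeddings.
tree = TreeNode(d_model=64, n_heads=4, k=2, depth=2)
tokens = torch.randn(8, 16, 64)  # (batch, seq_len, d_model)
print(tree(tokens).shape)        # torch.Size([8, 16, 64])
```

Because routing is per sequence, independent subtrees never share work for a given input, which is what makes the architecture amenable to the distributed implementation mentioned in the abstract.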