Authors: Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Rina Panigrahy
Abstract: Large language models (LLMs) have shown impressive performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal
mechanisms that underpin a network’s ability to perform complex logical
reasoning. We first construct a synthetic propositional logic problem that
serves as a concrete test-bed for network training and evaluation. Crucially,
this problem demands nontrivial planning to solve, but we can train a small
transformer to achieve perfect accuracy. Building on our setup, we then pursue
an understanding of precisely how a three-layer transformer, trained from
scratch, solves this problem. We identify “planning” and “reasoning” circuits in the network that require cooperation between the
attention blocks to implement the desired logic. To expand our findings, we
then study a larger model, Mistral 7B. Using activation patching, we
characterize internal components that are critical in solving our logic
problem. Overall, our work systematically uncovers novel aspects of small and
large transformers, and continues the study of how they plan and reason.
Source: http://arxiv.org/abs/2411.04105v1
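The abstract mentions activation patching as the tool used to localize components of Mistral 7B that matter for the logic problem. As a general illustration of the technique only (not the paper's actual model, data, or metric), here is a minimal PyTorch sketch: it caches an intermediate activation from a "clean" run, patches it into a "corrupted" run via a forward hook, and compares outputs. The `TinyModel`, its inputs, and the logit-difference metric are hypothetical placeholders.

```python
# Minimal activation-patching sketch (hypothetical toy model, not the paper's setup).
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyBlock(nn.Module):
    """Stand-in for one transformer block (illustrative only)."""
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)
    def forward(self, x):
        return torch.relu(self.lin(x)) + x  # residual connection

class TinyModel(nn.Module):
    def __init__(self, d=8, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList([TinyBlock(d) for _ in range(n_blocks)])
        self.head = nn.Linear(d, 2)
    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)

model = TinyModel().eval()
clean = torch.randn(1, 8)      # "clean" input representation
corrupted = torch.randn(1, 8)  # "corrupted" input representation

# 1) Cache the clean activation at the component of interest (block 1 here).
cache = {}
def save_hook(module, inp, out):
    cache["act"] = out.detach()

handle = model.blocks[1].register_forward_hook(save_hook)
with torch.no_grad():
    clean_logits = model(clean)
handle.remove()

# 2) Run the corrupted input, replacing that block's output with the clean one.
def patch_hook(module, inp, out):
    return cache["act"]  # returning a value from a forward hook overrides the output

handle = model.blocks[1].register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(corrupted)
handle.remove()

with torch.no_grad():
    corrupted_logits = model(corrupted)

# 3) Simple metric: does patching move the output back toward the clean run?
def logit_diff(logits):
    return (logits[0, 0] - logits[0, 1]).item()

print("clean    :", logit_diff(clean_logits))
print("corrupted:", logit_diff(corrupted_logits))
print("patched  :", logit_diff(patched_logits))
```

In practice this procedure is repeated over many components (attention heads, MLPs, residual-stream positions); a component is deemed causally important when patching its clean activation substantially restores the clean-run behavior.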