Authors: Satchel Grant, Noah D. Goodman, James L. McClelland
Abstract: What types of numeric representations emerge in Neural Networks (NNs)? To
what degree do NNs induce abstract, mutable, slot-like numeric variables, and
in what situations do these representations emerge? How do these
representations change over learning, and how can we understand the neural
implementations in ways that are unified across different NNs? In this work, we
approach these questions by first training sequence-based neural systems using
Next Token Prediction (NTP) objectives on numeric tasks. We then seek to
understand the neural solutions through the lens of causal abstractions or
symbolic algorithms. We use a combination of causal interventions and
visualization methods to find that artificial neural models do indeed develop
analogs of interchangeable, mutable, latent number variables purely from the
NTP objective. We then ask how variations in the tasks and model architectures
affect the models’ learned solutions, and find that these symbol-like numeric
representations do not form for every variant of the task and that transformers
solve the problem in a notably different way than their recurrent counterparts.
We then examine how the symbol-like variables change over the course of training
and find a strong correlation between the models’ task performance and the
alignment of their symbol-like representations. Lastly, we show that in all
cases, some degree of gradience exists in these neural symbols, highlighting
the difficulty of finding simple, interpretable symbolic stories of how neural
networks perform numeric tasks. Taken together, our results are consistent with
the view that neural networks can approximate interpretable symbolic programs
of number cognition, but the particular program they approximate and the extent
to which they approximate it can vary widely, depending on the network
architecture, training data, extent of training, and network size.
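
To illustrate the kind of causal intervention mentioned above, here is a minimal Python sketch of an interchange intervention on a hand-written toy recurrent model. The counting task, the token names (D, T, R, E), the two-dimensional hidden state, and every function name are assumptions made for this example and are not taken from the paper. Experiments on a trained network would typically intervene on a learned subspace of its activations rather than on a known coordinate.

```python
# Hypothetical toy model: the hidden state is [count, phase].
# Assumed task: after n "D" tokens and a trigger "T", emit n "R" tokens, then "E".

def step(hidden, token):
    """One recurrent update of the hand-written toy model."""
    count, phase = hidden
    if token == "D":   # demonstration token: increment the count
        return [count + 1.0, 0.0]
    if token == "T":   # trigger token: switch to the response phase
        return [count, 1.0]
    if token == "R":   # response token: decrement the count
        return [count - 1.0, 1.0]
    raise ValueError(f"unknown token: {token}")

def predict(hidden):
    """Read the next token off the hidden state."""
    count, phase = hidden
    if phase == 0.0:
        return "D"     # placeholder: the count does not determine this token
    return "R" if count > 0.5 else "E"

def encode(prompt):
    """Run the model over a prompt and return the final hidden state."""
    hidden = [0.0, 0.0]
    for token in prompt:
        hidden = step(hidden, token)
    return hidden

def generate(hidden, max_len=20):
    """Greedily emit tokens until "E" or max_len is reached."""
    output = []
    for _ in range(max_len):
        token = predict(hidden)
        output.append(token)
        if token == "E":
            break
        hidden = step(hidden, token)
    return output

def interchange(target_hidden, source_hidden, dims):
    """Copy the chosen dimensions from the source state into the target state."""
    patched = list(target_hidden)
    for d in dims:
        patched[d] = source_hidden[d]
    return patched

target = encode(["D", "D", "T"])       # count = 2, response phase
source = encode(["D", "D", "D", "D"])  # count = 4, demonstration phase

# Hypothesis: dimension 0 plays the role of the symbolic Count variable.
patched = interchange(target, source, dims=[0])

# The symbolic program predicts Count = 4 in the response phase.
symbolic_prediction = ["R"] * 4 + ["E"]
print(generate(target))                          # behavior without intervention
print(generate(patched) == symbolic_prediction)  # True if the alignment holds
```

If the patched model's behavior matches the symbolic program's counterfactual prediction across many target and source pairs, that supports the claim that the swapped dimensions act like an interchangeable number variable. A mismatch would suggest a different alignment or a different underlying program.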
Source: http://arxiv.org/abs/2501.06141v1