Authors: Satchel Grant
Abstract: When can we say that two neural systems are the same? The answer to this
question is goal-dependent, and it is often addressed through correlative
methods such as Representational Similarity Analysis (RSA) and Centered Kernel
Alignment (CKA). What do we miss when we forgo causal explorations, and how can
we target specific types of similarity? In this work, we introduce Model
Alignment Search (MAS), a method for causally exploring distributed
representational similarity. The method learns invertible linear
transformations that align a subspace of two networks' distributed
representations in which causal information can be freely interchanged. We first
show that the method can be used to transfer specific causal variables, such as
the number of items in a counting task, between networks with different
training seeds. We then explore open questions in number cognition by comparing
different types of numeric representations in models trained on structurally
different numeric tasks. Next, we compare MAS with preexisting
causal similarity methods, showing it to be more resistant to unwanted
exchanges. Lastly, we introduce a counterfactual latent auxiliary loss function
that helps shape causally relevant alignments even in cases where we do not
have causal access to one of the two models for training.
Source: http://arxiv.org/abs/2501.06164v1
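The core interchange mechanic described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the matrices `Q1`/`Q2`, the hidden size `d`, and the subspace size `k` are all assumed names and values, and in MAS the transformations would be learned rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 2  # hidden size and aligned-subspace size (illustrative values)

# In MAS these invertible linear maps would be learned so that the first k
# shared-space dimensions carry an aligned causal variable; here they are
# fixed near-identity matrices purely to demonstrate the mechanics.
Q1 = np.eye(d) + 0.01 * rng.standard_normal((d, d))
Q2 = np.eye(d) + 0.01 * rng.standard_normal((d, d))

def interchange(h1, h2):
    """Swap the aligned causal subspace of h2 into h1.

    Both hidden states are mapped into the shared space, the first k
    dimensions are exchanged, and the result is mapped back into
    network 1's native representation space.
    """
    z1, z2 = h1 @ Q1, h2 @ Q2                        # to shared space
    z_swapped = np.concatenate([z2[:, :k], z1[:, k:]], axis=1)
    return z_swapped @ np.linalg.inv(Q1)             # back to network 1
```

Because the maps are invertible, the exchanged subspace of the output recovers exactly the shared-space coordinates taken from the second network, which is what lets causal information transfer between models.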