Authors: Tristan Cinquin, Marvin Pförtner, Vincent Fortuin, Philipp Hennig, Robert Bamler
Abstract: Laplace approximations are popular techniques for endowing deep networks with
epistemic uncertainty estimates as they can be applied without altering the
predictions of the neural network, and they scale to large models and datasets.
While the choice of prior strongly affects the resulting posterior
distribution, the need for computational tractability and the lack of interpretability of
weight space typically limit the Laplace approximation to isotropic Gaussian priors,
which are known to cause pathological behavior as depth increases. As a remedy,
we directly place a prior on function space. More precisely, since Lebesgue
densities do not exist on infinite-dimensional function spaces, we have to
recast training as finding the so-called weak mode of the posterior measure
under a Gaussian process (GP) prior restricted to the space of functions
representable by the neural network. Through the GP prior, one can express
structured and interpretable inductive biases, such as regularity or
periodicity, directly in function space, while still exploiting the implicit
inductive biases that allow deep networks to generalize. After model
linearization, the training objective induces a negative log-posterior density
to which we apply a Laplace approximation, leveraging highly scalable methods
from matrix-free linear algebra. Our method provides improved results where
prior knowledge is abundant, e.g., in many scientific inference tasks. At the
same time, it stays competitive for black-box regression and classification
tasks where neural networks typically excel.
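
To make the last steps of the abstract concrete, here is a minimal sketch of a Laplace approximation under a GP prior for a model that is already linear in its parameters. It is a toy illustration under stated assumptions, not the authors' implementation: the fixed sine/cosine feature map stands in for the Jacobian of a linearized network, the GP prior is an RBF kernel evaluated on a finite set of context points, the likelihood is Gaussian, and dense solves replace the matrix-free linear algebra the abstract mentions. All function and parameter names (`features`, `rbf_kernel`, `laplace_with_gp_prior`, `x_context`, `noise_std`, `jitter`) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def features(x, n_features=8):
    """Hypothetical fixed feature map; stands in for the Jacobian of a
    linearized network evaluated at the inputs x."""
    freqs = np.arange(1, n_features // 2 + 1)
    phase = np.outer(x, freqs)
    return np.concatenate([np.sin(phase), np.cos(phase)], axis=1)


def laplace_with_gp_prior(x_train, y_train, x_context, noise_std=0.1, jitter=1e-6):
    """Mode and Laplace covariance of the weights for the objective
        ||y - J_d w||^2 / (2 noise_std^2) + 0.5 * (J_c w)^T K^{-1} (J_c w),
    where K is the GP prior covariance at the context points (assumed form)."""
    J_d = features(x_train)    # Jacobian at the training inputs
    J_c = features(x_context)  # Jacobian at the context points
    K = rbf_kernel(x_context, x_context) + jitter * np.eye(len(x_context))

    # Hessian of the objective = likelihood curvature + function-space prior curvature.
    prior_precision = J_c.T @ np.linalg.solve(K, J_c)
    hessian = J_d.T @ J_d / noise_std**2 + prior_precision
    hessian = 0.5 * (hessian + hessian.T)  # remove round-off asymmetry

    # The objective is quadratic here, so its mode solves a linear system.
    w_map = np.linalg.solve(hessian, J_d.T @ y_train / noise_std**2)

    # Laplace approximation: Gaussian centered at the mode with covariance H^{-1}.
    cov = np.linalg.inv(hessian)
    cov = 0.5 * (cov + cov.T)
    return w_map, cov, hessian


def predict(x_test, w_map, cov):
    """Predictive mean and epistemic variance of the function values."""
    J = features(x_test)
    mean = J @ w_map
    var = np.einsum("ij,jk,ik->i", J, cov, J)
    return mean, var
```

Because this toy objective is exactly quadratic, the Laplace approximation coincides with the true posterior over the weights; for a real network the linearization and the Gaussian fit are both approximations. Calling `laplace_with_gp_prior(x_train, y_train, x_context)` followed by `predict(x_test, w_map, cov)` yields a predictive mean and variance whose behavior away from the data is shaped by the kernel rather than by an isotropic weight-space prior.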
Source: http://arxiv.org/abs/2407.13711v1