Authors: Hongjun Wang, Sagar Vaze, Kai Han
Abstract: Detecting test-time distribution shift has emerged as a key capability for
safely deploying machine learning models, with the question being tackled under
various guises in recent years. In this paper, we aim to provide a consolidated
view of the two largest sub-fields within the community: out-of-distribution
(OOD) detection and open-set recognition (OSR). In particular, we aim to
provide rigorous empirical analysis of different methods across settings and
provide actionable takeaways for practitioners and researchers. Concretely, we
make the following contributions: (i) We perform rigorous cross-evaluation
between state-of-the-art methods in the OOD detection and OSR settings and
find that their performances on the two tasks are strongly correlated;
(ii) We propose a new, large-scale benchmark setting which we suggest better
disentangles the problems tackled by OOD detection and OSR, re-evaluating
state-of-the-art OOD detection and OSR methods in this setting; (iii) We
find, surprisingly, that the best-performing method on standard benchmarks
(Outlier Exposure) struggles when tested at scale, while scoring rules that
are sensitive to deep feature magnitude consistently show promise; and (iv)
We conduct empirical analysis to explain these phenomena and highlight
directions for future research. Code:
\url{https://github.com/Visual-AI/Dissect-OOD-OSR}
Source: http://arxiv.org/abs/2408.16757v1
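The "scoring rules sensitive to deep feature magnitude" mentioned in the abstract include, for example, the maximum logit score (MLS). A minimal sketch, for illustration only (not the paper's exact implementation), contrasting it with the common maximum-softmax-probability (MSP) baseline: softmax is invariant to an additive shift of the logits, so it discards magnitude information that the raw logits retain.

```python
import numpy as np

def max_softmax_score(logits):
    """MSP baseline: max softmax probability (invariant to logit shifts)."""
    z = np.exp(logits - logits.max())  # stabilized softmax
    return float((z / z.sum()).max())

def max_logit_score(logits):
    """MLS: max raw logit, sensitive to overall logit magnitude."""
    return float(logits.max())

# Two hypothetical samples with identical softmax outputs but
# different logit magnitudes.
a = np.array([2.0, 1.0, 0.5])
b = a + 5.0  # shifted logits: same softmax, larger magnitude

print(max_softmax_score(a), max_softmax_score(b))  # equal
print(max_logit_score(a), max_logit_score(b))      # differ
```

Because MSP assigns both samples the same score, a magnitude-based shift signal is invisible to it, while MLS separates them directly.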