Authors: Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink
Abstract: In real-world scenarios, achieving domain adaptation and generalization poses
significant challenges, as models must adapt to or generalize across unknown
target distributions. Extending these capabilities to unseen multimodal
distributions, i.e., multimodal domain adaptation and generalization, is even
more challenging due to the distinct characteristics of different modalities.
Significant progress has been made over the years, with applications ranging
from action recognition to semantic segmentation. Moreover, the recent advent of
large-scale pre-trained multimodal foundation models, such as CLIP, has
inspired work that leverages these models to improve adaptation and generalization
performance or adapts them to downstream tasks. This survey provides the
first comprehensive review of recent advances from traditional approaches to
foundation models, covering: (1) Multimodal domain adaptation; (2) Multimodal
test-time adaptation; (3) Multimodal domain generalization; (4) Domain
adaptation and generalization with the help of multimodal foundation models;
and (5) Adaptation of multimodal foundation models. For each topic, we formally
define the problem and thoroughly review existing methods. Additionally, we
analyze relevant datasets and applications, highlighting open challenges and
potential future research directions. We maintain an active repository that
contains up-to-date literature at
https://github.com/donghao51/Awesome-Multimodal-Adaptation.
Source: http://arxiv.org/abs/2501.18592v1