Authors: Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni
Abstract: The availability of performant pre-trained models has led to a proliferation
of fine-tuned expert models that are specialized to a particular domain or
task. Model MoErging methods aim to recycle expert models to create an
aggregate system with improved performance or generalization. A key component
of MoErging methods is the creation of a router that decides which expert
model(s) to use for a particular input or application. The promise,
effectiveness, and large design space of MoErging have spurred the development
of many new methods over the past few years. This rapid pace of development has
made it challenging to compare different MoErging methods, which are rarely
evaluated against one another and are often validated in different experimental
setups. To remedy such gaps, we present a comprehensive survey of MoErging
methods that includes a novel taxonomy for cataloging key design choices and
clarifying suitable applications for each method. Apart from surveying MoErging
research, we inventory software tools and applications that make use of
MoErging. We additionally discuss related fields of study such as model
merging, multitask learning, and mixture-of-experts models. Taken as a whole,
our survey provides a unified overview of existing MoErging methods and creates
a solid foundation for future work in this burgeoning field.
Source: http://arxiv.org/abs/2408.07057v1