Authors: Yingya Li, Timothy Miller, Steven Bethard, Guergana Savova
Abstract: The success of multi-task learning can depend heavily on which tasks are
grouped together. Naively grouping all tasks or a random set of tasks can
result in negative transfer, with the multi-task models performing worse than
single-task models. Though many efforts have been made to identify task
groupings and to measure the relatedness among different tasks, it remains challenging to define a metric that identifies the best task grouping from a pool of many potential task combinations. We propose a metric
of task relatedness based on task difficulty measured by pointwise V-usable
information (PVI). PVI is a recently proposed metric to estimate how much
usable information a dataset contains given a model. We hypothesize that tasks whose PVI estimates are not statistically different are similar enough to benefit from joint learning. We conduct comprehensive experiments to
evaluate the feasibility of this metric for task grouping on 15 NLP datasets in
the general, biomedical, and clinical domains. We compare the results of the
joint learners against single learners, existing baseline methods, and recent
large language models, including Llama 2 and GPT-4. The results show that, by grouping tasks with similar PVI estimates, the joint learners yield competitive results with fewer total parameters and consistent performance across domains.
Source: http://arxiv.org/abs/2410.12774v1
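For reference, the PVI of an instance (x, y) under a model family, as defined in the prior work that proposed the metric (Ethayarajh et al., 2022) rather than in this abstract, can be written as

\[ \mathrm{PVI}(x \rightarrow y) = -\log_2 g'[\varnothing](y) + \log_2 g[x](y) \]

where g is a model finetuned on (x, y) pairs, g' is the same model finetuned with the input replaced by a null input \varnothing, and g[x](y) is the probability the model assigns to the gold label y given x.

The grouping criterion summarized in the abstract (tasks whose PVI estimates are not statistically different) could be realized along the following lines. This is a minimal illustrative sketch, not the authors' procedure: it assumes per-instance PVI values are available for each task and uses a pairwise Welch's t-test as the significance test, both of which are assumptions not stated in the abstract.

    # Illustrative sketch: greedily group tasks whose per-instance PVI
    # estimates are not statistically different under a pairwise Welch's
    # t-test at a chosen alpha. The test and threshold are assumptions.
    import numpy as np
    from scipy import stats

    def group_by_pvi(pvi_per_task: dict[str, np.ndarray], alpha: float = 0.05):
        """Greedily merge tasks whose PVI distributions are not significantly different."""
        groups: list[list[str]] = []
        for task in pvi_per_task:
            placed = False
            for group in groups:
                # A task joins a group only if its PVI values are "not
                # statistically different" from every task already in it.
                if all(
                    stats.ttest_ind(pvi_per_task[task], pvi_per_task[member],
                                    equal_var=False).pvalue >= alpha
                    for member in group
                ):
                    group.append(task)
                    placed = True
                    break
            if not placed:
                groups.append([task])
        return groups

    # Hypothetical usage with toy PVI values (not real datasets):
    rng = np.random.default_rng(0)
    pvi = {
        "task_a": rng.normal(0.80, 0.2, 500),
        "task_b": rng.normal(0.82, 0.2, 500),  # close to task_a -> likely grouped together
        "task_c": rng.normal(0.30, 0.2, 500),  # much harder task -> likely kept separate
    }
    print(group_by_pvi(pvi))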