Dataset Distillation via Committee Voting

Authors: Jiacheng Cui, Zhaoyi Li, Xiaochen Ma, Xinyue Bi, Yaxin Luo, Zhiqiang Shen

Abstract: Dataset distillation aims to synthesize a smaller, representative dataset
that preserves the essential properties of the original data, enabling
efficient model training with reduced computational resources. Prior work has
primarily focused on improving the alignment or matching process between
original and synthetic data, or on enhancing the efficiency of distilling large
datasets. In this work, we introduce Committee Voting for Dataset
Distillation (CV-DD), a novel and orthogonal approach that
leverages the collective wisdom of multiple models or experts to create
high-quality distilled datasets. We start by showing how to establish a strong
baseline that already achieves state-of-the-art accuracy by leveraging
recent advancements and thoughtful adjustments in model design and optimization
processes. By integrating distributions and predictions from a committee of
models while generating high-quality soft labels, our method captures a wider
spectrum of data features, reduces model-specific biases, and mitigates the
adverse effects of distribution shifts, leading to significant improvements in
generalization. This voting-based strategy not only promotes diversity and
robustness within the distilled dataset but also significantly reduces
overfitting, resulting in improved performance on post-evaluation tasks. Extensive
experiments across various datasets and IPCs (images per class) show that
Committee Voting yields more reliable and adaptable distilled data than
single- or multi-model distillation methods, demonstrating its potential
for efficient and accurate dataset distillation. Code is available
at: https://github.com/Jiacheng8/CV-DD.
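
The core mechanism the abstract describes, combining the predictions of a committee of models into soft labels for the synthetic data, can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under assumed choices (equal member weights and a temperature-scaled softmax); the function name and the `weights` and `temperature` parameters are hypothetical, and the authors' exact voting scheme lives in the linked repository.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def committee_soft_labels(models, images, weights=None, temperature=1.0):
    """Aggregate softened predictions from a committee of classifiers.

    Illustrative sketch only: equal weighting and temperature scaling
    are assumed defaults, not the paper's exact CV-DD voting scheme.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    soft = None
    for model, w in zip(models, weights):
        model.eval()
        # Each member votes with a temperature-softened probability vector.
        probs = F.softmax(model(images) / temperature, dim=1)
        soft = w * probs if soft is None else soft + w * probs
    return soft  # shape: (batch, num_classes); rows sum to ~1


# Example: a two-member committee voting on a batch of synthetic images.
from torchvision.models import resnet18, mobilenet_v3_small

committee = [
    resnet18(weights="IMAGENET1K_V1"),
    mobilenet_v3_small(weights="IMAGENET1K_V1"),
]
images = torch.randn(8, 3, 224, 224)  # stand-in for distilled images
labels = committee_soft_labels(committee, images, temperature=4.0)
```

Averaging probabilities rather than raw logits keeps each member's contribution bounded by its weight, so no single model's confidence can dominate the committee's soft labels.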

Source: http://arxiv.org/abs/2501.07575v1
