Authors: Yakir Oz, Gilad Yehudai, Gal Vardi, Itai Antebi, Michal Irani, Niv Haim
Abstract: Current methods for reconstructing training data from trained classifiers are
restricted to very small models, limited training set sizes, and low-resolution
images. Such restrictions hinder their applicability to real-world scenarios.
In this paper, we present a novel approach enabling data reconstruction in
realistic settings for models trained on high-resolution images. Our method
adapts the reconstruction scheme of arXiv:2206.07758 to real-world scenarios,
specifically targeting models trained via transfer learning over image
embeddings of large pre-trained models such as DINO-ViT and CLIP. Our work employs
data reconstruction in the embedding space rather than in the image space,
showcasing its applicability beyond visual data. Moreover, we introduce a novel
clustering-based method to identify good reconstructions from thousands of
candidates. This significantly improves on previous works that relied on
knowledge of the training set to identify good reconstructed images. Our
findings shed light on a potential privacy risk of data leakage from models
trained using transfer learning.
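
The abstract does not specify how the clustering-based selection works, so the following is only a minimal sketch of one way such a selection step could look, not the authors' procedure. It assumes that each candidate reconstruction is a vector in the embedding space, that candidates converging to the same point across many runs are more likely to be good reconstructions, and that a simple greedy cosine-similarity clustering is an acceptable stand-in for whatever clustering the paper uses. The names `cluster_candidates`, `cluster_mean`, and the `threshold` parameter are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def cluster_candidates(candidates, threshold=0.9):
    """Greedy clustering of candidate embeddings (illustrative assumption).

    Each candidate joins the first cluster whose representative (its first
    member) has cosine similarity >= threshold; otherwise it starts a new
    cluster. Returns clusters as lists of candidate indices, largest first.
    """
    units = [normalize(c) for c in candidates]
    representatives, clusters = [], []
    for i, u in enumerate(units):
        for k, rep in enumerate(representatives):
            if cosine(u, rep) >= threshold:
                clusters[k].append(i)
                break
        else:
            # No existing cluster is close enough: open a new one.
            representatives.append(u)
            clusters.append([i])
    return sorted(clusters, key=len, reverse=True)


def cluster_mean(candidates, members):
    """Average the candidates in one cluster into a single embedding."""
    dim = len(candidates[0])
    return [
        sum(candidates[i][d] for i in members) / len(members)
        for d in range(dim)
    ]
```

Under these assumptions, one would call `cluster_candidates` on the full pool of candidates and keep `cluster_mean` of the largest clusters as the selected reconstructions. Unlike the selection used in the previous works mentioned above, this step needs no access to the training set.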
Source: http://arxiv.org/abs/2407.15845v1