Synthetic Data Privacy Metrics

Authors: Amy Steier, Lipika Ramaswamy, Andre Manoel, Alexa Haushalter

Abstract: Recent advancements in generative AI have made it possible to create
synthetic datasets that can be as accurate as real-world data for training AI
models, powering statistical insights, and fostering collaboration with
sensitive datasets while offering strong privacy guarantees. Effectively
measuring the empirical privacy of synthetic data is an important step in the
process. However, although new privacy metrics are being published at a rapid
pace, there is currently no standardization. In this paper, we
review the pros and cons of popular metrics that include simulations of
adversarial attacks. We also review current best practices for modifying
generative models to enhance the privacy of the data they create (e.g.,
differential privacy).

Source: http://arxiv.org/abs/2501.03941v1