Authors: Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai
Abstract: This work presents an information-theoretic examination of diffusion-based
purification methods, the state-of-the-art adversarial defenses that utilize
diffusion models to remove malicious perturbations from adversarial examples. We
theoretically characterize the inherent purification errors of Markov-based
diffusion purification and, building on this analysis, introduce LoRID, a novel
Low-Rank Iterative Diffusion purification method designed to remove adversarial
perturbations with low intrinsic purification error. LoRID centers on a
multi-stage purification process that applies multiple rounds of
diffusion-denoising loops at the early time-steps of the diffusion model,
together with Tucker decomposition, an extension of matrix factorization to
tensors, which removes adversarial noise in high-noise regimes. Consequently, LoRID
increases the effective diffusion time-steps and overcomes strong adversarial
attacks, achieving superior robustness on the CIFAR-10/100, CelebA-HQ, and
ImageNet datasets under both white-box and black-box settings.
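
Illustration: the Tucker decomposition mentioned above factors a tensor into a small core tensor and one factor matrix per mode. The following is a minimal sketch computed by truncated higher-order SVD (HOSVD) in NumPy. The function names (`unfold`, `mode_dot`, `tucker_hosvd`, `tucker_reconstruct`), the tensor shape, and the ranks are illustrative assumptions, not the authors' implementation; HOSVD is one standard way to obtain a Tucker decomposition, and the abstract does not say which algorithm LoRID uses.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def tucker_hosvd(tensor, ranks):
    """Truncated HOSVD: returns (core, factors) with one factor per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project onto each mode subspace
    return core, factors


def tucker_reconstruct(core, factors):
    """Rebuild the (low-rank) tensor from the core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out


# Hypothetical toy data: a 16x16x3 "image" with multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 2, 2))
true_factors = [rng.standard_normal((n, 2)) for n in (16, 16, 3)]
clean = tucker_reconstruct(true_core, true_factors)

# Add small dense noise, then keep only the rank-(2, 2, 2) Tucker part.
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
core, factors = tucker_hosvd(noisy, ranks=(2, 2, 2))
approx = tucker_reconstruct(core, factors)
```

Here `approx` is the low-rank approximation of `noisy`; comparing `np.linalg.norm(approx - clean)` with `np.linalg.norm(noisy - clean)` shows how much of the dense perturbation the truncation discards in this toy setting.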
Source: http://arxiv.org/abs/2409.08255v1