FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Authors: Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang

Abstract: The rapid development of generative AI is a double-edged sword, which not
only facilitates content creation but also makes image manipulation easier and
more difficult to detect. Although current image forgery detection and
localization (IFDL) methods are generally effective, they tend to face two
challenges: \textbf{1)} black-box nature with unknown detection principle,
\textbf{2)} limited generalization across diverse tampering methods (e.g.,
Photoshop, DeepFake, AIGC-Editing). To address these issues, we propose the
explainable IFDL task and design FakeShield, a multi-modal framework capable of
evaluating image authenticity, generating tampered region masks, and providing
a judgment basis based on pixel-level and image-level tampering clues.
Additionally, we leverage GPT-4o to enhance existing IFDL datasets, creating
the Multi-Modal Tamper Description dataSet (MMTD-Set) for training FakeShield’s
tampering analysis capabilities. Meanwhile, we incorporate a Domain Tag-guided
Explainable Forgery Detection Module (DTE-FDM) and a Multi-modal Forgery
Localization Module (MFLM) to address various types of tamper detection
interpretation and achieve forgery localization guided by detailed textual
descriptions. Extensive experiments demonstrate that FakeShield effectively
detects and localizes various tampering techniques, offering an explainable and
superior solution compared to previous IFDL methods.

Source: http://arxiv.org/abs/2410.02761v1

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

You may also like these