Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization

Authors: Yuhan Fu, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Xirong Li

Abstract: Multimodal Large Language Models (MLLMs) are known to hallucinate, which
limits their practical applications. Recent works have attempted to apply
Direct Preference Optimization (DPO) to enhance the performance of MLLMs, but
have shown inconsistent improvements in mitigating hallucinations. To address
this issue more effectively, we introduce Hallucination-targeted Direct
Preference Optimization (HDPO) to reduce hallucinations in MLLMs. Unlike
previous approaches, our method addresses hallucinations according to their
diverse forms and underlying causes. Specifically, we develop three types of preference pair data
targeting the following causes of MLLM hallucinations: (1) insufficient visual
capabilities, (2) long context generation, and (3) multimodal conflicts.
Experimental results demonstrate that our method achieves superior performance
across multiple hallucination evaluation datasets, surpassing most
state-of-the-art (SOTA) methods and highlighting the potential of our approach.
Ablation studies and in-depth analyses further confirm the effectiveness of our
method and suggest the potential for further improvements through scaling up.
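
The abstract does not spell out the training objective, but HDPO builds on DPO, whose preference loss is standard (Rafailov et al., 2023). Below is a minimal sketch of that objective as it would apply here, assuming each hallucination-targeted pair supplies a preferred (non-hallucinated) response y_w and a rejected (hallucinated) response y_l conditioned on an image v and a prompt q; the multimodal form of the conditioning is an assumption about the setup, not a detail stated in the abstract:

\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(v,\,q,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid v, q)}{\pi_{\mathrm{ref}}(y_w \mid v, q)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid v, q)}{\pi_{\mathrm{ref}}(y_l \mid v, q)}\right)\right]

Here \sigma is the sigmoid, \beta a scaling hyperparameter, \pi_\theta the MLLM being trained, and \pi_{\mathrm{ref}} a frozen reference model. Under this objective, HDPO's contribution lies in how the preference dataset \mathcal{D} is constructed: the three pair types target insufficient visual capabilities, long context generation, and multimodal conflicts, respectively.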

Source: http://arxiv.org/abs/2411.10436v1
