Authors: Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao, Runchuan Zhu, Xinke Jiang, Xu Chu, Junfeng Zhao, Yasha Wang
Abstract: By integrating external knowledge, Retrieval-Augmented Generation (RAG) has
become an effective strategy for mitigating the hallucination problems that
large language models (LLMs) encounter when dealing with knowledge-intensive
tasks. However, when external non-parametric supporting evidence is integrated with internal parametric knowledge, knowledge conflicts inevitably arise, leading to confusion in the model's responses. To enhance
the knowledge selection of LLMs in various contexts, some research has focused
on refining their behavior patterns through instruction-tuning. Nonetheless,
due to the absence of explicit negative signals and comparative objectives,
models fine-tuned in this manner may still exhibit undesirable behaviors in
intricate, realistic retrieval scenarios. To this end, we propose a
Knowledge-aware Preference Optimization approach, dubbed KaPO, aimed at achieving
controllable knowledge selection in real retrieval scenarios. Concretely, we
explore and simulate error types across diverse context combinations and train
the model to avoid these negative signals through preference optimization.
Simultaneously, by adjusting the balance between response length and the
proportion of preference data representing different behavior patterns, we
improve both the adherence capability and the noise robustness of LLMs.
Experimental results show that KaPO outperforms previous methods for
handling knowledge conflicts by over 37%, while also exhibiting robust
generalization across various out-of-distribution datasets.
Source: http://arxiv.org/abs/2408.03297v1
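The abstract describes learning from negative signals "through preference optimization." As a minimal sketch of what such an objective typically looks like, the snippet below implements a standard DPO-style loss over (chosen, rejected) response pairs; it is an assumption for illustration, not the paper's exact KaPO objective, and all names are illustrative.

```python
# Sketch of a DPO-style preference loss over (chosen, rejected) response pairs.
# Illustrative only: this is standard Direct Preference Optimization, not the
# paper's exact KaPO objective. "Chosen" responses follow the desired
# knowledge-selection behavior; "rejected" responses exhibit a simulated error type.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Preference loss over a batch of response pairs.

    Each *_logps tensor holds the summed log-probability of a full response
    (shape: [batch]) under the policy or the frozen reference model.
    """
    # Implicit reward: log-ratio of policy vs. reference model, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In this framing, the mix of preference pairs fed into such a loss (e.g., pairs penalizing over-reliance on noisy context versus pairs penalizing ignoring correct evidence) is what the abstract refers to as adjusting the proportion of preference data representing different behavior patterns.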