Authors: Peng Xu, Wei Ping, Xianchao Wu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro
Abstract: In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge
the gap between open-access LLMs and leading proprietary models (e.g.,
GPT-4-Turbo) in long-context understanding and retrieval-augmented generation
(RAG) capabilities. These two capabilities are essential for LLMs to process
large volumes of information that cannot fit into a single prompt, and they are
complementary to each other, with the preferred approach depending on the
downstream task and computational budget. We present a detailed continued training recipe to
extend the context window of Llama3-70B-base from 8K to 128K tokens, along with
a three-stage instruction tuning process to enhance the model’s
instruction-following, RAG performance, and long-context understanding
capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model
achieves accuracy comparable to GPT-4-Turbo-2024-04-09 on many long-context
understanding tasks and surpasses it on the RAG benchmark. Interestingly, we
find that a state-of-the-art long-context retriever can alleviate the top-k
context fragmentation issue in RAG, further improving RAG-based results for
long-context understanding tasks. We also provide extensive comparisons between
RAG and long-context solutions using state-of-the-art long-context LLMs.
Source: http://arxiv.org/abs/2407.14482v1