Authors: Ping Yu, Weizhe Yuan, Olga Golovneva, Tianhao Wu, Sainbayar Sukhbaatar, Jason Weston, Jing Xu
Abstract: Training data quality is one of the most important drivers of final model
quality. In this work, we introduce a method for evaluating data integrity
based on the assumption that low-quality input prompts result in high-variance,
low-quality responses. We assess this by measuring the quality of the rejected
response and the reward gap between the chosen and rejected responses in a
preference pair. Our method, Rejecting Instruction Preferences (RIP), can be
used to filter prompts from existing training sets or to build high-quality
synthetic datasets,
yielding large performance gains across various benchmarks compared to
unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win
Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama
3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, moving from 18th
place to 6th overall on the leaderboard.
Source: http://arxiv.org/abs/2501.18578v1
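
Illustrative sketch (not from the paper): the abstract names two filtering signals, the quality of the rejected response and the reward gap between the chosen and rejected responses. The Python below shows one way such a filter might look. The PreferencePair type, the rip_filter function, the threshold parameters, and the example reward values are all hypothetical. The filtering direction (keep prompts whose rejected response scores high and whose gap is small) is inferred from the abstract's stated assumption, not confirmed by it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    """One prompt with reward-model scores for its chosen and rejected responses."""
    prompt: str
    chosen_reward: float
    rejected_reward: float


def rip_filter(
    pairs: List[PreferencePair],
    min_rejected_reward: float,  # hypothetical threshold, not from the paper
    max_reward_gap: float,       # hypothetical threshold, not from the paper
) -> List[PreferencePair]:
    """Keep prompts whose rejected response is still good and whose gap is small.

    Assumption: a low rejected reward or a large chosen-rejected gap signals
    a low-quality prompt, so such pairs are dropped.
    """
    kept = []
    for pair in pairs:
        gap = pair.chosen_reward - pair.rejected_reward
        if pair.rejected_reward >= min_rejected_reward and gap <= max_reward_gap:
            kept.append(pair)
    return kept


# Made-up reward values, for illustration only.
pairs = [
    PreferencePair("A", chosen_reward=0.9, rejected_reward=0.7),  # small gap, good rejected
    PreferencePair("B", chosen_reward=0.9, rejected_reward=0.1),  # large gap
    PreferencePair("C", chosen_reward=0.4, rejected_reward=0.3),  # weak rejected response
]
kept = rip_filter(pairs, min_rejected_reward=0.5, max_reward_gap=0.3)
print([p.prompt for p in kept])
```

In practice the thresholds would presumably be tuned on held-out data; the values above are placeholders.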