Can Unconfident LLM Annotations Be Used for Confident Conclusions?

Authors: Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky

Abstract: Large language models (LLMs) have shown high agreement with human raters
across a variety of tasks, demonstrating potential to ease the challenges of
human data collection. In computational social science (CSS), researchers are
increasingly leveraging LLM annotations to complement slow and expensive human
annotations. Still, guidelines for collecting and using LLM annotations,
without compromising the validity of downstream conclusions, remain limited. We
introduce Confidence-Driven Inference: a method that combines LLM annotations
and LLM confidence indicators to strategically select which human annotations
should be collected, with the goal of producing accurate statistical estimates
and provably valid confidence intervals while reducing the number of human
annotations needed. Our approach comes with safeguards against LLM annotations
of poor quality, guaranteeing that the conclusions will be both valid and no
less accurate than if we only relied on human annotations. We demonstrate the
effectiveness of Confidence-Driven Inference over baselines in statistical
estimation tasks across three CSS settings (text politeness, stance, and
bias), reducing the number of human annotations needed by over 25% in each.
Although we use CSS settings for demonstration, Confidence-Driven Inference can
be used to estimate most standard quantities across a broad range of NLP
problems.

Source: http://arxiv.org/abs/2408.15204v1
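The abstract does not spell out the estimator, but the core idea (spend the human-annotation budget where the LLM is least confident, then debias the LLM annotations with the collected human labels) can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: the sampling rule, the hypothetical human_label_fn callback, and the normal-approximation interval are assumptions made for demonstration, using a prediction-powered-style correction.

```python
# Minimal sketch (NOT the paper's implementation) of confidence-driven
# human-annotation collection with a debiased combined estimate.
import numpy as np
from statistics import NormalDist


def confidence_driven_mean(llm_labels, llm_confidence, human_label_fn,
                           human_budget, alpha=0.05, rng=None):
    """Estimate a corpus-level mean label (e.g., fraction of polite texts).

    llm_labels     : array of LLM annotations (e.g., 0/1 politeness labels)
    llm_confidence : array in (0, 1]; the LLM's self-reported confidence
    human_label_fn : callable index -> human annotation (hypothetical hook)
    human_budget   : approximate number of human annotations to collect
    """
    rng = np.random.default_rng() if rng is None else rng
    llm_labels = np.asarray(llm_labels, dtype=float)
    n = len(llm_labels)

    # Spend the human budget preferentially on low-confidence items, while
    # keeping every sampling probability strictly positive so the
    # inverse-probability correction below stays unbiased even if the
    # LLM annotations are poor.
    weights = (1.0 - np.asarray(llm_confidence, dtype=float)) + 1e-3
    probs = np.clip(human_budget * weights / weights.sum(), 1e-6, 1.0)
    sampled = rng.random(n) < probs

    human = np.array([human_label_fn(i) for i in np.where(sampled)[0]],
                     dtype=float)

    # Debiased point estimate: LLM annotations everywhere, plus an
    # inverse-probability-weighted correction on the human-labeled items.
    correction = np.zeros(n)
    correction[sampled] = (human - llm_labels[sampled]) / probs[sampled]
    per_item = llm_labels + correction
    estimate = per_item.mean()

    # Normal-approximation confidence interval (illustrative only).
    se = per_item.std(ddof=1) / np.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)
```

Keeping every sampling probability strictly positive is what makes the correction term remove the bias of the LLM annotations regardless of their quality; this is plausibly the kind of safeguard the abstract alludes to, with concentration of the budget on low-confidence items providing the savings in human labels.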
