Using Language Models to Disambiguate Lexical Choices in Translation

Authors: Josh Barua, Sanjay Subramanian, Kayo Yin, Alane Suhr

Abstract: In translation, a concept represented by a single word in a source language
can have multiple variations in a target language. The task of lexical
selection requires using context to identify which variation is most
appropriate for a source text. We work with native speakers of nine languages
to create DTAiLS, a dataset of 1,377 sentence pairs that exhibit cross-lingual
concept variation when translating from English. We evaluate recent LLMs and
neural machine translation systems on DTAiLS; the best-performing model,
GPT-4, achieves 67% to 85% accuracy across languages. Finally, we use
language models to generate English rules describing target-language concept
variations. Providing weaker models with high-quality lexical rules improves
accuracy substantially, in some cases reaching or outperforming GPT-4.
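
To make the setup concrete, here is a minimal Python sketch of lexical selection scored per language, with optional English rules added to the prompt. It is an illustration under assumptions, not the authors' code: the names `build_prompt`, `accuracy_by_language`, and `first_candidate`, the prompt wording, and the record fields are all hypothetical. The Spanish "wall" examples are toy data written for this sketch, not drawn from DTAiLS, and the baseline predictor stands in for a real model call.

```python
from collections import defaultdict


def build_prompt(sentence, concept, variations, rules=None):
    """Assemble a lexical-selection prompt (hypothetical format).

    `rules` optionally maps each target-language variation to an English
    description of when that variation applies.
    """
    lines = [
        f"Source sentence: {sentence}",
        f"Source word: {concept}",
        "Candidate translations: " + ", ".join(variations),
    ]
    if rules:
        lines.append("Rules:")
        for variation in variations:
            if variation in rules:
                lines.append(f"- {variation}: {rules[variation]}")
    lines.append("Answer with the single most appropriate candidate.")
    return "\n".join(lines)


def accuracy_by_language(examples, predict):
    """Return {language: fraction of examples where predict(ex) == gold}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["lang"]] += 1
        if predict(ex) == ex["gold"]:
            correct[ex["lang"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy data for illustration only (not from DTAiLS).
examples = [
    {"lang": "es", "sentence": "She hung the painting on the wall.",
     "concept": "wall", "variations": ["pared", "muro"], "gold": "pared"},
    {"lang": "es", "sentence": "The castle wall kept invaders out.",
     "concept": "wall", "variations": ["pared", "muro"], "gold": "muro"},
]

# Hypothetical rules of the kind a language model might be asked to write.
rules = {
    "pared": "an interior wall of a room or building",
    "muro": "a thick exterior or defensive wall",
}


def first_candidate(ex):
    """Placeholder baseline; a real system would query a model here."""
    return ex["variations"][0]


if __name__ == "__main__":
    ex = examples[0]
    print(build_prompt(ex["sentence"], ex["concept"], ex["variations"], rules))
    print(accuracy_by_language(examples, first_candidate))
```

Swapping `first_candidate` for a function that sends the prompt to a model, once with `rules` and once without, would give the kind of with-rules versus without-rules comparison the abstract describes.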

Source: http://arxiv.org/abs/2411.05781v1
