Authors: Botao Yu, Frazier N. Baker, Ziru Chen, Garrett Herron, Boyu Gou, Daniel Adu-Ampratwum, Xia Ning, Huan Sun
Abstract: To enhance large language models (LLMs) for chemistry problem solving,
several LLM-based agents augmented with tools have been proposed, such as
ChemCrow and Coscientist. However, their evaluations are narrow in scope,
leaving a large gap in understanding the benefits of tools across diverse
chemistry tasks. To bridge this gap, we develop ChemAgent, an enhanced
chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its
performance on both specialized chemistry tasks and general chemistry
questions. Surprisingly, ChemAgent does not consistently outperform its base
LLMs without tools. Our error analysis with a chemistry expert suggests that
for specialized chemistry tasks, such as synthesis prediction, we should
augment agents with specialized tools; however, for general chemistry questions
like those in exams, agents’ ability to reason correctly with chemistry
knowledge matters more, and tool augmentation does not always help.
Source: http://arxiv.org/abs/2411.07228v1