WebWalker: Benchmarking LLMs in Web Traversal

Authors: Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang, Deyu Zhou, Pengjun Xie, Fei Huang

Abstract: Retrieval-augmented generation (RAG) demonstrates remarkable performance
across tasks in open-domain question-answering. However, traditional search
engines may retrieve shallow content, limiting the ability of LLMs to handle
complex, multi-layered information. To address it, we introduce WebWalkerQA, a
benchmark designed to assess the ability of LLMs to perform web traversal. It
evaluates the capacity of LLMs to traverse a website’s subpages to extract
high-quality data systematically. We propose WebWalker, which is a multi-agent
framework that mimics human-like web navigation through an explore-critic
paradigm. Extensive experimental results show that WebWalkerQA is challenging
and demonstrates the effectiveness of RAG combined with WebWalker, through the
horizontal and vertical integration in real-world scenarios.

Source: http://arxiv.org/abs/2501.07572v1

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

You may also like these