Authors: Daniel Ekpo, Mara Levy, Saksham Suri, Chuong Huynh, Abhinav Shrivastava
Abstract: Recent advancements in vision-language models (VLMs) offer potential for
robot task planning, but challenges remain due to VLMs’ tendency to generate
incorrect action sequences. To address these limitations, we propose VeriGraph,
a novel framework that integrates VLMs for robotic planning while verifying
action feasibility. VeriGraph employs scene graphs as an intermediate
representation, capturing key objects and spatial relationships to improve plan
verification and refinement. The system generates a scene graph from input
images and uses it to iteratively check and correct action sequences generated
by an LLM-based task planner, ensuring constraints are respected and actions
are executable. Our approach significantly enhances task completion rates
across diverse manipulation scenarios, outperforming baseline methods by 58%
for language-based tasks and 30% for image-based tasks.
Source: http://arxiv.org/abs/2411.10446v1
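The abstract describes a pipeline in which a scene graph, extracted from input images, is used to iteratively check and repair the action sequences an LLM-based planner proposes. Below is a minimal Python sketch of that verify-and-refine loop under stated assumptions: the names `SceneGraph`, `is_feasible`, `apply`, `verify_plan`, and the `llm_plan` callable are all illustrative placeholders, not the authors' actual API, and the "nothing stacked on top" rule is just one example of a scene-graph constraint.

```python
"""Hypothetical sketch of a scene-graph-based plan verification loop.

All names and the feasibility rule are illustrative assumptions; they are
not VeriGraph's actual interfaces.
"""

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Edges are (subject, relation, object) triples, e.g. ("book", "on", "mug").
    edges: set = field(default_factory=set)

    def is_feasible(self, action):
        """Check one action against the current graph's constraints."""
        verb, obj = action
        if verb == "pick":
            # Illustrative constraint: an object cannot be picked while
            # another object rests on top of it.
            return not any(r == "on" and t == obj for (s, r, t) in self.edges)
        return True

    def apply(self, action):
        """Simulate the action's effect so later steps are checked in context."""
        verb, obj = action
        if verb == "pick":
            # The picked object no longer participates as the subject of
            # any spatial relation.
            self.edges = {(s, r, t) for (s, r, t) in self.edges if s != obj}


def verify_plan(graph, plan):
    """Step through the plan on a copy of the graph.

    Returns (True, -1) if every action is feasible, otherwise
    (False, index of the first infeasible action).
    """
    sim = SceneGraph(set(graph.edges))
    for i, action in enumerate(plan):
        if not sim.is_feasible(action):
            return False, i
        sim.apply(action)
    return True, -1


def plan_with_verification(graph, task, llm_plan, max_rounds=5):
    """Query the planner, verify its output, and feed violations back."""
    feedback = None
    for _ in range(max_rounds):
        plan = llm_plan(task, graph, feedback)  # LLM/VLM planner stub
        ok, bad = verify_plan(graph, plan)
        if ok:
            return plan
        feedback = f"step {bad} ({plan[bad]}) violates scene constraints"
    raise RuntimeError("no feasible plan found within the round budget")
```

As a toy usage example, `verify_plan(SceneGraph({("book", "on", "mug")}), [("pick", "mug")])` returns `(False, 0)`, because the book on top blocks the pick; the loop would then pass that violation back to the planner as textual feedback for the next attempt.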