Authors: Jin Wang, Arturo Laurenzi, Nikos Tsagarakis
Abstract: Enabling humanoid robots to autonomously perform loco-manipulation in
unstructured environments is crucial yet highly challenging for achieving
embodied intelligence. This requires robots to plan their actions and
behaviors over long-horizon tasks while using multi-modal perception to detect
deviations between task execution and the high-level plan. Recently, large
language models (LLMs) have demonstrated powerful planning and reasoning
capabilities for comprehending and processing semantic information in
robot control tasks, as well as the ability to make analytical judgments and
decisions from multi-modal inputs. To leverage the power of LLMs for
humanoid loco-manipulation, we propose a novel language-model-based framework
that enables robots to autonomously plan behaviors and low-level execution
from given textual instructions, while observing and correcting failures that
may occur during task execution. To systematically evaluate how well this
framework grounds LLMs, we created a library of robot ‘action’ and ‘sensing’
behaviors for task planning and conducted mobile manipulation experiments in
both simulated and real environments using the CENTAURO robot. These
experiments verified the effectiveness and applicability of this approach in
robotic tasks with autonomous behavioral planning.
Source: http://arxiv.org/abs/2408.08282v1