Authors: Anoop Cherian, Radu Corcodel, Siddarth Jain, Diego Romeres
Abstract: Physical reasoning is an important skill needed for robotic agents when
operating in the real world. However, solving such reasoning problems often
involves hypothesizing and reflecting over complex multi-body interactions
under the effect of a multitude of physical forces, and thus learning all such
interactions poses a significant hurdle for state-of-the-art machine learning
frameworks, including large language models (LLMs). To study this problem, we
propose a new physical reasoning task and a dataset, dubbed TraySim. Our task
involves predicting the dynamics of several objects on a tray that is given an
external impact. The domino effect of the ensuing object interactions and
their dynamics offers a challenging yet controlled setup, and the goal
of reasoning is to infer the stability of the objects after the impact. To
solve this complex physical reasoning task, we present LLMPhy, a zero-shot
black-box optimization framework that leverages the physics knowledge and
program synthesis abilities of LLMs, and synergizes these abilities with the
world models built into modern physics engines. Specifically, LLMPhy uses an
LLM to generate code to iteratively estimate the physical hyperparameters of
the system (friction, damping, layout, etc.) via an implicit
analysis-by-synthesis approach using a (non-differentiable) simulator in the
loop, and uses the inferred parameters to imagine the dynamics of the scene
towards solving the reasoning task. To show the effectiveness of LLMPhy, we
present experiments on our TraySim dataset to predict the steady-state poses of
the objects. Our results show that the combination of the LLM and the physics
engine leads to state-of-the-art zero-shot physical reasoning performance,
while demonstrating better convergence than standard black-box
optimization methods and better estimation of the physical parameters.
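
The simulator-in-the-loop parameter estimation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy closed-form `simulate` function in place of a physics engine, a single unknown parameter (friction), and a hypothetical `propose` function that stands in for the LLM by randomly perturbing the best guess so far. All names and numeric values are illustrative assumptions.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2


def simulate(friction, v0=2.0):
    """Toy stand-in for a physics engine (assumption): sliding distance of a
    block launched at speed v0 on a surface with Coulomb friction."""
    return v0 ** 2 / (2.0 * friction * G)


def propose(history, rng):
    """Hypothetical stand-in for the LLM: given past (friction, error) pairs,
    propose a new friction value by perturbing the best one seen so far."""
    if not history:
        return rng.uniform(0.05, 1.0)
    best_friction, _ = min(history, key=lambda pair: pair[1])
    candidate = best_friction + rng.gauss(0.0, 0.05)
    return min(max(candidate, 0.01), 1.5)  # clamp to a plausible range


def estimate_friction(observed_distance, iterations=200, seed=0):
    """Black-box loop: propose parameters, simulate, compare with the
    observation, and feed the error back to the proposer via the history."""
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        friction = propose(history, rng)
        error = abs(simulate(friction) - observed_distance)
        history.append((friction, error))
    return min(history, key=lambda pair: pair[1])


# Usage: recover a friction value from a single synthetic observation.
true_friction = 0.3
observed = simulate(true_friction)
best_friction, best_error = estimate_friction(observed)
```

The loop only ever calls the simulator and reads back a scalar error, so it needs no gradients, which is why the same structure applies to a non-differentiable simulator. In the setting the abstract describes, the proposer would be an LLM that writes code from the feedback rather than a random perturbation.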
Source: http://arxiv.org/abs/2411.08027v1