Navigation World Models

Authors: Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun

Abstract: Navigation is a fundamental skill of agents with visual-motor capabilities.
We introduce a Navigation World Model (NWM), a controllable video generation
model that predicts future visual observations based on past observations and
navigation actions. To capture complex environment dynamics, NWM employs a
Conditional Diffusion Transformer (CDiT), trained on a diverse collection of
egocentric videos of both human and robotic agents, and scaled up to 1 billion
parameters. In familiar environments, NWM can plan navigation trajectories by
simulating them and evaluating whether they achieve the desired goal. Unlike
supervised navigation policies with fixed behavior, NWM can dynamically
incorporate constraints during planning. Experiments demonstrate its
effectiveness in planning trajectories from scratch or by ranking trajectories
sampled from an external policy. Furthermore, NWM leverages its learned visual
priors to imagine trajectories in unfamiliar environments from a single input
image, making it a flexible and powerful tool for next-generation navigation
systems.

Source: http://arxiv.org/abs/2412.03572v1

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

You may also like these