Authors: Brent Yi, Vickie Ye, Maya Zheng, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa
Abstract: We present EgoAllo, a system for human motion estimation from a head-mounted
device. Using only egocentric SLAM poses and images, EgoAllo guides sampling
from a conditional diffusion model to estimate 3D body pose, height, and hand
parameters that capture the wearer’s actions in the allocentric coordinate
frame of the scene. To achieve this, our key insight is in representation: we
propose spatial and temporal invariance criteria for improving model
performance, from which we derive a head motion conditioning parameterization
that improves estimation by up to 18%. We also show how the bodies estimated by
our system can improve the hands: the resulting kinematic and temporal
constraints yield over 40% lower hand estimation errors compared to noisy
monocular estimates. Project page: https://egoallo.github.io/
Source: http://arxiv.org/abs/2410.03665v1
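To make the "invariance" idea in the abstract concrete, below is a minimal sketch of one way to build a head-motion conditioning signal that is invariant to global translation and to rotation about gravity, and that depends only on relative motion between frames. It assumes world-frame SE(3) head poses from SLAM with gravity along +z; all names are hypothetical and this is an illustration, not EgoAllo's exact parameterization.

```python
# Illustrative sketch (hypothetical names; not the paper's exact parameterization):
# condition on head motion expressed relative to a gravity-aligned, yaw-only frame
# anchored at the previous timestep, so the features do not change under global
# horizontal translation or rotation about the gravity axis.
import numpy as np

def yaw_only_rotation(R: np.ndarray) -> np.ndarray:
    """Project a world-frame rotation onto a gravity-aligned, yaw-only rotation.

    Assumes gravity is along -z and the head's +x axis roughly points forward.
    """
    fwd = R[:, 0]
    yaw = np.arctan2(fwd[1], fwd[0])
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def canonicalized_head_motion(T_world_head: np.ndarray) -> np.ndarray:
    """Compute per-step head motion features from (N, 4, 4) world-frame head poses.

    Returns (N-1, 4, 4) transforms: rotation relative to the previous frame's
    yaw-only canonical orientation, horizontal translation measured from the
    previous head position, and absolute height kept as-is (height is informative,
    while absolute horizontal position and yaw are discarded).
    """
    out = []
    for t in range(1, len(T_world_head)):
        R_prev, p_prev = T_world_head[t - 1, :3, :3], T_world_head[t - 1, :3, 3]
        R_cur, p_cur = T_world_head[t, :3, :3], T_world_head[t, :3, 3]
        R_canon = yaw_only_rotation(R_prev)      # gravity-aligned canonical frame
        delta = R_canon.T @ (p_cur - p_prev)     # motion in the canonical frame
        T = np.eye(4)
        T[:3, :3] = R_canon.T @ R_cur            # orientation relative to canonical yaw
        T[:3, 3] = np.array([delta[0], delta[1], p_cur[2]])  # keep absolute height
        out.append(T)
    return np.stack(out)
```

Because each feature is built from the previous timestep's canonical frame, shifting or rotating the whole trajectory about gravity, or sliding the time window, leaves the conditioning unchanged, which is the kind of spatial and temporal invariance the abstract refers to.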