Hand-Object Interaction Pretraining from Videos

Authors: Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik

Abstract: We present an approach to learn general robot manipulation priors from 3D
hand-object interaction trajectories. We build a framework to use in-the-wild
videos to generate sensorimotor robot trajectories. We do so by lifting both
the human hand and the manipulated object into a shared 3D space and retargeting
human motions to robot actions. Generative modeling on this data gives us a
task-agnostic base policy. This policy captures a general yet flexible
manipulation prior. We empirically demonstrate that finetuning this policy,
with both reinforcement learning (RL) and behavior cloning (BC), enables
sample-efficient adaptation to downstream tasks and simultaneously improves
robustness and generalizability compared to prior approaches. Qualitative
experiments are available at: https://hgaurav2k.github.io/hop/.

Source: http://arxiv.org/abs/2409.08273v1