HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning

Authors: Eugene Valassakis, Guillermo Garcia-Hernando

Abstract: Predicting camera-space hand meshes from single RGB images is crucial for
enabling realistic hand interactions in 3D virtual and augmented worlds.
Previous work typically divided the task into two stages: given a cropped image
of the hand, predict the mesh in root-relative coordinates, then lift this
prediction into camera space in a separate, independent stage, often losing
valuable contextual and scale information. To prevent the loss of these cues,
we propose unifying the two stages into an end-to-end solution that addresses
the 2D-3D correspondence problem. This solution enables back-propagation from
camera-space outputs to the rest of the network through a
new differentiable global positioning module. We also introduce an image
rectification step that harmonizes both the training dataset and the input
image as if they were acquired with the same camera, helping to alleviate the
inherent scale-depth ambiguity of the problem. We validate the effectiveness of
our framework in evaluations against several baselines and state-of-the-art
approaches across three public benchmarks.
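To make the global positioning idea concrete, here is a minimal sketch in PyTorch, not the paper's actual implementation: under a standard pinhole camera, the camera-space root translation that best aligns root-relative 3D joints with their predicted 2D image locations is the solution of a small linear least-squares system, so it can be computed in closed form and back-propagated through. The function name, tensor shapes, and exact formulation below are assumptions for illustration.

```python
# A minimal sketch of a differentiable global positioning step, assuming a
# standard pinhole camera; names and formulation are illustrative only.
import torch

def solve_global_translation(joints_rel, keypoints_2d, K):
    """Recover a camera-space root translation t so that projecting
    (joints_rel + t) with intrinsics K best matches keypoints_2d.

    joints_rel:   (N, 3) root-relative 3D joints.
    keypoints_2d: (N, 2) predicted 2D joints, in pixels.
    K:            (3, 3) camera intrinsics.
    Returns t of shape (3,), differentiable w.r.t. all inputs.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Normalized image coordinates: a = (u - cx) / fx, b = (v - cy) / fy.
    a = (keypoints_2d[:, 0] - cx) / fx
    b = (keypoints_2d[:, 1] - cy) / fy
    X, Y, Z = joints_rel[:, 0], joints_rel[:, 1], joints_rel[:, 2]

    # Each joint gives two linear constraints on t from the pinhole model
    # u = fx * (X + tx) / (Z + tz) + cx (and similarly for v):
    #   tx - a * tz = a * Z - X
    #   ty - b * tz = b * Z - Y
    ones = torch.ones_like(a)
    zeros = torch.zeros_like(a)
    A = torch.cat([
        torch.stack([ones, zeros, -a], dim=1),
        torch.stack([zeros, ones, -b], dim=1),
    ], dim=0)                                # (2N, 3)
    rhs = torch.cat([a * Z - X, b * Z - Y])  # (2N,)

    # Closed-form least squares via the normal equations. Every op here is
    # differentiable, so a loss on camera-space outputs back-propagates into
    # both the 2D keypoint and relative 3D branches of the network.
    return torch.linalg.solve(A.T @ A, A.T @ rhs)
```

Because the solve is closed form rather than an iterative PnP routine, it is cheap and exactly differentiable, which is what lets the two previously separate stages be trained jointly.

The rectification step can be pictured with a similar hedged sketch: warping an input image so it appears to have been captured by a canonical training camera amounts (for a shared optical center) to applying the homography K_canon · K_in⁻¹. The function and the example intrinsics below are assumptions, not the paper's code.

```python
# Illustrative image rectification, assuming it amounts to reprojecting the
# image through a canonical camera: H = K_canon @ inv(K_in).
import cv2
import numpy as np

def rectify_to_canonical(image, K_in, K_canon, out_size):
    """Warp `image` (captured with intrinsics K_in) so it looks as if it
    were taken by a camera with intrinsics K_canon; out_size is (w, h)."""
    H = K_canon @ np.linalg.inv(K_in)  # 3x3 homography between image planes
    return cv2.warpPerspective(image, H, out_size)

# Hypothetical example: map a crop with arbitrary focal length to a fixed
# canonical camera, so apparent hand size relates consistently to depth.
K_in = np.array([[1200.0, 0.0, 320.0], [0.0, 1200.0, 240.0], [0.0, 0.0, 1.0]])
K_canon = np.array([[800.0, 0.0, 128.0], [0.0, 800.0, 128.0], [0.0, 0.0, 1.0]])
# rectified = rectify_to_canonical(img, K_in, K_canon, (256, 256))
```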

Source: http://arxiv.org/abs/2407.15844v1
