Structure-aware 3D Hand Pose Regression from a Single Depth Image

Muhammad Jameel Nawaz Malik, Ahmed Elhayek, Didier Stricker

In: Proceedings of EuroVR 2018. EuroVR (EuroVR-2018), October 22-23, London, United Kingdom. Springer, 11/2018.


Hand pose tracking in 3D is an essential task for many virtual reality (VR) applications such as games and manipulating virtual objects with bare hands. CNN-based learning methods achieve state-of-the-art accuracy by directly regressing the 3D pose from a single depth image. However, the 3D pose estimated by these methods is coarse and kinematically unstable because the sparse joint positions are learned independently. In this paper, we propose a novel structure-aware CNN-based algorithm which learns to automatically segment the hand from a raw depth image and to estimate the 3D hand pose jointly with new structural constraints. The constraints include finger lengths, distances between joints along the kinematic chain, and inter-finger distances. Learning these constraints helps to maintain a structural relation between the estimated joint keypoints. We also convert the sparse representation of the hand skeleton to a dense one by performing n-point interpolation between each pair of parent and child joints. Through comprehensive evaluation, we show the effectiveness of our approach and demonstrate performance competitive with state-of-the-art methods on the public NYU hand pose dataset.
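
The n-point interpolation and the length-based constraints mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `densify_skeleton` and `bone_lengths`, the parent-index encoding of the kinematic chain, and the choice of evenly spaced linear interpolation are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def densify_skeleton(joints, parents, n_points):
    """Insert n_points evenly spaced points between each parent-child joint pair.

    Hypothetical helper (not from the paper).
    joints:  (J, 3) array of 3D joint positions.
    parents: length-J sequence; parents[j] is the index of joint j's parent,
             or -1 if joint j is the root.
    Returns an (E * n_points, 3) array, where E is the number of pairs.
    """
    joints = np.asarray(joints, dtype=float)
    # Interior interpolation weights only; the endpoints 0 and 1 are the joints.
    t = np.linspace(0.0, 1.0, n_points + 2)[1:-1]
    dense = []
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root has no parent, so there is no bone to densify
        p, c = joints[parent], joints[child]
        dense.append(p + t[:, None] * (c - p))
    return np.concatenate(dense, axis=0)


def bone_lengths(joints, parents):
    """Euclidean distance between each joint and its parent (assumed constraint form)."""
    joints = np.asarray(joints, dtype=float)
    return np.array([
        np.linalg.norm(joints[child] - joints[parent])
        for child, parent in enumerate(parents)
        if parent >= 0
    ])


# Toy three-joint chain standing in for a single finger (made-up coordinates).
toy_joints = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 4.0, 0.0]])
toy_parents = [-1, 0, 1]

dense_points = densify_skeleton(toy_joints, toy_parents, n_points=1)
lengths = bone_lengths(toy_joints, toy_parents)
finger_length = lengths.sum()  # length of the chain as the sum of its bones
```

In this sketch, a finger length is taken as the sum of the bone lengths along one chain. How the paper actually defines and weights its constraints in the training loss is not stated in the abstract.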


Malik2018_EuroVR_3D_Hand_Pose_Regression.pdf (pdf, 2 MB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz