New Multi-User, Multi-Object Dataset for Joint 3D Hand-Object Pose Estimation

Christopher Dossman
Jul 11 · 2 min read

This research summary is just one of many that are distributed weekly on the AI scholar newsletter. To start receiving the weekly newsletter, sign up here.

Pose estimation is an important step towards understanding people in images and videos, with numerous applications in action understanding, human-robot interaction, surveillance, motion capture, and more.

When it comes to hand-object pose estimation, however, state-of-the-art methods still fail because of large mutual occlusions and a lack of datasets specific to 3D pose estimation for hand-object interaction. Additionally, even when synthetic images are used for training, annotated real-world images are still needed for model validation.

Joint 3D Hand-Object Pose Estimation Dataset

Researchers recently proposed HO-3D, a large-scale dataset of diverse hand-object interactions with 3D annotations of hand and object pose. They also introduced a method for efficiently annotating the dataset and a method, trained on it, for predicting 3D poses.

Example of hand and object segmentation obtained with DeepLabV3. Input image (Left); Object mask (Center); Hand mask (Right).

The annotation method is based on a global optimization that exploits depth, color, and temporal constraints to annotate the sequences efficiently. The researchers used the annotated sequences to train a new approach that predicts the 3D poses of both the hand and the object from a single color image. The HO-3D dataset consists of RGB-D sequences of 8 different people manipulating different objects, along with manual annotations inside views for evaluating the 3D poses.
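To give a rough feel for what "global optimization with depth, color, and temporal constraints" can mean, here is a toy sketch in Python. It is not the paper's objective. It assumes each pose is just a 3D translation, that the depth and color cues have already been turned into per-frame position estimates, and that the weights `w_depth`, `w_color`, and `w_temporal` are hypothetical values chosen for illustration. All poses in the sequence are optimized together, so a jittery frame gets pulled towards its neighbours.

```python
from typing import List

Pose = List[float]  # toy pose: a 3D translation only (illustrative simplification)


def energy(poses: List[Pose], depth_obs: List[Pose], color_obs: List[Pose],
           w_depth: float = 1.0, w_color: float = 0.5,
           w_temporal: float = 2.0) -> float:
    """Per-frame data terms plus a smoothness term between consecutive frames."""
    total = 0.0
    for t, pose in enumerate(poses):
        for k in range(len(pose)):
            total += w_depth * (pose[k] - depth_obs[t][k]) ** 2
            total += w_color * (pose[k] - color_obs[t][k]) ** 2
            if t > 0:
                total += w_temporal * (pose[k] - poses[t - 1][k]) ** 2
    return total


def optimize(depth_obs: List[Pose], color_obs: List[Pose],
             steps: int = 500, lr: float = 0.05,
             w_depth: float = 1.0, w_color: float = 0.5,
             w_temporal: float = 2.0) -> List[Pose]:
    """Minimize the energy over ALL frames jointly with plain gradient descent."""
    poses = [list(p) for p in depth_obs]  # initialize from the depth estimates
    n = len(poses)
    for _ in range(steps):
        grads = [[0.0] * len(p) for p in poses]
        for t in range(n):
            for k in range(len(poses[t])):
                g = 2 * w_depth * (poses[t][k] - depth_obs[t][k])
                g += 2 * w_color * (poses[t][k] - color_obs[t][k])
                if t > 0:  # link to the previous frame
                    g += 2 * w_temporal * (poses[t][k] - poses[t - 1][k])
                if t < n - 1:  # link to the next frame
                    g += 2 * w_temporal * (poses[t][k] - poses[t + 1][k])
                grads[t][k] = g
        for t in range(n):
            for k in range(len(poses[t])):
                poses[t][k] -= lr * grads[t][k]
    return poses


# Toy sequence: the x coordinate jitters between 0 and 1 from frame to frame.
depth_obs = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
color_obs = [[0.1, 0.0, 0.0], [0.9, 0.0, 0.0], [0.1, 0.0, 0.0], [0.9, 0.0, 0.0]]
smoothed = optimize(depth_obs, color_obs)
print(energy(depth_obs, depth_obs, color_obs), energy(smoothed, depth_obs, color_obs))
```

In a real pipeline the data terms would compare rendered hand and object models against the depth map and the segmentation masks, and the pose would include rotations and hand articulation. The point here is only the structure: per-frame evidence plus a term that ties neighbouring frames together.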

Potential Uses and Effects

Since more high-quality data generally means better model accuracy, HO-3D is important for training highly robust models efficiently. The dataset should also encourage researchers to develop better annotation methods that can capture and annotate sequences easily with a single RGB-D camera. That would provide additional training data for improved hand-object pose estimation and, in turn, inspire more efficient applications in computer vision and robotics.

Read more: https://arxiv.org/abs/1907.01481

Thanks for reading. Please comment, share and remember to subscribe to our weekly newsletter for the most recent and interesting research papers! You can also follow me on Twitter and LinkedIn. Remember to 👏 if you enjoyed this article. Cheers!

AI³ | Theory, Practice, Business

The AI revolution is here! Navigate the ever-changing industry with our thoughtfully written articles, whether you're a researcher, engineer, or entrepreneur.

Written by Christopher Dossman

Deep Learning Engineer, Teacher, and Entrepreneur
