Pose estimation and NVIDIA’s breakthrough

Vishal Rajput
Published in AIGuys
5 min read · Jan 1, 2022


Lately, pose estimation has gained a lot of traction thanks to advances in AI and computer vision. Let me first define the term for our readers: pose estimation is a general problem in computer vision where we detect the position and orientation of an object. It is most often applied to the human body, where we track different key points of a person moving across video frames.

Pose estimation examples

The pose estimation model takes a processed camera image as input and outputs a set of key points. Each detected key point is indexed by a part ID and assigned a confidence score between 0.0 and 1.0, indicating the probability that a key point exists at that position. Most models estimate 16 or 18 key points.

There are two major approaches to pose detection: top-down and bottom-up. In the top-down approach, we first detect the humans in the image, crop each detected region, and then detect key points within each crop. The disadvantage is that the key-point detector must run once per detected person, which makes this approach too slow for real-time use. In the bottom-up approach, we detect key points directly across the whole image and then connect them to assemble full poses. To reduce the number of false positives, a human detector can be applied at the stage where key points are joined into poses.
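To make the output format concrete, here is a minimal sketch of how the per-keypoint confidence scores described above are typically consumed. The field names, part IDs, and threshold are illustrative assumptions, not taken from any specific library:

```python
# Illustrative sketch: each key point carries a part ID, a position,
# and a confidence score in [0.0, 1.0], as described above.

def filter_keypoints(keypoints, threshold=0.5):
    """Keep only key points whose confidence meets the threshold."""
    return [kp for kp in keypoints if kp["score"] >= threshold]

# Hypothetical model output (coordinates normalised to image size):
detections = [
    {"part_id": 0, "x": 0.42, "y": 0.18, "score": 0.97},  # e.g. nose
    {"part_id": 1, "x": 0.40, "y": 0.16, "score": 0.91},  # e.g. left eye
    {"part_id": 2, "x": 0.45, "y": 0.16, "score": 0.23},  # low confidence, likely occluded
]

confident = filter_keypoints(detections, threshold=0.5)
print([kp["part_id"] for kp in confident])  # [0, 1]
```

In practice, downstream applications (skeleton drawing, action recognition) apply a threshold like this so that occluded or out-of-frame joints are dropped rather than drawn at spurious positions.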
