BlazePose: A 3D Pose Estimation Model
This is an introduction to "BlazePose", a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.
BlazePose (Full Body) is a pose detection model developed by Google that can compute (x,y,z) coordinates of 33 skeleton keypoints. It can be used for example in fitness applications.
BlazePose: On-device Real-time Body Pose tracking
On-device, Real-time Body Pose Tracking with MediaPipe BlazePose
BlazePose input and output
BlazePose consists of two machine learning models: a Detector and an Estimator. The Detector cuts out the human region from the input image, while the Estimator takes a 256x256 resolution image of the detected person as input and outputs the keypoints.
BlazePose outputs the 33 keypoints according to the following ordering convention. This is more points than the 17 keypoints commonly used in the COCO dataset.
The Detector is a Single-Shot Detector (SSD)-based architecture. Given an input image of shape (1,224,224,3), it outputs bounding boxes of shape (1,2254,12) and confidence scores of shape (1,2254,1). The 12 elements of each bounding box are of the form (x,y,w,h,kp1x,kp1y,…,kp4x,kp4y), where kp1x to kp4y are four additional keypoints. Each of the 2254 candidates has its own anchor, and the anchor's scale and offset must be applied to decode the raw values.
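The anchor decoding step can be sketched as follows. This is a minimal illustration, not the exact ailia sample code; the normalization constant and the anchor layout (center x, center y, width, height) are assumptions.

```python
import numpy as np

def decode_boxes(raw_boxes, anchors, scale=224.0):
    """Decode raw SSD regression output into absolute coordinates (a sketch).

    raw_boxes: (2254, 12) array of (x, y, w, h, kp1x, kp1y, ..., kp4x, kp4y)
    anchors:   (2254, 4) array of (anchor_x, anchor_y, anchor_w, anchor_h)
    scale:     normalization constant (assumed to match the 224 input size)
    """
    boxes = np.zeros_like(raw_boxes)
    # box center: scaled offset added to the anchor center
    boxes[:, 0] = raw_boxes[:, 0] / scale * anchors[:, 2] + anchors[:, 0]
    boxes[:, 1] = raw_boxes[:, 1] / scale * anchors[:, 3] + anchors[:, 1]
    # box size: scaled by the anchor size
    boxes[:, 2] = raw_boxes[:, 2] / scale * anchors[:, 2]
    boxes[:, 3] = raw_boxes[:, 3] / scale * anchors[:, 3]
    # the 4 extra keypoints decode the same way as the box center
    for k in range(4):
        boxes[:, 4 + 2 * k] = raw_boxes[:, 4 + 2 * k] / scale * anchors[:, 2] + anchors[:, 0]
        boxes[:, 5 + 2 * k] = raw_boxes[:, 5 + 2 * k] / scale * anchors[:, 3] + anchors[:, 1]
    return boxes
```

After decoding, the candidates are typically filtered by the confidence score and merged with non-maximum suppression.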
There are two ways to use the Detector. In box mode, the bounding box is determined from its position (x,y) and size (w,h). In alignment mode, the scale and rotation angle are determined from (kp1x,kp1y) and (kp2x,kp2y), so a rotated bounding box can be predicted.
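In alignment mode, the rotation can be derived from the direction between the two keypoints, and the crop size from their distance. A minimal sketch, assuming image coordinates with y pointing down; `target_angle` and `scale_factor` are illustrative constants, not official values:

```python
import math

def alignment_from_keypoints(kp1, kp2, target_angle=math.pi / 2, scale_factor=1.5):
    """Derive a rotated crop from two detector keypoints (a sketch).

    kp1: (x, y) center keypoint of the person
    kp2: (x, y) keypoint indicating the person's "up" direction
    Returns the crop center, crop size, and rotation angle in radians.
    """
    dx = kp2[0] - kp1[0]
    dy = kp2[1] - kp1[1]
    # rotation that makes the kp1 -> kp2 axis point upward in the crop
    rotation = target_angle - math.atan2(-dy, dx)
    # crop size from the keypoint distance, enlarged by a margin
    size = scale_factor * 2.0 * math.hypot(dx, dy)
    return kp1, size, rotation
```

The resulting center, size, and rotation define the affine transform that extracts the 256x256 Estimator input.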
The Estimator uses heatmaps during training, but at inference time it regresses the keypoints directly, without computing heatmaps, for faster inference.
The first output of the Estimator is the landmarks tensor of shape (1,195); the second output is a flag of shape (1,1). The landmarks are made of 165 elements: (x,y,z,visibility,presence) for each of the 33 keypoints.
The z-values are based on the person’s hips, with keypoints being between the hips and the camera when the value is negative, and behind the hips when the value is positive.
The visibility and presence are stored as unbounded raw values (logits) and are converted to probabilities by applying a sigmoid function. Visibility returns the probability that a keypoint exists in the frame and is not occluded by another object; presence returns the probability that a keypoint exists in the frame.
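Parsing the flat landmark output into per-keypoint values can be sketched as follows. This is an illustration based on the layout described above, not the ailia sample's actual parsing code:

```python
import numpy as np

def parse_landmarks(landmarks, num_keypoints=33):
    """Split the Estimator's flat landmark output into per-keypoint values.

    landmarks: (1, 195) raw output; the first 165 values hold
               (x, y, z, visibility, presence) for the 33 keypoints.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    kp = landmarks.reshape(-1)[: num_keypoints * 5].reshape(num_keypoints, 5)
    xyz = kp[:, 0:3]                # (x, y) coordinates and hip-relative z
    visibility = sigmoid(kp[:, 3])  # probability: in frame AND not occluded
    presence = sigmoid(kp[:, 4])    # probability: in frame
    return xyz, visibility, presence
```

A raw value of 0 for visibility or presence corresponds to a probability of 0.5 after the sigmoid.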
Use the following command to run BlazePose (Full Body) with ailia SDK.
$ python3 blazepose-fullbody.py -v 0
ailia-models/pose_estimation_3d/blazepose-fullbody at master · axinc-ai/ailia-models
Here is a result on a sample video. The size of the circles at keypoints indicates the z-value.
BlazePose (Upper Body) can also be used to estimate only the upper body. Initially, MediaPipe released only the upper-body model, and later the full-body model. The specifications of the two models differ; for example, the Detector input resolution is 128x128 for the upper-body model.
$ python3 blazepose.py -v 0
LightWeightHumanPose: A Machine Learning Model for Fast Multi-person Pose Estimation