BlazePose: A 3D Pose Estimation Model

David Cochard
Published in axinc-ai
4 min read · Jun 30, 2021

This is an introduction to "BlazePose", a machine learning model that can be used with the ailia SDK. You can easily use this model to create AI applications with the ailia SDK, as well as many other ready-to-use ailia MODELS.

Overview

BlazePose (Full Body) is a pose detection model developed by Google that computes the (x,y,z) coordinates of 33 skeleton keypoints. It can be used, for example, in fitness applications.

Source: https://pixabay.com/ja/photos/%E5%A5%B3%E3%81%AE%E5%AD%90-%E7%BE%8E%E3%81%97%E3%81%84-%E8%8B%A5%E3%81%84-%E3%83%9B%E3%83%AF%E3%82%A4%E3%83%88-5204299/

BlazePose input and output

BlazePose consists of two machine learning models: a Detector and an Estimator. The Detector crops the human region from the input image, while the Estimator takes a 256x256 image of the detected person as input and outputs its keypoints.
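The two-stage flow can be sketched as below. This is an illustrative outline, not the actual ailia SDK API: `detector` and `estimator` stand in for the two models, and the crop/resize helper is a minimal nearest-neighbour version of what a real pipeline would do (which would also pad to preserve aspect ratio).

```python
import numpy as np

def crop_and_resize(frame, box, size=256):
    """Crop the detected person region and resize it for the Estimator.
    Minimal nearest-neighbour sketch, no aspect-ratio padding."""
    x, y, w, h = (int(v) for v in box)
    crop = frame[y:y + h, x:x + w]
    # Nearest-neighbour resize to (size, size) without external dependencies.
    ys = (np.arange(size) * crop.shape[0] // size).clip(0, crop.shape[0] - 1)
    xs = (np.arange(size) * crop.shape[1] // size).clip(0, crop.shape[1] - 1)
    return crop[ys][:, xs]

def estimate_pose(frame, detector, estimator):
    """Two-stage pipeline: the Detector finds the person,
    the Estimator finds the keypoints on the cropped region."""
    box = detector(frame)               # (x, y, w, h) of the person region
    if box is None:
        return None                     # no person in the frame
    crop = crop_and_resize(frame, box)  # 256x256 Estimator input
    return estimator(crop)              # (33, 5): x, y, z, visibility, presence
```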

BlazePose outputs 33 keypoints according to the following ordering convention. This is more than the commonly used 17 keypoints of the COCO dataset.

BlazePose keypoints (Source: https://developers.google.com/ml-kit/vision/pose-detection)
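For reference, the landmark ordering as documented on the MediaPipe/ML Kit page linked above can be written out as a Python list (taken from that documentation, not from the model itself):

```python
# The 33 BlazePose landmark names, indexed in model output order,
# per the MediaPipe / ML Kit pose detection documentation.
BLAZEPOSE_KEYPOINTS = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear", "mouth_left", "mouth_right",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_pinky", "right_pinky",
    "left_index", "right_index", "left_thumb", "right_thumb",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle", "left_heel", "right_heel",
    "left_foot_index", "right_foot_index",
]
```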

Architecture

The Detector is a Single-Shot Detector (SSD)-based architecture. Given a (1,224,224,3) input image, it outputs bounding boxes (1,2254,12) and confidence scores (1,2254,1). The 12 elements of each bounding box are of the form (x,y,w,h,kp1x,kp1y,…,kp4x,kp4y), where kp1x to kp4y are four additional keypoints. Each of the 2254 candidates has its own anchor, whose scale and offset must be applied to decode the raw outputs.
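A sketch of that decoding step is shown below. The layout follows the (x,y,w,h,kp1x,kp1y,…) format above, but the scale constant (taken from the 224x224 input) and the exact formulas are assumptions, not the verbatim MediaPipe implementation:

```python
import numpy as np

def decode_boxes(raw, anchors, scale=224.0):
    """Decode raw SSD outputs into normalized-coordinate boxes.
    raw: (N, 12) network output; anchors: (N, 4) as (cx, cy, w, h)."""
    cx = raw[:, 0] / scale * anchors[:, 2] + anchors[:, 0]
    cy = raw[:, 1] / scale * anchors[:, 3] + anchors[:, 1]
    w  = raw[:, 2] / scale * anchors[:, 2]
    h  = raw[:, 3] / scale * anchors[:, 3]
    # The four extra keypoints (kp1..kp4) decode like the box center.
    kps = raw[:, 4:].reshape(-1, 4, 2)
    kx = kps[..., 0] / scale * anchors[:, 2:3] + anchors[:, 0:1]
    ky = kps[..., 1] / scale * anchors[:, 3:4] + anchors[:, 1:2]
    return np.stack([cx, cy, w, h], axis=1), np.stack([kx, ky], axis=-1)

def score_to_prob(logits):
    """Raw confidence scores are logits; a sigmoid maps them to [0, 1]."""
    return 1.0 / (1.0 + np.exp(-logits))
```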

There are two ways to use the Detector. In box mode, the bounding box is determined directly from its position (x,y) and size (w,h). In alignment mode, the scale and rotation angle are derived from (kp1x,kp1y) and (kp2x,kp2y), so a rotated bounding box can be predicted.
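Alignment mode can be sketched as follows. The convention that kp1 is the box center and kp2 points toward the head, the upright target angle, and the scale factor of 2 are assumptions borrowed from MediaPipe's usual alignment scheme, not specifics stated in this article:

```python
import math

def rotation_from_keypoints(kp1, kp2, target_angle=math.pi / 2):
    """Derive a crop rotation and scale from two detector keypoints.
    kp1: box center (e.g. mid-hip); kp2: point toward the head.
    Image coordinates with y pointing down are assumed."""
    dx, dy = kp2[0] - kp1[0], kp2[1] - kp1[1]
    angle = target_angle - math.atan2(-dy, dx)  # rotation that makes the person upright
    scale = 2.0 * math.hypot(dx, dy)            # box size from the keypoint distance
    return angle, scale
```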

Source: https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html

The Estimator uses heatmaps during training, but at inference time it regresses the keypoint coordinates directly, without heatmaps, which is faster.

Tracking network architecture: regression with heatmap supervision (Source: https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)

The first output of the Estimator is a (1,195) landmark tensor; the second output is a (1,1) confidence flag. The first 165 landmark values encode (x,y,z,visibility,presence) for each of the 33 keypoints.

The z-values are relative to the person's hips: a keypoint lies between the hips and the camera when its z-value is negative, and behind the hips when it is positive.

Visibility and presence are stored as unbounded logits and are converted to probabilities by applying a sigmoid function. Visibility is the probability that a keypoint is in the frame and not occluded by another object; presence is the probability that the keypoint is in the frame at all.
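Putting the last few paragraphs together, parsing the Estimator output might look like this (a minimal sketch assuming the (x,y,z,visibility,presence) packing described above; only the first 165 values are used):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def parse_landmarks(raw):
    """Split the (1, 195) Estimator output into per-keypoint fields.
    The first 33 * 5 = 165 values are (x, y, z, visibility, presence)
    per keypoint; visibility/presence logits are squashed to probabilities."""
    kp = raw.reshape(-1)[:33 * 5].reshape(33, 5)
    xyz = kp[:, :3]                 # x, y in the crop; z relative to the hips
    visibility = sigmoid(kp[:, 3])  # P(in frame and not occluded)
    presence = sigmoid(kp[:, 4])    # P(in frame)
    return xyz, visibility, presence
```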

Usage

Use the following command to run BlazePose (Full Body) with ailia SDK.

$ python3 blazepose-fullbody.py -v 0

Here is a result on a sample video. The size of the circles at keypoints indicates the z-value.
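One way to render that depth cue is to map z to a radius, with keypoints closer to the camera than the hips (negative z) drawn larger. The constants here are arbitrary display choices, not part of the model:

```python
def circle_radius(z, base=6.0, gain=10.0):
    """Map a keypoint's z-value to a circle radius in pixels.
    Negative z (closer to the camera than the hips) gives a larger circle;
    base and gain are illustrative visualization constants."""
    return max(1.0, base - gain * z)
```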

BlazePose (Upper Body) can also be used to estimate only the upper body. Initially, MediaPipe released only the upper-body model, with the full-body model following later. The two models have different specifications; for example, the Detector input resolution is 128x128 for the upper-body model.

$ python3 blazepose.py -v 0

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
