LightWeightHumanPose : A Machine Learning Model for Fast Multi-person Pose Estimation
This is an introduction to「LightWeightHumanPose」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.
Overview
LightWeightHumanPose is a pose estimation model released by Intel in November 2018 that detects multiple people simultaneously at high speed. It is optimized for fast inference even on CPUs.
This detection model can be applied to gesture and action detection and recognition, motion capture, and sports analysis.
Architecture
There are two approaches to pose estimation: the top-down approach and the bottom-up approach.
In the top-down approach, a person detector such as YOLO first locates each person, and pose estimation is then run on each detection. Processing time therefore grows with the number of people in the image.
In the bottom-up approach, all keypoints in the image are detected first and then grouped into individual people. This approach is fast because the network processes everyone in a single pass.
LightWeightHumanPose uses a bottom-up approach, similar to OpenPose. From the input image, it computes a heatmap for each keypoint and Part Affinity Fields (PAFs) that encode the connections between keypoints.
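As a rough illustration of how keypoint candidates can be read out of a heatmap, the sketch below picks local maxima above a threshold. It is not the ailia SDK or OpenPose post-processing: the function name `find_peaks`, the 4-neighbour comparison, the threshold of 0.1, and the synthetic heatmap are all assumptions made for the example.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for pixels that exceed the threshold and
    are strictly greater than their 4 direct neighbours (hypothetical helper)."""
    # Pad with -inf so border pixels can be compared like interior ones.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center > threshold)
        & (center > padded[:-2, 1:-1])   # neighbour above
        & (center > padded[2:, 1:-1])    # neighbour below
        & (center > padded[1:-1, :-2])   # neighbour to the left
        & (center > padded[1:-1, 2:])    # neighbour to the right
    )
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]

# Toy heatmap for one keypoint type with two people (two Gaussian blobs).
yy, xx = np.mgrid[0:20, 0:30]

def blob(cx, cy, sigma=1.5):
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))

heatmap = blob(5, 6) + 0.8 * blob(22, 12)
peaks = find_peaks(heatmap)
print(peaks)  # two candidates, one per blob centre
```

Each peak is only a candidate for that keypoint type; deciding which candidates belong to the same person is the job of the PAFs.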
The PAF is used to decide which candidate for keypoint B (e.g. an elbow) belongs to the same person as a given keypoint A (e.g. a shoulder). To score a candidate B1 against a detected position A1 of keypoint A, sum the PAF values along the line segment between A1 and B1. Compute this score for every candidate B1 to BN, and adopt the pairing with the largest total.
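The scoring step described above can be sketched in Python with NumPy. This is a minimal illustration, not the ailia SDK or OpenPose implementation: the function names `paf_score` and `best_match`, the two-channel PAF layout (`paf_x`, `paf_y`), the sample count, and the toy field are all assumptions. It follows the common OpenPose-style formulation, in which each sampled PAF vector is projected onto the direction from A to B before averaging.

```python
import numpy as np

def paf_score(paf_x, paf_y, a, b, num_samples=10):
    """Average alignment between the PAF and the segment a -> b.
    a and b are (x, y) pixel coordinates (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    vec = b - a
    norm = np.linalg.norm(vec)
    if norm == 0:
        return 0.0
    unit = vec / norm
    total = 0.0
    # Sample evenly spaced points on the segment, endpoints included.
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.rint(a + t * vec).astype(int)
        # Dot product of the field vector with the limb direction.
        total += paf_x[y, x] * unit[0] + paf_y[y, x] * unit[1]
    return total / num_samples

def best_match(paf_x, paf_y, a, candidates):
    """Return the index of the highest-scoring candidate and all scores."""
    scores = [paf_score(paf_x, paf_y, a, b) for b in candidates]
    return int(np.argmax(scores)), scores

# Toy field: a horizontal limb from the shoulder (2, 5) to the elbow (12, 5).
paf_x = np.zeros((20, 20))
paf_y = np.zeros((20, 20))
paf_x[5, 2:13] = 1.0  # unit vectors pointing in +x along the limb

shoulder = (2, 5)
elbows = [(12, 5), (12, 15)]  # the second elbow belongs to someone else
idx, scores = best_match(paf_x, paf_y, shoulder, elbows)
print(idx, scores)
```

In this toy case the first elbow lies along the field and scores close to 1, while the second one crosses empty field and scores close to 0. A real implementation also has to resolve conflicts when several shoulders compete for the same elbow, which this sketch does not do.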
The original OpenPose uses VGG-19 as its backbone and repeats the refinement stage 5 times. The input resolution is 368x368.
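LightWeightHumanPose uses MobileNet v1 as its backbone. It performs only one refinement stage and replaces each 7x7 convolution with a block of 1x1, 3x3, and 3x3 convolutions, where the last 3x3 convolution uses a dilation of 2 so that the block keeps the same receptive field (the region of input pixels each output depends on).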
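The receptive-field claim can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions and assumes that the last 3x3 convolution in the replacement block uses dilation 2, as described in the original Lightweight OpenPose paper; the helper names are hypothetical. The weight count is per input/output channel pair and ignores any change in channel widths.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.
    layers is a list of (kernel_size, dilation) tuples."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

def weights_per_channel_pair(layers):
    """Spatial weights per input/output channel pair, summed over layers."""
    return sum(kernel * kernel for kernel, _ in layers)

single_7x7 = [(7, 1)]
replacement = [(1, 1), (3, 1), (3, 2)]   # last 3x3 with dilation 2 (assumed)
no_dilation = [(1, 1), (3, 1), (3, 1)]   # same block without dilation

print(receptive_field(single_7x7))            # 7
print(receptive_field(replacement))           # 7
print(receptive_field(no_dilation))           # 5
print(weights_per_channel_pair(single_7x7))   # 49
print(weights_per_channel_pair(replacement))  # 19
```

Without the dilation the stacked block would only cover a 5x5 region, which is why the dilation is needed to match the 7x7 convolution.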
This reduces the computational cost from 136.1 GFLOPs for OpenPose to 9 GFLOPs for LightWeightHumanPose, while AP drops only from 48.6 to 42.8.
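To put these figures in proportion, the short calculation below derives the compute reduction and the accuracy loss from the numbers quoted above. The variable names are only for illustration.

```python
# Figures quoted in the text above.
openpose_gflops = 136.1
lightweight_gflops = 9.0
openpose_ap = 48.6
lightweight_ap = 42.8

compute_reduction = openpose_gflops / lightweight_gflops
ap_drop = openpose_ap - lightweight_ap
relative_ap_drop = ap_drop / openpose_ap

print(f"compute reduction: {compute_reduction:.1f}x")   # 15.1x
print(f"AP drop: {ap_drop:.1f} points")                 # 5.8 points
print(f"relative AP drop: {relative_ap_drop:.1%}")      # 11.9%
```

In other words, roughly 15 times less computation is traded for a loss of about 6 AP points.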
As a result, it runs at 26 fps on the CPU.
The COCO Dataset was used for training.
Usage
The ailia SDK implements the pre-processing and post-processing in C++, which makes it faster than the usual Python implementation.
$ python3 lightweight-human-pose-estimation.py -v 0
In an RTX 2080 + cuDNN environment, inference takes about 11 ms including post-processing.
The default recognition resolution is 320x240. To detect people who appear small in the frame, use the -dw and -dh options to increase the recognition resolution.
$ python3 lightweight-human-pose-estimation.py -v 0 -dw 640 -dh 480
Conversely, reducing the recognition resolution to 160x120 with the -dw and -dh options shortens the inference time on a Raspberry Pi 4 to about 150 ms.
$ python3 lightweight-human-pose-estimation.py -v 0 -dw 160 -dh 120
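As a quick sanity check on the resolution options, the sketch below compares the pixel counts of the three resolutions mentioned above and converts the quoted latencies into frame rates. It assumes, as a rough rule of thumb only, that convolution cost scales with the number of input pixels; actual timings depend on the hardware and runtime. The helper names are hypothetical.

```python
DEFAULT_RESOLUTION = (320, 240)

def relative_pixels(width, height, base=DEFAULT_RESOLUTION):
    """Pixel count relative to the default recognition resolution."""
    return (width * height) / (base[0] * base[1])

def fps_from_latency(latency_ms):
    """Frames per second if every frame takes latency_ms milliseconds."""
    return 1000.0 / latency_ms

for w, h in [(160, 120), (320, 240), (640, 480)]:
    print(f"{w}x{h}: {relative_pixels(w, h):.2f}x the default pixel count")

print(f"11 ms  -> {fps_from_latency(11):.1f} fps")   # RTX 2080 + cuDNN
print(f"150 ms -> {fps_from_latency(150):.1f} fps")  # Raspberry Pi 4 at 160x120
```

Halving both dimensions cuts the pixel count to a quarter, and doubling them quadruples it, which is why the resolution options have such a strong effect on speed.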
Here is the result of LightWeightHumanPose.
Related topics
ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.
ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.