LightWeightHumanPose : A Machine Learning Model for Fast Multi-person Pose Estimation

Published in axinc-ai · 4 min read · Apr 16, 2021

This is an introduction to "LightWeightHumanPose", a machine learning model that can be used with ailia SDK. You can use this model, along with many other ready-to-use ailia MODELS, to build AI applications with ailia SDK.

Overview

LightWeightHumanPose is a pose estimation model released by Intel in November 2018 that detects multiple people simultaneously at high speed. It is optimized for fast inference even on CPUs.

Source: https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch

This detection model can be applied to gesture and action detection and recognition, motion capture, and sports analysis.

Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose

In this work we adapt multi-person pose estimation architecture to use it on edge devices. We follow the bottom-up…

arxiv.org

Architecture

There are two approaches to pose estimation: the top-down approach and the bottom-up approach.

In the top-down approach, people are first detected with YOLO or another detector, and pose estimation is then run on each detected person. Inference time therefore grows with the number of people in the image.

In the bottom-up approach, all the key points are detected first, and then the key points are grouped into people. This approach is fast because it performs pose estimation for all people together.

LightWeightHumanPose uses a bottom-up approach, similar to OpenPose. From the input image, it computes a heatmap for each keypoint and Part Affinity Fields (PAFs) that encode the connections between keypoints.
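As an illustration of how keypoint candidates can be read from a heatmap, here is a minimal sketch that picks local maxima above a threshold. The function name `find_peaks` and the threshold of 0.1 are assumptions made for this example; this is not the post-processing code of ailia SDK or of the original repository.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for each strict local maximum above threshold.

    Hypothetical helper: heatmap is a 2D array for ONE keypoint type.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so border pixels can still be compared to 4 neighbors.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = center > threshold
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        is_peak &= center > neighbor
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]

# Toy heatmap with two people: two separate peaks for the same keypoint type.
demo = np.zeros((5, 6))
demo[1, 2] = 0.9   # first person
demo[1, 3] = 0.5   # shoulder of the same blob, not a peak
demo[3, 4] = 0.7   # second person
peaks = find_peaks(demo)
print(peaks)
```

Each peak is one candidate for that keypoint type; deciding which candidates belong to the same person is the job of the PAFs.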

The PAF is used to decide which candidate of keypoint type B (e.g. elbow) belongs to the same person as a given keypoint of type A (e.g. shoulder). To score a candidate connection between a detected coordinate A1 of keypoint A and a candidate B1, sum the PAF values along the line segment from A1 to B1. Compute this score for every candidate B1 to BN, and adopt the combination with the largest total value.
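The scoring step can be sketched as follows. Following the OpenPose formulation, the sketch projects the PAF vector onto the direction from A1 to B1 at a few sample points and averages the result, which is one concrete way of "summing the PAF values on the line". The names `paf_score` and `best_partner`, the split into `paf_x` / `paf_y` arrays, and the 10 sample points are assumptions made for this example.

```python
import numpy as np

def paf_score(paf_x, paf_y, a, b, num_samples=10):
    """Average alignment of the PAF with the segment a -> b.

    a and b are (x, y) pixel coordinates; paf_x / paf_y hold the two
    components of the field for ONE limb type (hypothetical layout).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    direction = b - a
    length = np.linalg.norm(direction)
    if length == 0:
        return 0.0
    unit = direction / length
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.rint(a + t * direction).astype(int)
        # Dot product between the field and the limb direction.
        score += paf_x[y, x] * unit[0] + paf_y[y, x] * unit[1]
    return score / num_samples

def best_partner(paf_x, paf_y, a, candidates_b):
    """Index of the candidate B with the highest PAF score for keypoint a."""
    scores = [paf_score(paf_x, paf_y, a, b) for b in candidates_b]
    return int(np.argmax(scores)), scores

# Toy field: a horizontal limb along row 2, pointing in +x.
paf_x = np.zeros((10, 10))
paf_y = np.zeros((10, 10))
paf_x[2, 1:9] = 1.0

shoulder = (1, 2)
elbows = [(8, 2), (1, 8)]   # first lies on the limb, second does not
index, scores = best_partner(paf_x, paf_y, shoulder, elbows)
print(index, scores)
```

A full implementation would also resolve conflicts when two keypoints of type A prefer the same candidate B; this sketch only covers the per-pair score.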

The original OpenPose uses VGG-19 as its backbone and repeats the refinement stage 5 times. The input resolution is 368x368.

LightWeightHumanPose uses MobileNet v1 as its backbone. It performs only one refinement stage and replaces each 7x7 convolution with a sequence of 1x1, 3x3 and 3x3 convolutions that keeps the same receptive field (the area of input pixels that each output value depends on).
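The receptive-field claim can be checked with a little arithmetic. As far as I understand the paper, the last 3x3 convolution uses a dilation of 2, which is what brings the stack back to a 7x7 receptive field; treat that dilation value as an assumption of this sketch. The weight comparison further assumes the same number of input and output channels in every layer and ignores biases.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

def weights_per_channel_pair(layers):
    """Kernel weights per (input channel, output channel) pair."""
    return sum(kernel * kernel for kernel, _ in layers)

single_7x7 = [(7, 1)]
replacement = [(1, 1), (3, 1), (3, 2)]   # last 3x3 with dilation 2 (assumed)
no_dilation = [(1, 1), (3, 1), (3, 1)]

print(receptive_field(single_7x7))    # 7
print(receptive_field(replacement))   # 7
print(receptive_field(no_dilation))   # 5
print(weights_per_channel_pair(single_7x7))    # 49
print(weights_per_channel_pair(replacement))   # 19
```

Under these assumptions the replacement block sees the same 7x7 neighborhood with well under half the kernel weights.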

This reduces the computational complexity from 136.1 GFLOPs for OpenPose to 9 GFLOPs for LightWeightHumanPose, roughly a 15x reduction, while AP drops only from 48.6 to 42.8.

As a result, it runs at 26 fps on the CPU.
The COCO Dataset was used for training.

Usage

The ailia SDK implements the pre-processing and post-processing in C++, which makes it faster than the usual Python implementation.

$ python3 lightweight-human-pose-estimation.py -v 0

In an RTX 2080 + cuDNN environment, inference takes 11 ms including post-processing.

The default recognition resolution is 320x240, but if you want to recognize smaller people, you can use the -dw and -dh options to increase the recognition resolution.

$ python3 lightweight-human-pose-estimation.py -v 0 -dw 640 -dh 480

By reducing the recognition resolution to 160x120 with the -dw and -dh options, it is also possible to bring the inference time on the Raspberry Pi 4 down to about 150 ms.

$ python3 lightweight-human-pose-estimation.py -v 0 -dw 160 -dh 120
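As a rough guide for choosing -dw and -dh, the cost of a convolutional network scales approximately with the number of input pixels. The sketch below compares the resolutions used above against the 320x240 default; the helper `relative_cost` is hypothetical, and the linear scaling is an approximation that ignores fixed overheads and post-processing.

```python
def relative_cost(width, height, base=(320, 240)):
    """Approximate compute cost relative to the default resolution."""
    return (width * height) / (base[0] * base[1])

for width, height in ((160, 120), (320, 240), (640, 480)):
    print(f"{width}x{height}: {relative_cost(width, height):.2f}x")
```

Doubling both dimensions costs about four times as much compute, and halving them about a quarter, so measured timings on a given device are still the better guide.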

axinc-ai/ailia-models

(Image from…

github.com

Here is the result of LightWeightHumanPose.

Related topics

BlazePose : A 3D Pose Estimation Model

This is an introduction to「BlazePose」, a machine learning model that can be used with ailia SDK. You can easily use…

PoseResnet : A Top-down Machine Learning Model for Pose Estimation

GAST : A machine learning model that predicts a 3D skeleton from a 2D skeleton

AnimalPose : Pose Esimation for Animals

This is an introduction to「AnimalPose」, a machine learning model that can be used with ailia SDK. You can easily use…

Written by David Cochard