Deep Learning based Human Pose Estimation using OpenCV and MediaPipe

Arthur Fortes
Published in Nerd For Tech
May 20, 2021


A guide to MediaPipe library for Human Pose Estimation

Result of Pose Estimation without background.

One of the hardest tasks in computer vision is determining the high degree-of-freedom configuration of a human body with all its limbs, complex self-occlusion, self-similar parts, and large variations due to clothing, body type, lighting, and many other factors. Human pose estimation is the task of predicting the locations of human keypoints (joints and landmarks) such as the elbows, knees, neck, shoulders, hips, and chest. In this article, we will learn about deep learning based human pose estimation using the MediaPipe and OpenCV libraries.

Table of Contents

  • What is MediaPipe?
  • Pose Estimation Problem
  • Implementing the Solution
  • Useful Links

What is MediaPipe?

MediaPipe is a framework for building pipelines that process multimodal data such as audio, video, or any other time series. With MediaPipe, an ML pipeline can be built as a graph of modular components that combines inference models (for example, TensorFlow or TFLite) with media processing functions.

Note: You don’t even need a GPU to run experiments with MediaPipe, as today’s CPUs and integrated graphics work well for this solution. Naturally, the FPS will be lower than with a dedicated GPU.
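To check how many frames per second your own machine reaches, a small rolling FPS meter can be called once per processed frame. This is only a sketch using the standard library; the class name `FpsMeter` and its `tick` method are hypothetical helpers, not part of MediaPipe or OpenCV.

```python
import time
from collections import deque
from typing import Deque, Optional


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30) -> None:
        # Only the most recent `window` timestamps are kept.
        self._stamps: Deque[float] = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one processed frame and return the current FPS estimate.

        `now` can be passed explicitly (in seconds) to make the meter
        deterministic; otherwise time.perf_counter() is used.
        """
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0  # not enough frames yet to estimate a rate
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `meter.tick()` right after each frame has been processed gives a smoothed value that can be printed or drawn on the frame.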

MediaPipe offers customizable Python solutions as a prebuilt Python package on PyPI, which can be installed simply with pip install mediapipe. It also provides tools for users to build their own solutions. Please see MediaPipe in Python for more info.

MediaPipe solutions for ML:

ML solutions covered by MediaPipe. (Source: https://google.github.io/mediapipe/solutions/pose.html)

Pose Estimation Problem

Human pose estimation from video plays a critical role in various applications such as quantifying physical exercises, sign language recognition, and full-body gesture control. For example, it can form the basis for yoga, dance, and fitness applications. It can also enable the overlay of digital content and information on top of the physical world in augmented reality. MediaPipe Pose is an ML solution for high-fidelity body pose tracking. It infers 33 3D landmarks on the whole body from RGB video frames, building on Google’s BlazePose research, which also powers the ML Kit Pose Detection API.
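As an illustration of how landmarks can be used to quantify an exercise, the sketch below computes the angle at a joint from three 2D points, for example shoulder, elbow, and wrist. The function name `joint_angle` and the sample coordinates are made up for this example; in MediaPipe Pose the left shoulder, elbow, and wrist are landmarks 11, 13, and 15.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle at vertex b (in degrees, 0 to 180) formed by a-b-c."""
    # Direction of each limb segment as seen from the joint b.
    angle_to_a = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_to_c = math.atan2(c[1] - b[1], c[0] - b[0])
    angle = abs(math.degrees(angle_to_c - angle_to_a))
    # Fold reflex angles back into the 0 to 180 range.
    return 360.0 - angle if angle > 180.0 else angle


# Made-up normalized coordinates for shoulder, elbow, and wrist.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Tracking such an angle over the frames of a video is one simple way to count repetitions, for instance by detecting when the elbow goes from extended to flexed and back.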

Landmarks recognized by MediaPipe. (Source: https://google.github.io/mediapipe/solutions/pose.html)

Implementing the Solution

Requirements

For this project, I used Python in an Anaconda environment with the following libraries:

# Using conda
conda install -c conda-forge opencv
# Using pip
pip install mediapipe

Creating the Pose Estimator Class

Let us create a Python class that estimates the pose and can be reused in any other project related to pose estimation. You can also use it in real time with your webcam.
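A minimal sketch of what such a class might look like is shown below, assuming the `mp.solutions.pose` Python API from the `mediapipe` package. The names `PoseEstimator`, `estimate`, and `to_pixel` are illustrative and are not necessarily the ones used in the complete code on GitHub.

```python
from typing import List, Tuple


def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> Tuple[int, int]:
    """Convert MediaPipe's normalized coordinates to pixel coordinates.

    Landmarks slightly outside the frame are clamped to the image border.
    """
    x = min(max(int(x_norm * width), 0), width - 1)
    y = min(max(int(y_norm * height), 0), height - 1)
    return x, y


class PoseEstimator:
    """Illustrative wrapper around mp.solutions.pose.Pose (hypothetical name)."""

    def __init__(self, static_image_mode: bool = False,
                 min_detection_confidence: float = 0.5,
                 min_tracking_confidence: float = 0.5) -> None:
        # Imported here so that the helper above works without these packages.
        import cv2
        import mediapipe as mp

        self._cv2 = cv2
        self._mp_pose = mp.solutions.pose
        self._mp_draw = mp.solutions.drawing_utils
        self._pose = self._mp_pose.Pose(
            static_image_mode=static_image_mode,
            min_detection_confidence=min_detection_confidence,
            min_tracking_confidence=min_tracking_confidence,
        )

    def estimate(self, frame_bgr, draw: bool = True) -> List[Tuple[int, int]]:
        """Run pose estimation on one BGR frame and return pixel keypoints.

        Returns an empty list when no person is detected. When `draw` is
        True, the skeleton is drawn onto `frame_bgr` in place.
        """
        # OpenCV delivers BGR frames, while MediaPipe expects RGB.
        rgb = self._cv2.cvtColor(frame_bgr, self._cv2.COLOR_BGR2RGB)
        results = self._pose.process(rgb)
        if results.pose_landmarks is None:
            return []
        if draw:
            self._mp_draw.draw_landmarks(
                frame_bgr, results.pose_landmarks, self._mp_pose.POSE_CONNECTIONS
            )
        height, width = frame_bgr.shape[:2]
        return [to_pixel(lm.x, lm.y, width, height)
                for lm in results.pose_landmarks.landmark]

    def close(self) -> None:
        """Release the resources held by the underlying MediaPipe graph."""
        self._pose.close()
```

For real-time use, frames read from `cv2.VideoCapture(0)` can be passed to `estimate` one by one and displayed with `cv2.imshow`.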

Making predictions

Since we created the class in the file above, we will use it from another file to make predictions on videos.
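The sketch below shows one way such a script could be organized, assuming `-i` and `-o` are the input and output video paths and `-b` decides whether the original frame is kept behind the drawn skeleton. That reading of `-b` is an assumption, as are the names `str2bool`, `build_parser`, `run`, and `main`. To keep the example self-contained it calls `mp.solutions.pose` directly instead of importing a class from another file.

```python
import argparse
from typing import List, Optional


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings.

    argparse's type=bool would treat the string 'False' as True,
    because any non-empty string is truthy.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pose estimation on a video file.")
    parser.add_argument("-i", "--input", required=True, help="path to the input video")
    parser.add_argument("-o", "--output", required=True, help="path for the annotated video")
    parser.add_argument("-b", "--background", type=str2bool, default=True,
                        help="assumed meaning: keep the original frame behind the skeleton")
    return parser


def run(input_path: str, output_path: str, keep_background: bool) -> None:
    """Read a video, draw the detected pose on every frame, and save the result."""
    import cv2
    import mediapipe as mp
    import numpy as np

    mp_pose = mp.solutions.pose
    mp_draw = mp.solutions.drawing_utils

    capture = cv2.VideoCapture(input_path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {input_path}")
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    try:
        with mp_pose.Pose(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as pose:
            while True:
                ok, frame = capture.read()
                if not ok:
                    break  # end of the video
                results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                # Draw on the frame itself, or on a black canvas of the same size.
                canvas = frame if keep_background else np.zeros_like(frame)
                if results.pose_landmarks:
                    mp_draw.draw_landmarks(canvas, results.pose_landmarks,
                                           mp_pose.POSE_CONNECTIONS)
                writer.write(canvas)
    finally:
        capture.release()
        writer.release()


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    run(args.input, args.output, args.background)

# To use this as a script, call main() under the usual
# `if __name__ == "__main__":` guard.
```

The custom `str2bool` parser is the important detail here: without it, passing `-b False` on the command line would silently be interpreted as `True`.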

Now just run the following command:

python detector.py -i videos/input_video.mp4 -o videos/output_video.mp4 -b False

Note: The complete code and requirements are available in my GitHub repository, which is listed in the useful links.

Result

Useful Links

Arthur Fortes
Nerd For Tech

Data scientist and Python developer with experience in research and industrial projects. Innovation Enthusiast.