Real-time Human Pose Estimator and Tracker

Paul Hurlocker
4 min read · Dec 12, 2021


Example of pose estimation and tracking application on a pre-recorded video

Overview

Human pose estimation can be achieved with a skeleton-based approach that identifies key points such as knees, elbows, and shoulders, and the connections between them, on people in images or videos. I created this side project to experiment with pre-trained computer vision models on a Raspberry Pi with an Intel Neural Compute Stick. The application performs multi-person 2D pose estimation and person detection on video streams using two pre-trained models deployed with the OpenVINO toolkit.

The pose estimation model identifies up to 18 points: ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles. The results of the pose estimation and person detection predictions are rendered in the application in real-time on a camera feed or pre-recorded video. In addition to the pose estimation, the inference time, number of people in the video, and the order of inference for each person are overlaid on the video.

People entering and exiting the video feed are saved in a database as a series of events that can be accessed via a JSON API. The application can run on a CPU, GPU, or the Intel Neural Compute Stick 2, via either the command line or a Flask-based web application. The web-based interface lets end users toggle between a real-time camera feed and a pre-recorded demo video. The application can be deployed to a Raspberry Pi with a camera and an Intel Neural Compute Stick 2 to speed up inference, or run on a regular desktop or laptop computer.
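To make the event log concrete, here is a minimal sketch of storing enter/exit events and serving them as JSON. The real application uses SQLAlchemy models behind a Flask endpoint; this stand-in uses the stdlib `sqlite3` and `json` modules, and the schema (`event_type`, `person_count`, timestamp) is an assumption for illustration, not the repository's actual model.

```python
import json
import sqlite3
import time

# Hypothetical event schema: the actual SQLAlchemy model in the repo
# may differ in table and column names.
def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, event_type TEXT, "
        "person_count INTEGER, ts REAL)"
    )

def record_event(conn, event_type, person_count):
    # Called when the detected person count changes between frames.
    conn.execute(
        "INSERT INTO events (event_type, person_count, ts) VALUES (?, ?, ?)",
        (event_type, person_count, time.time()),
    )

def events_as_json(conn):
    # What a Flask route would return to the JSON API client.
    rows = conn.execute(
        "SELECT id, event_type, person_count, ts FROM events ORDER BY id"
    ).fetchall()
    return json.dumps(
        [{"id": r[0], "type": r[1], "count": r[2], "ts": r[3]} for r in rows]
    )

conn = sqlite3.connect(":memory:")
init_db(conn)
record_event(conn, "enter", 1)
record_event(conn, "exit", 0)
print(events_as_json(conn))
```

In the Flask version, `events_as_json` would simply become a route returning `jsonify(...)` over the query results.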

Overview of system with web and command line interfaces

The Code

The code is available at https://github.com/pahurlocker/pose-estimation with instructions on how to get it running locally or on a Raspberry Pi with an Intel Neural Compute Stick. The key libraries in the application include OpenCV, NumPy, the OpenVINO Inference Engine, Flask, SQLAlchemy, and Click. Person detection and pose estimation are performed in the PoseEstimator class. The estimation is done on each frame of a video using two different models, and OpenCV is used to manipulate the frame and draw the overlays.
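The per-frame flow can be sketched as follows. This is a simplified stand-in for the repository's PoseEstimator, assuming the two OpenVINO networks are wrapped as callables; the names, signatures, and return shapes here are illustrative, not the actual API.

```python
import numpy as np

class PoseEstimator:
    """Sketch of the per-frame pipeline. `detect_fn` and `pose_fn`
    stand in for the OpenVINO person-detection and pose-estimation
    networks (hypothetical interfaces, for illustration only)."""

    def __init__(self, detect_fn, pose_fn, conf_threshold=0.4):
        self.detect_fn = detect_fn      # frame -> [(confidence, box), ...]
        self.pose_fn = pose_fn          # frame -> list of keypoint lists
        self.conf_threshold = conf_threshold

    def process_frame(self, frame):
        # Person detection: keep only boxes above the confidence threshold.
        people = [box for conf, box in self.detect_fn(frame)
                  if conf > self.conf_threshold]
        # Pose estimation on the same frame with the second model.
        keypoints = self.pose_fn(frame)
        # The real app then draws skeletons, boxes, and stats with OpenCV.
        return {"person_count": len(people), "keypoints": keypoints}

# Dummy callables standing in for the two networks.
frame = np.zeros((256, 456, 3), dtype=np.uint8)
detect = lambda f: [(0.9, (10, 10, 50, 120)), (0.2, (0, 0, 5, 5))]
pose = lambda f: [[(30, 40), (32, 60)]]
est = PoseEstimator(detect, pose)
print(est.process_frame(frame))
```

Running two model inferences per frame is what makes the overlay loop the performance bottleneck on the Pi, as noted in the deployment section below.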

Hardware Deployment: Raspberry Pi and Intel Neural Compute Stick 2

I deployed the application to a Raspberry Pi with an Intel Neural Compute Stick 2. Inference ran with some flickering, likely because two models are applied to each frame, even with the acceleration offered by the compute stick.

Raspberry Pi with Intel Neural Compute Stick and Camera

The video below is a screen capture of the application running with three people in the frame. The pose estimation works very well, but the person detection struggles at times, especially when people stand close together. As mentioned previously, there is a small but noticeable lag in the video.

Deep Learning Models

The human pose estimation model is a multi-person 2D pose estimation network that uses a tuned MobileNetV1 as the feature extractor. The model is based on the OpenPose approach and was originally built using the Caffe framework. OpenPose was the first real-time system to jointly detect the body parts of multiple people. The model has two outputs: part affinity fields (PAFs) and keypoint heatmaps. The coordinates in the heatmaps are scaled up to the frame resolution and used for the pose overlay, frame by frame.
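The heatmap-to-frame scaling can be illustrated with a small sketch. This assumes the network emits heatmaps at a lower resolution than the input frame and takes only the single strongest peak per channel; the real OpenPose-style post-processing also uses the PAFs to group keypoints into per-person skeletons.

```python
import numpy as np

def heatmap_peak(heatmap, frame_w, frame_h):
    """Find the strongest response in one keypoint heatmap channel and
    scale its coordinates from heatmap space up to frame space.
    (Simplified: real pipelines use non-maximum suppression to find
    multiple peaks, then PAFs to assign them to people.)"""
    h, w = heatmap.shape
    idx = np.argmax(heatmap)          # flat index of the strongest response
    y, x = divmod(idx, w)
    # Scale heatmap coordinates to frame coordinates.
    return int(x * frame_w / w), int(y * frame_h / h)

# Synthetic 32x57 heatmap with one peak, scaled to a 456x256 frame.
hm = np.zeros((32, 57))
hm[10, 20] = 1.0
print(heatmap_peak(hm, 456, 256))     # -> (160, 80)
```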

The person detection model was developed for pedestrian detection in retail scenarios. It is based on a MobileNetV2-like backbone that uses depth-wise convolutions (a single convolution filter per input channel) to reduce the computation of the 3x3 convolution blocks. It, too, was originally built using the Caffe framework. The model outputs a confidence score and bounding-box coordinates for each prediction, and the application counts any result above a 0.4 confidence value as a person.
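Filtering the detector output might look like the sketch below. It assumes each raw detection row has the layout `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with box coordinates normalized to [0, 1], a common layout for OpenVINO detection models; verify against the specific model's documentation.

```python
import numpy as np

def filter_people(detections, frame_w, frame_h, threshold=0.4):
    """Keep detections above the confidence threshold and convert their
    normalized box coordinates to pixel coordinates."""
    boxes = []
    for det in detections:
        conf = det[2]
        if conf > threshold:
            x_min = int(det[3] * frame_w)
            y_min = int(det[4] * frame_h)
            x_max = int(det[5] * frame_w)
            y_max = int(det[6] * frame_h)
            boxes.append((conf, (x_min, y_min, x_max, y_max)))
    return boxes

# Two synthetic detections on a 640x480 frame; the second is below
# the 0.4 threshold and is dropped.
raw = np.array([
    [0, 1, 0.92, 0.10, 0.20, 0.30, 0.90],
    [0, 1, 0.15, 0.50, 0.50, 0.60, 0.95],
])
print(filter_people(raw, 640, 480))
```

The surviving boxes are what the application counts as people and what OpenCV draws on the frame.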

Conclusion

I hope you enjoy playing with this application! It was a fun way to experiment with these models on the Raspberry Pi. The code repository provides instructions for running it via the command line or the web-based interface: https://github.com/pahurlocker/pose-estimation


Paul Hurlocker

CTO @Spring Oaks Capital ~ Capital One C4ML Alum ~ Notch Co-founder ~ MS, Data Science @Northwestern University