Human Pose Detection

Mining Body Language from Videos

5 min read · May 22, 2017

From Gene Kelly’s step dance to Bruce Lee’s kung fu, iconic movement has made history. Communicating through body language is an ancient art form, and it is currently evolving in fascinating ways: detecting human body language computationally is becoming effective and accessible. This experiment explores the enabling technologies, applications and implications.

Human Pose Estimation, using OpenPose. Footage by Boston Dynamics.

For over 20 years, motion capture has enabled us to record the actions of humans and then use that information to animate a digital character or analyse poses. Movie makers and game developers embraced such technologies, but until recently they required expensive equipment that captured only a few aspects of the overall performance.

Human Pose Estimation. Image by OpenPose

Today, a new generation of machine-learning-based systems is making it possible to detect human body language directly from images. A growing number of research papers and open-source libraries address key aspects: body, hand, face and gaze tracking; identity, gender, age, emotion and muscle-strain detection; action classification and prediction. We now can...

Imagine a world where every camera is a realtime body language detector — and every video can be analysed.

Ministry of Silly Walks — John Cleese — Monty Python — processed with OpenPose.

Experiment: Human Pose Detection in Videos

Cinema and online video sites are a vast source of recorded human performances. Any imaginable movement has been discovered and perfected: walks, dances, gestures, drama, love and fight scenes. As the new generation of body tracking tools enables us to “mine” body language data from any video, we can now easily “steal” motion from famous movies and then use that data to drive characters in AR/VR — to name just one example.
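To make the idea of "mining" motion a little more concrete, here is a minimal sketch of one way detected keypoints could be turned into animation-friendly data: computing a joint angle per frame. The frame data, the `joint_angle` helper and the choice of the elbow are illustrative assumptions, not part of the experiment or of any library.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return None  # degenerate case: two keypoints coincide
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding drift
    return math.degrees(math.acos(cos_t))

# Hypothetical per-frame (shoulder, elbow, wrist) pixel coordinates.
frames = [
    ((100, 100), (100, 150), (100, 200)),  # arm fully extended
    ((100, 100), (100, 150), (150, 150)),  # elbow bent at a right angle
]

# One elbow angle per frame: a simple "motion track" for that joint.
elbow_track = [joint_angle(s, e, w) for s, e, w in frames]
print(elbow_track)
```

A track like this is only a 2D approximation of the real motion. Driving a 3D character would need depth information or further assumptions that this sketch does not cover.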

The following video is made using the OpenPose library to detect human body poses in movie scenes and video clips.

The video tests OpenPose on diverse sources, including sports games, James Brown’s dance routines and kung fu scenes. The library handled a wide range of footage robustly, failing only occasionally, and then in delightfully comedic ways.

Cloning yoga and tai chi class videos is exceptionally easy to do.

OpenPose

All experiment videos were processed with OpenPose, an open-source library for real-time multi-person keypoint detection, authored by Gines Hidalgo, Zhe Cao, Tomas Simon, Shih-En Wei, Hanbyul Joo and Yaser Sheikh. It detects 18 body keypoints per person from images, and its running time is invariant to the number of detected people. Even though the library is in rapid development, it works reliably out of the box and is fun to use.
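For readers who want to work with the detected keypoints, the sketch below shows one way to turn a JSON result into named joints. It assumes a layout in which each person carries a flat list of 18 (x, y, confidence) triples under a `pose_keypoints` key, and it assumes a COCO-style joint order. Both the key name and the ordering should be checked against the OpenPose version in use, and `parse_people` and `min_conf` are hypothetical names introduced here.

```python
import json

# Assumed COCO-style ordering of the 18 body keypoints.
BODY_PARTS = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "RHip", "RKnee", "RAnkle",
    "LHip", "LKnee", "LAnkle", "REye", "LEye", "REar", "LEar",
]

def parse_people(json_text, key="pose_keypoints", min_conf=0.1):
    """Return one dict per person mapping joint name -> (x, y, conf) or None."""
    data = json.loads(json_text)
    people = []
    for person in data.get("people", []):
        flat = person.get(key, [])
        if len(flat) != 3 * len(BODY_PARTS):
            continue  # skip entries that do not match the assumed layout
        joints = {}
        for i, name in enumerate(BODY_PARTS):
            x, y, c = flat[3 * i: 3 * i + 3]
            # Low-confidence joints are treated as "not detected".
            joints[name] = (x, y, c) if c >= min_conf else None
        people.append(joints)
    return people

# Hand-made sample: one person with only the nose detected.
flat = [0.0] * (3 * len(BODY_PARTS))
flat[0:3] = [320.0, 120.0, 0.9]
sample = json.dumps({"people": [{"pose_keypoints": flat}]})

people = parse_people(sample)
print(people[0]["Nose"], people[0]["Neck"])
```

Mapping undetected joints to `None` keeps downstream code from mistaking a (0, 0) placeholder for a real position in the top-left corner of the image.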

OpenPose uses an interesting pipeline to achieve its robust performance. The paper “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields” gives an overview of the inner workings of the system. Finally, this:
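The central idea of the paper can be sketched in a few lines. A Part Affinity Field is a 2D vector field that points along a limb, and two candidate joints are scored by how well the field aligns with the straight line between them. The following is a simplified toy version under stated assumptions: the field is a plain nested list, the sampling is nearest-pixel, and `paf_score` is a hypothetical helper, not the library's implementation.

```python
import math

def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Average alignment between the field and the segment p1 -> p2.

    paf_x and paf_y are 2D grids indexed [y][x] holding the field's
    x and y components. Points are (x, y) pixel coordinates.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1) if num_samples > 1 else 0.0
        x = int(round(p1[0] + t * dx))
        y = int(round(p1[1] + t * dy))
        # Dot product of the field vector with the limb direction.
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples

# Toy 5x5 field: a horizontal "limb" along row 2, pointing in +x.
size = 5
paf_x = [[1.0 if row == 2 else 0.0 for _ in range(size)] for row in range(size)]
paf_y = [[0.0] * size for _ in range(size)]

along = paf_score(paf_x, paf_y, (0, 2), (4, 2))   # follows the limb
across = paf_score(paf_x, paf_y, (2, 0), (2, 4))  # crosses the limb
print(along, across)
```

A candidate pair that follows the field scores high, while one that cuts across it scores near zero. In the paper, scores of this kind feed a matching step that assembles joints into separate people. That step is omitted here.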

“Hands & Face Estimation — Coming Soon!”

Hands & Face Estimation Image — by OpenPose

Body Language?

OpenPose does not model the entire spectrum of human body language. Today’s systems still struggle with hard challenges and are limited in scope, yet development is moving very fast. Combined with components such as face, gender and age classification, gaze estimation, person identification, motion prediction and emotion detection, we are gradually arriving at a computational perspective on human body language.