Human Pose Detection
From Gene Kelly’s Step-Dance to Bruce Lee’s Kung-Fu, iconic movement has made history. Communicating through body language is an ancient art form that is currently evolving in fascinating ways: computationally detecting human body language is becoming effective and accessible. This experiment explores the enabling technologies, applications and implications.
For over 20 years, motion capture has enabled us to record the actions of humans and then use that information to animate a digital character or analyse poses. Movie makers and game developers embraced such technologies, but until recently they required expensive equipment that captured only a few aspects of the overall performance.
Today, a new generation of machine-learning-based systems is making it possible to detect human body language directly from images. A growing number of research papers and open-source libraries address key aspects: body, hand, face and gaze tracking; identity, gender, age, emotion and muscle strain detection; action classification and prediction. We now can...
Imagine a world where every camera is a realtime body language detector — and every video can be analysed.
Experiment: Human Pose Detection in Videos
Cinema and online video sites are a vast source of recorded human performances. Any imaginable movement has been discovered and perfected: walks, dances, gestures, drama, love and fight scenes. As the new generation of body tracking tools enables us to “mine” body language data from any video, we can now easily “steal” motion from famous movies and then use that data to drive characters in AR/VR — to name just one example.
The following video was made using the OpenPose library to detect human body poses in movie scenes and video clips.
The video tests OpenPose on diverse sources, including sports games, James Brown’s dance routines and Kung-Fu scenes. The library detected poses robustly across a wide range of footage, failing infrequently and in delightfully comedic ways.
OpenPose
All experiment videos were processed with OpenPose, an open-source library for real-time multi-person keypoint detection authored by Gines Hidalgo, Zhe Cao, Tomas Simon, Shih-En Wei, Hanbyul Joo and Yaser Sheikh. It detects 18 body keypoints per person from images, and its running time is invariant to the number of detected people. Even though the library is in rapid development, it works reliably out of the box and is fun to use.
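To give a feel for what "18 body keypoints" looks like as data, here is a minimal Python sketch that turns one frame of detector output into named joints. It rests on several assumptions not stated in this article: that the per-frame output is JSON with a "people" list, that each person carries a flat list of x, y, confidence triples under a key called "pose_keypoints_2d" (older releases may use "pose_keypoints"), and that the joints follow the 18-point COCO ordering listed below. The sample frame is synthetic, not real detector output.

```python
import json

# Assumed joint order for the 18-point COCO body model.
COCO_18 = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "RHip", "RKnee", "RAnkle",
    "LHip", "LKnee", "LAnkle", "REye", "LEye", "REar", "LEar",
]


def parse_people(frame_json, min_confidence=0.1):
    """Return one {joint_name: (x, y, confidence)} dict per detected person."""
    frame = json.loads(frame_json)
    people = []
    for person in frame.get("people", []):
        # The key name is an assumption; fall back to the older spelling.
        flat = person.get("pose_keypoints_2d") or person.get("pose_keypoints") or []
        pose = {}
        for i, name in enumerate(COCO_18):
            triple = flat[3 * i: 3 * i + 3]
            # Skip joints that are missing or below the confidence threshold.
            if len(triple) == 3 and triple[2] >= min_confidence:
                pose[name] = tuple(triple)
        people.append(pose)
    return people


# Synthetic frame: one person with only nose and neck detected.
sample = json.dumps({
    "people": [
        {"pose_keypoints_2d": [320.0, 100.0, 0.9, 322.0, 150.0, 0.8] + [0.0] * 48}
    ]
})
poses = parse_people(sample)
print(poses)
```

Joints the detector could not find typically come back with zero confidence, which is why the sketch filters on a threshold instead of trusting every coordinate.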
OpenPose uses an interesting pipeline to achieve its robust performance. The paper “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields” gives an overview of the inner workings of the system. Finally, this:
“Hands & Face Estimation — Coming Soon!”
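For readers curious about the "Part Affinity Fields" in the paper title above: the core idea is to predict, for each limb type, a 2D vector field that points along the limb, and then to score a candidate connection between two detected joints by checking how well the field aligns with the straight line between them. The following is a toy Python sketch of that scoring step only. It is not OpenPose's code; the function names, the sampling count and the hand-made field are all hypothetical and chosen for illustration.

```python
import math


def paf_score(field, a, b, samples=10):
    """Average alignment between a vector field and the segment a -> b.

    `field(x, y)` is a hypothetical callable returning the (vx, vy)
    vector at an image position.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(samples):
        # Sample evenly spaced points from a to b, endpoints included.
        t = i / (samples - 1) if samples > 1 else 0.0
        vx, vy = field(a[0] + t * dx, a[1] + t * dy)
        total += vx * ux + vy * uy  # dot product: 1 means perfectly aligned
    return total / samples


def toy_field(x, y):
    # Hand-made field: a horizontal "forearm" occupying the band 40 <= y <= 60.
    return (1.0, 0.0) if 40 <= y <= 60 else (0.0, 0.0)


elbow = (10, 50)
wrist_same_arm = (90, 50)       # lies along the field
wrist_other_person = (10, 130)  # perpendicular to the field

good = paf_score(toy_field, elbow, wrist_same_arm)
bad = paf_score(toy_field, elbow, wrist_other_person)
print(good, bad)
```

In this toy setup the elbow pairs strongly with the wrist that lies along the field and not at all with the other one, which is the intuition behind assembling joints into separate people when several appear in one frame.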
Body Language?
OpenPose does not model the entire spectrum of human body language. Today’s systems still struggle with hard challenges and are limited in scope, yet development is moving very fast. Combined with components such as face, gender and age classification, gaze estimation, person identification, motion prediction and emotion detection, we are gradually arriving at a computational perspective of human body language.
Applications
The list of possible applications is long and growing. Here is a summary of fields where human body language detection might find heavy use:
Pantomime, Butoh and Gnawa performances, as seen by OpenPose:
Final Thoughts
Get in touch here: twitter.com/samim | http://samim.io