Using Computer Vision to Rate Yoga Poses

Henry Hollis
4 min read · Jul 21, 2020


My real time yoga-judger watching me attempt Tree pose

Ever since I started running a couple of years ago, I have come to understand the importance of stretching. The problem is, I’m no good at it! When I was introduced to techniques like pose estimation, and to computer vision in general, I was excited to apply them to the real-world problem of yoga form.

Pose Estimation, figure courtesy of Nanonets: A 2019 guide to Human Pose Estimation with Deep Learning

Pose estimation is a sub-field of computer vision that concerns itself with recognizing the individual parts that make up a body (usually a human body). While there are several techniques to achieve this, the approach I use starts by running incoming images through a CNN classifier trained to look for humans. If and when a human body is detected, the pose estimator network looks for the joints and appendages it was trained on. I can then display the image back to the user with markings showing where the computer judges the body’s parts to be.

From the beginning, I knew I wanted my pose analyzer to judge my poses in real time, through my laptop camera. As it turns out, this seriously limited my options for pose estimation models. First I tried the widely used OpenPose from Carnegie Mellon University. As with all computer vision and object detection, there is a trade-off between accuracy and speed. With its large model size and the puny compute on my personal laptop, this state-of-the-art pose estimator would not serve my purposes. What I finally settled on was a MobileNet for general human detection and a small ResNet model for appendage detection, both of which can be found in Gluon’s model zoo. These smaller networks allow my measly computing resources to judge video frames from my webcam in real time (after some finagling). Even classifying only every other frame of video, the output is not as smooth as I had hoped.
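For anyone curious how the two stages fit together, here is a minimal sketch of the kind of pipeline I’m describing, modeled on GluonCV’s webcam pose-estimation demo. The specific model names (ssd_512_mobilenet1.0_coco and simple_pose_resnet18_v1b) and the every-other-frame trick are representative choices from Gluon’s model zoo, not necessarily the exact settings in my code:

```python
import cv2
import mxnet as mx
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.ssd import transform_test
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord
from gluoncv.utils.viz import cv_plot_keypoints, cv_plot_image

ctx = mx.cpu()

# MobileNet-based SSD to find people, ResNet-18 "simple pose" network for joints.
detector = model_zoo.get_model('ssd_512_mobilenet1.0_coco', pretrained=True, ctx=ctx)
detector.reset_class(classes=['person'], reuse_weights={'person': 'person'})
estimator = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True, ctx=ctx)

cap = cv2.VideoCapture(0)
frame_idx = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_idx += 1
    if frame_idx % 2:  # only run the networks on every other frame to save compute
        continue

    frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
    x, img = transform_test(frame, short=256)
    class_ids, scores, bboxes = detector(x.as_in_context(ctx))

    # Crop each detected person and estimate its joint locations.
    pose_input, upscale_bbox = detector_to_simple_pose(img, class_ids, scores, bboxes, ctx=ctx)
    if pose_input is not None:
        heatmaps = estimator(pose_input)
        coords, confidence = heatmap_to_coord(heatmaps, upscale_bbox)
        img = cv_plot_keypoints(img, coords, confidence, class_ids, bboxes, scores)

    cv_plot_image(img)
    if cv2.waitKey(1) == ord('q'):
        break
cap.release()
```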

At this point my computer could track the appendages of humans in the frame but could not say anything about their yoga form. To accomplish this, I decided to add another network to the image pipeline. Instead of another cumbersome CNN, I thought I could cleverly use a simple ANN trained on the angles between the appendages of the body in question. My hope was that if I could calculate and record the inter-joint angles of bodies in the webcam for various yoga poses, the poses would be linearly separable (albeit in higher dimensions). If this were true, any time the computer saw a body making a particular combination of angles, it would know which yoga pose was being attempted. In fact, the confidence in the prediction of the pose could even be used as an indicator of the quality of form (if the training data came from an experienced yogi).
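To make the idea concrete, the angle at any joint can be computed from three keypoints with a little vector math. The sketch below is illustrative: the keypoint triplets (e.g. shoulder–elbow–wrist) are hypothetical choices of mine, using the standard COCO keypoint ordering that GluonCV’s pose models output:

```python
import numpy as np

def joint_angle(a, b, c):
    """Return the angle (in degrees) at keypoint b, formed by segments b->a and b->c."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# COCO keypoint indices: 5/6 shoulders, 7/8 elbows, 9/10 wrists,
# 11/12 hips, 13/14 knees, 15/16 ankles (left/right).
ANGLE_TRIPLETS = [
    (5, 7, 9),     # left elbow:  shoulder-elbow-wrist
    (6, 8, 10),    # right elbow
    (11, 13, 15),  # left knee:   hip-knee-ankle
    (12, 14, 16),  # right knee
    (7, 5, 11),    # left shoulder:  elbow-shoulder-hip
    (8, 6, 12),    # right shoulder
    (5, 11, 13),   # left hip:    shoulder-hip-knee
    (6, 12, 14),   # right hip
]

def angles_from_keypoints(keypoints):
    """keypoints: array of shape (17, 2) holding (x, y) for one detected body."""
    return np.array([joint_angle(keypoints[a], keypoints[b], keypoints[c])
                     for a, b, c in ANGLE_TRIPLETS])
```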

This is where Yoga with Uliana, a popular YouTube yogi, comes in. From her 30 Most Common Beginning Yoga Poses video, I selected 15 easily distinguishable poses, ran the video clips through my pose estimator, and recorded the angles of each joint for every frame of each clip. That gave me a labeled dataset with several hundred examples to train my ANN on. I used a small Keras model (only two layers) and was able to correctly predict 93% of the poses in my validation set. The best part is, when I added this predictor to my already bloated pipeline, there was no drop in frame rate! Perfect!
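For reference, a two-layer Keras classifier over those angle features might look something like this (the eight input features match the hypothetical triplets above; the layer width, optimizer, and training settings are assumptions, not my exact configuration):

```python
from tensorflow import keras

N_ANGLES, N_POSES = 8, 15   # 8 inter-joint angles per frame, 15 yoga poses

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(N_ANGLES,)),
    keras.layers.Dense(N_POSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# angle_features: (num_frames, N_ANGLES) array built with angles_from_keypoints();
# pose_labels: integer pose IDs (0-14), one per frame.
# model.fit(angle_features, pose_labels, epochs=100, validation_split=0.2)

# At inference time, the softmax probability of the predicted pose doubles as a
# rough form score: the closer your angles are to the training yogi's, the higher it climbs.
```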

Correctly predicting reverse warrior pose from validation data

The complete code is available on my GitHub; all feedback is not only welcome but desired.

Next Steps

With the blazing speed at which computer vision is evolving, new pose estimation techniques and models will soon replace the tried-and-true methods of today. Changing the networks or replacing them with newer predictors is definitely something I will try going forward.

One limitation of using this software to judge how well a pose is performed is that the yoga data was taken from a single source. In the future, to make this project more robust, I will train the ANN on several master yogis doing the same poses (ideally both male and female yogis).

Additionally, my ANN is trained on only 15 of Uliana’s yoga poses, which somewhat limits the utility of this project to yoga newbies. The problem is that, while I can add new poses to the training data and retrain my ANN, having to retrain from scratch makes the software pretty inflexible. I’m currently trying to think of ways around this, so I wouldn’t have to train a whole new network every time I wanted to add a couple of new poses.

Finally, coming full circle back to my love of running, I want to use what I have learned about pose estimation to analyze my running form!

Applications

The Covid-19 quarantine has resulted in a surge in the use of home exercise equipment. One such product is the smart mirror, such as the ones made by MIRROR, which stream exercise classes directly to your mirror! This seems like a perfect application for my project, especially if I were to adapt it to recognize other form-sensitive exercises like push-ups or running.
