Physio Pose: A virtual physiotherapy assistant

A Proof-Of-Concept for a real-time personal virtual trainer for physiotherapy exercises.

Samkit Jain
Mar 4, 2020
Pose estimation on a woman doing jumping jacks

Background
Physiotherapy treatments are usually long-running because restoring a person’s movement takes time. A patient has to perform the same set of exercises every day for months, with the correct posture, to regain mobility. Visiting a physiotherapist for every session can be expensive, and not everyone can afford that. Those who do the exercises in the comfort of their home have to make sure they get the posture and movement right.

This project is an attempt to create a system using computer vision that can guide, provide instant feedback, and act as a personal virtual trainer to help people do exercises. A system that focuses more on form than on reps.

The GitHub repo is at https://github.com/samkit-jain/physio-pose. (Its history starts from when I decided to go ahead with openpifpaf.)

Usage

The main script to run is physio.py. A sample run command could look like:

python physio.py --exercise seated_right_knee_extension --joints --skeleton --save-output

There are more options available and the full list can be seen by running python physio.py -h.

The result looks something like the GIF below, where the top portion shows instructions to the user and the bottom shows the user’s skeleton.

Seated knee flexion and extension

How it works

Let’s start with pose estimation. One resource limitation I faced was that my laptop does not have an Nvidia GPU, which means no CUDA, so I had to use a pose estimation model that could run on a CPU. I experimented with the following projects that work on CPU-only machines:

  1. LightTrack: Implementation of “LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking”. GitHub.
  2. Lightweight OpenPose: Implementation of “Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose”. GitHub.
  3. Openpifpaf: Implementation of “PifPaf: Composite Fields for Human Pose Estimation” in PyTorch. GitHub.
  4. Tf-pose-estimation: TensorFlow implementation of OpenPose. GitHub.

Out of these, I found openpifpaf and tf-pose-estimation to have the highest FPS (frames per second) rates. Of the two, openpifpaf gave more accurate results when the full body wasn’t visible or the person was lying sideways.

The next step is assessing the movement of the joints. We do that by first checking whether all the required joints have been identified and are visible in the frame. Then, the user is instructed to get into the starting position and perform the subsequent steps of the exercise. Feedback is provided based on the angles formed by the tracked keypoints.
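The visibility check can be sketched as below. This is an illustrative snippet, not the exact physio.py code: it assumes the pose estimator returns one (x, y, confidence) triple per joint in the standard COCO order, and the confidence threshold is a made-up value.

```python
# Hypothetical sketch: keypoints arrive as (x, y, confidence) triples
# in COCO order; the 0.3 threshold is illustrative, not from physio.py.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def joints_visible(keypoints, required, conf_threshold=0.3):
    """Return True if every required joint was detected confidently."""
    by_name = dict(zip(COCO_KEYPOINTS, keypoints))
    return all(by_name[name][2] >= conf_threshold for name in required)

# For a right-leg exercise we need the right hip, knee and ankle in frame.
required = ["right_hip", "right_knee", "right_ankle"]
```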

Let’s take the example of seated knee flexion and extension with the right leg. It is best done sitting in a chair. The sub-steps can be written as:

  1. The right leg should be in a seated position, i.e., making an inner angle in the range of 120 to 150 degrees.
  2. Extend the right leg such that the inner angle is 180 degrees.
  3. Bring the right leg back to the starting position.
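The inner angle these steps refer to is the angle at the knee between the thigh and the shank, which can be computed from the hip, knee, and ankle keypoints. A minimal sketch (the coordinates below are made up for illustration):

```python
import math

def inner_angle(hip, knee, ankle):
    """Angle at the knee (in degrees) between the thigh and shank vectors.
    Each argument is an (x, y) keypoint; confidence is ignored here."""
    v1 = (hip[0] - knee[0], hip[1] - knee[1])      # knee -> hip
    v2 = (ankle[0] - knee[0], ankle[1] - knee[1])  # knee -> ankle
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Seated start: thigh roughly horizontal, shank hanging down.
print(inner_angle((0, 0), (1, 0), (1, 1)))  # 90.0 — more bent than the 120–150 range
print(inner_angle((0, 0), (1, 0), (2, 0)))  # 180.0 — leg fully extended
```

Because the angle is computed from vectors, it is unaffected by the image's y-down coordinate convention.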

The program follows the same instructions to guide the user. It first checks whether the keypoints are visible, then waits for the user to move the right leg into the seated position. Once it is, it waits for the user to extend and straighten the leg, and then to bring the leg back to the starting position. During each waiting period it provides feedback; for example, if the leg is not in the starting position, it instructs the user to move it there.
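This guidance loop can be viewed as a small state machine driven by the knee angle of each frame. The state names, thresholds, and messages below are illustrative, not the exact physio.py logic:

```python
# Illustrative thresholds; physio.py's exact values may differ.
SEATED_MIN, SEATED_MAX, EXTENDED_MIN = 120, 150, 175

def next_state(state, knee_angle):
    """Advance through one rep of the exercise based on the knee angle.
    Returns (new_state, feedback message to show the user)."""
    if state == "await_start":
        if SEATED_MIN <= knee_angle <= SEATED_MAX:
            return "await_extend", "Good. Now extend your right leg."
        return state, "Sit with your right leg bent (120-150 degrees)."
    if state == "await_extend":
        if knee_angle >= EXTENDED_MIN:
            return "await_return", "Hold, then bring the leg back down."
        return state, "Keep straightening the leg."
    if state == "await_return":
        if SEATED_MIN <= knee_angle <= SEATED_MAX:
            return "done", "Rep complete!"
        return state, "Lower the leg back to the starting position."
    return state, ""

state = "await_start"
for angle in [95, 130, 150, 178, 160, 135]:  # knee angles from successive frames
    state, feedback = next_state(state, angle)
```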

3D

3D bent elbow shoulder rolls

A 3D data feed gives a more true-to-life impression of the human body and could help provide much more accurate results.

This experiment was short-lived, as reconstructing a 3D pose from a 2D image is still far from perfect, as can be seen in the GIF above. The 2D input video showed the person doing seated bent-elbow shoulder rolls. The motion is clearly lost in the generated 3D.

Pose similarity

https://youtu.be/TrHdgq7rFsc

The current application focuses on providing real-time assessment of the exercise the person is doing. A good additional feature would be a score for how close the motion is to a reference recording of the perfect motion. The solution also needs to handle variance in video duration, since it is not guaranteed that both videos are the same length or perform the same motion at the same point in time at the same speed.

Code for this is at pose_compare.py.

Run

You need two videos. Run pose estimation with physio.py on both videos and save their CSV results. The execution would be similar to:

$ python physio.py --video video1.mp4 --csv-path video1.csv
$ python physio.py --video video2.mp4 --csv-path video2.csv

To calculate how similar the 2 poses are, run:

$ python pose_compare.py video1.csv video2.csv

This outputs a decimal number: the lower it is, the more similar the two motions.

Explanation

The implementation is inspired by the research paper Schneider P., Memmesheimer R., Kramer I., Paulus D. (2019) Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping. Even though the paper focuses on gesture recognition, we can leverage its preprocessing techniques together with dynamic time warping to measure the similarity between sequences that may vary in speed.

It is a 4 step process:

  1. Translation: All the keypoints are translated such that the nose keypoint becomes the origin of the coordinate system.
  2. Scaling: The keypoints are scaled such that the distance between the left shoulder and right shoulder keypoints becomes 1.
  3. Dimension Selection: Joints that do not move significantly in the sequence are removed.
  4. Dynamic Time Warping: An approximate Dynamic Time Warping algorithm that provides optimal or near-optimal alignments with an O(N) time and memory complexity.
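The four steps can be sketched roughly as follows. This assumes a particular array layout (COCO keypoint order with the nose at index 0 and the shoulders at indices 5 and 6), and the exact O(N·M) DTW shown here stands in for the paper's linear-time approximation, purely for clarity:

```python
import numpy as np

def normalize(frames):
    """frames: (T, 17, 2) array of keypoints per frame, COCO order.
    Step 1: translate so the nose (index 0) is the origin.
    Step 2: scale so the shoulder distance (indices 5, 6) is 1."""
    frames = frames - frames[:, 0:1, :]                     # translation
    shoulder = np.linalg.norm(frames[:, 5] - frames[:, 6], axis=1)
    return frames / shoulder[:, None, None]                 # scaling

def select_dims(frames, min_std=0.05):
    """Step 3: keep only joints whose position actually varies.
    In practice the same joint mask must be applied to both sequences."""
    std = frames.std(axis=0).max(axis=1)                    # per-joint spread
    return frames[:, std > min_std, :]

def dtw(a, b):
    """Step 4: classic DTW distance between two sequences of flattened
    poses. The paper relies on an approximate O(N) algorithm; this exact
    O(N*M) dynamic program is shown instead for readability."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Identical sequences score 0, and the score grows as the motions diverge, which matches the "lower is better" output of pose_compare.py.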

Improvements

There’s a lot of scope for improvement in this implementation and pull requests are welcome.

  1. Stabilisation: As is evident from the demo GIF shared above, the key point identification can be stabilised to give a smooth transition effect.
  2. Multiple reps: Support for doing multiple repetitions of an exercise.
  3. Lightweight model: A model that is light enough to run predictions at 30 fps even on a CPU machine and yet accurate enough to be usable.
  4. Mobile application: A mobile representation of the application.

I would like to extend a special thanks to Mr. Amit Prakash Gupta who gave me the opportunity to work on this project.
