New SOTA in Real-Time 3D Human Pose Recognition
A new state-of-the-art neural network recognizes 3D human poses in real time.
Estimating a person’s pose and recognizing their action are related tasks, because both depend on representing and analyzing the person’s body. However, most existing models solve these problems separately. The researchers propose a multi-task framework that jointly estimates 2D and 3D poses from images and classifies actions from video.
A single architecture handles both tasks at the state-of-the-art level, while the inference model processes more than 100 frames per second. The proposed network uses separate sets of parameters for pose estimation and for action classification.
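The idea of one architecture serving two tasks can be illustrated with a toy sketch: a shared backbone produces features that feed two task-specific heads, one regressing 3D joint coordinates and one classifying actions. All names, dimensions, and the single-linear-layer "backbone" below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration): a 32x32 single-channel frame,
# 16 body joints in 3D, 10 action classes.
FRAME = 32 * 32
FEAT, JOINTS, ACTIONS = 64, 16, 10

W_shared = rng.standard_normal((FRAME, FEAT)) * 0.01       # shared backbone
W_pose = rng.standard_normal((FEAT, JOINTS * 3)) * 0.01    # pose head
W_act = rng.standard_normal((FEAT, ACTIONS)) * 0.01        # action head

def forward(frame):
    """One shared feature vector feeds two task-specific heads."""
    h = np.maximum(frame.ravel() @ W_shared, 0.0)          # backbone + ReLU
    pose = (h @ W_pose).reshape(JOINTS, 3)                 # per-joint 3D coords
    logits = h @ W_act
    action = np.exp(logits - logits.max())
    action /= action.sum()                                 # softmax over actions
    return pose, action

pose, action = forward(rng.standard_normal((32, 32)))
```

In a real multi-task setup the two heads would be trained with separate losses (e.g. a regression loss on joints and a cross-entropy loss on actions) against the same shared features.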
Approach architecture
The workflow of the model consists of the following steps:
- Feature maps are extracted from the input images;
- Feature maps are input to a sequence of convolutional networks that consist of prediction blocks (PB), upscaling and downscaling modules (UU and DU), and skip connections;
- Each PB block outputs predictions for both pose and action, and these predictions are refined by the subsequent blocks.
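The steps above can be sketched as a chain of prediction blocks, each re-estimating the pose and refining the previous block's output (every intermediate output can also be supervised during training). The shapes, the averaging refinement rule, and the block count below are assumptions for demonstration only, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)

FEAT, JOINTS = 64, 16  # assumed feature and joint counts

def prediction_block(features, prev_pose, W):
    """One block: produce a fresh 2D-pose estimate and blend it with the
    previous block's estimate (a crude stand-in for refinement)."""
    new_pose = (features @ W).reshape(JOINTS, 2)
    if prev_pose is None:
        return new_pose
    return 0.5 * (prev_pose + new_pose)

# Extracted feature map, flattened, and one weight matrix per block.
features = rng.standard_normal(FEAT)
weights = [rng.standard_normal((FEAT, JOINTS * 2)) * 0.01 for _ in range(4)]

pose = None
intermediate = []  # each block's output could receive its own loss
for W in weights:
    pose = prediction_block(features, pose, W)
    intermediate.append(pose)
```

This mirrors the intermediate-supervision pattern common in stacked pose-estimation networks: every block emits a usable prediction, and later blocks only need to correct the residual error.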
The model was trained entirely on labeled data.
Model performance evaluation
The researchers tested the model on four datasets: MPII, Human3.6M, Penn Action, and NTU RGB+D. The results below show that, on the Human3.6M dataset, the network surpasses previous approaches in the accuracy of classifying actions from video.