MoveNet : Pose Estimation for Video with Intense Motion

David Cochard
Published in axinc-ai
3 min read · Sep 2, 2021

This is an introduction to MoveNet, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications with ailia SDK, as well as many other ready-to-use ailia MODELS.

Overview

MoveNet is a pose estimation model released by Google on May 17, 2021. Compared to conventional pose estimation models, it improves the detection accuracy in videos with intense motion. It is ideal for live fitness and sports applications.

Source: https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html

Architecture

MoveNet detects 17 two-dimensional keypoints with high speed and high accuracy. Two variants are available, Lightning and Thunder: the former is intended for applications that require speed, the latter for applications that require accuracy. Both Lightning and Thunder can run at 30 FPS or higher on desktop PCs, laptops, and smartphones.
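For reference, both variants predict the same 17 COCO-format keypoints and differ mainly in input resolution. A minimal sketch of these constants (the keypoint order and input sizes follow the TensorFlow release of MoveNet):

```python
# The 17 COCO-format keypoints predicted by MoveNet, in output order.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Expected square input resolutions for the two variants:
# Lightning trades some accuracy for speed, Thunder the reverse.
INPUT_SIZE = {
    "lightning": 192,  # 192x192 RGB input
    "thunder": 256,    # 256x256 RGB input
}
```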

The architecture is similar to CenterNet. The feature extractor is MobileNetV2 with a Feature Pyramid Network (FPN) attached, which, with an output stride of 4, produces a high-resolution feature map.

Source: https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html

The output of the AI model is a person center heatmap, a keypoint regression field, a person keypoint heatmap, and a 2D per-keypoint offset field.
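A rough sketch of how these four outputs could combine into a single pose, following the decoding steps described in the TensorFlow blog post. The head shapes, the inverse-distance weighting, and the function itself are illustrative assumptions, not the ailia SDK API:

```python
import numpy as np

def decode_single_pose(center_heatmap, regression, kpt_heatmaps, offsets):
    """Simplified single-pose decoding from MoveNet's four output heads.

    center_heatmap: (H, W)         person-center likelihood
    regression:     (H, W, 17, 2)  per-pixel keypoint displacement (dy, dx)
    kpt_heatmaps:   (H, W, 17)     per-keypoint likelihood
    offsets:        (H, W, 17, 2)  sub-pixel (y, x) refinement
    Returns a list of 17 (y, x) keypoint coordinates on the feature map.
    """
    H, W = center_heatmap.shape
    # 1. The person center is the argmax of the center heatmap.
    cy, cx = np.unravel_index(np.argmax(center_heatmap), (H, W))

    ys, xs = np.mgrid[0:H, 0:W]
    keypoints = []
    for k in range(17):
        # 2. Initial estimate: read the regression field at the center.
        ry = cy + regression[cy, cx, k, 0]
        rx = cx + regression[cy, cx, k, 1]
        # 3. Re-weight the keypoint heatmap by inverse distance to the
        #    regressed position, so the peak nearest this person wins.
        dist = np.sqrt((ys - ry) ** 2 + (xs - rx) ** 2) + 1.0
        ky, kx = np.unravel_index(np.argmax(kpt_heatmaps[..., k] / dist),
                                  (H, W))
        # 4. Sub-pixel refinement from the offset field.
        keypoints.append((ky + offsets[ky, kx, k, 0],
                          kx + offsets[ky, kx, k, 1]))
    return keypoints
```

The inverse-distance weighting is what lets the model pick out the keypoints of one specific person even when several people appear in the frame.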

Source: https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html

The model was trained on the COCO dataset and an internal Google dataset called Active. One limitation of the COCO dataset is that it does not include data from harsh conditions where poses change drastically or motion blur is present, making it unsuitable for fitness and dance apps. The Active dataset, by contrast, consists of annotated yoga, fitness, and dance videos from YouTube. Only three frames are taken from each video to ensure diversity in the dataset.

Source: https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html

Usage

You can use MoveNet with ailia SDK on a webcam video stream with the following command.

$ python3 movenet.py -v 0
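In the TensorFlow release of MoveNet, the raw model output is a (1, 1, 17, 3) tensor of normalized (y, x, score) triplets. A small helper like the following maps it to pixel coordinates; the function name and the 0.3 threshold are illustrative choices, not part of the ailia SDK API:

```python
import numpy as np

def keypoints_to_pixels(raw, frame_w, frame_h, threshold=0.3):
    """Convert MoveNet's raw output into pixel coordinates.

    raw: array of shape (1, 1, 17, 3) holding normalized (y, x, score)
    triplets. Returns one entry per keypoint: an (x_px, y_px) tuple,
    or None where the score falls below the threshold.
    """
    points = []
    for y, x, score in raw[0, 0]:
        if score < threshold:
            points.append(None)  # keypoint not confidently detected
        else:
            points.append((x * frame_w, y * frame_h))
    return points
```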

And here is the result you can expect.

Related topics

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
