Real-time pose detection in React Native using MLKit

Lukasz Kurant · Published in dogtronic · Aug 17, 2022
Human pose detection can be used to check the correctness of exercises.

Human pose detection from videos or images plays a key role in many modern applications. Checking the correctness of physical exercise, applying augmented reality filters (as in many social media apps), sign language recognition, medical applications: in all these situations there is a need for an efficient human pose recognition model.

In 2020, Google researchers Valentin Bazarevsky and Ivan Grishchenko presented BlazePose, a tool that has since become a permanent part of Google's MLKit. It detects a person's pose from a single video frame and supports real-time, on-device processing.

Example of MediaPipe Pose real-world 3D coordinates (source: https://google.github.io/mediapipe/solutions/pose)

Topology

Unlike COCO, the current standard topology for human body pose estimation, which consists of 17 landmarks, BlazePose can place as many as 33 points, covering a person's limbs (drawing on a hand model) as well as the face. We can see the detailed set of points below:

Topology of observation points (source: https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)

Method of operation

Pose detection works in two stages: first, a detector locates the so-called region of interest (ROI), in this case the person in the image; then the landmarks within that region are predicted. To speed up computation, the detector runs only on the first frame; for subsequent frames the ROI is derived from the landmarks of the previous one.

Pose detector (source: https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)

Example of use

In this article, I would like to present an example of using MLKit for real-time pose detection in a React Native app, using the Vision Camera library together with a native frame processor for iOS.

Project configuration

The first step will be to create a new React Native application project. The version I am using is React Native 0.68.2. To create a new project we run the command:

npx react-native init posedetection

We also need to install the necessary libraries for the camera and animation:

yarn add react-native-vision-camera react-native-reanimated react-native-svg
npx pod-install

A necessary step for iOS is to add an entry in the Info.plist file:

<key>NSCameraUsageDescription</key>
<string>$(PRODUCT_NAME) needs access to your Camera.</string>

To install the library that enables pose detection via CocoaPods, we add the following entry to the Podfile:

pod 'GoogleMLKit/PoseDetection', '3.1.0'

And then we execute the command:

npx pod-install

Creating a frame processor

In order to use MLKit in real time with the Vision Camera library, it is necessary to create a native frame processor. To do this, let's create a new PoseDetection.h file in the main project directory in Xcode, containing the header of the class that will return the recognized pose.
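A minimal sketch of such a header is shown below; the class name PoseDetection and the findPose: method name are assumptions based on the description in this article (the exact code is in the repository linked at the end).

#import <Foundation/Foundation.h>
#import <VisionCamera/Frame.h>

// Exposes a single class method that takes a Vision Camera frame
// and returns the detected pose as a dictionary of landmark coordinates.
@interface PoseDetection : NSObject

+ (NSDictionary *)findPose:(Frame *)frame;

@end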

Next, we need to create a PoseDetection.m file, where our findPose function will be located.

Let’s also create a helper function that returns the coordinates of the selected point (one of the previously described 33 landmarks):

Next, in the findPose function, let's prepare a frame image and calculate the position of our object.

If the detection function returns an error, or if no pose is detected, let's return an empty object of type NSDictionary. If a pose is detected, we return the selected coordinates:
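A sketch of the findPose: method under these assumptions: the detector runs in stream mode and is created on every call for brevity (in practice it is worth creating it once and reusing it), and only a handful of example landmarks are returned; extend the dictionary with the points you need.

// PoseDetection.m (continued): placed inside the same @implementation, before @end.
+ (NSDictionary *)findPose:(Frame *)frame {
  // Wrap the camera frame in an MLKit vision image.
  MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:frame.buffer];
  image.orientation = frame.orientation;

  // Configure a pose detector for a video stream.
  MLKPoseDetectorOptions *options = [[MLKPoseDetectorOptions alloc] init];
  options.detectorMode = MLKPoseDetectorModeStream;
  MLKPoseDetector *detector = [MLKPoseDetector poseDetectorWithOptions:options];

  // Run detection synchronously on the current frame.
  NSError *error;
  NSArray<MLKPose *> *poses = [detector resultsInImage:image error:&error];

  // On error, or when no pose was found, return an empty dictionary.
  if (error != nil || poses.count == 0) {
    return @{};
  }

  // Return the coordinates of the landmarks we are interested in.
  MLKPose *pose = poses.firstObject;
  return @{
    @"leftShoulder": [self getLandmarkPosition:[pose landmarkOfType:MLKPoseLandmarkTypeLeftShoulder]],
    @"rightShoulder": [self getLandmarkPosition:[pose landmarkOfType:MLKPoseLandmarkTypeRightShoulder]],
    @"leftElbow": [self getLandmarkPosition:[pose landmarkOfType:MLKPoseLandmarkTypeLeftElbow]],
    @"rightElbow": [self getLandmarkPosition:[pose landmarkOfType:MLKPoseLandmarkTypeRightElbow]],
    @"leftWrist": [self getLandmarkPosition:[pose landmarkOfType:MLKPoseLandmarkTypeLeftWrist]],
    @"rightWrist": [self getLandmarkPosition:[pose landmarkOfType:MLKPoseLandmarkTypeRightWrist]],
  };
}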

The next step will be to create the PoseDetectionFrameProcessor.m file, which will be directly used by the Vision Camera library:
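A sketch based on the Vision Camera v2 plugin API, where the VISION_EXPORT_FRAME_PROCESSOR macro registers the function under the given name:

#import <VisionCamera/FrameProcessorPlugin.h>
#import <VisionCamera/Frame.h>
#import "PoseDetection.h"

@interface PoseDetectionFrameProcessorPlugin : NSObject
@end

@implementation PoseDetectionFrameProcessorPlugin

// Called by Vision Camera for every frame handed to the frame processor.
static inline id poseDetection(Frame *frame, NSArray *args) {
  return [PoseDetection findPose:frame];
}

// Exposes the plugin to JavaScript worklets as __poseDetection.
VISION_EXPORT_FRAME_PROCESSOR(poseDetection)

@end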

Our frame processor will be named poseDetection and will return an object of type NSDictionary (which will be converted to a regular object on the JavaScript side).

JavaScript-side support

To enable the use of the frame processor, we need to add the following element in the babel.config.js file:
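A sketch of babel.config.js with the Reanimated plugin configured to expose the frame processor as a global:

module.exports = {
  presets: ['module:metro-react-native-babel-preset'],
  plugins: [
    [
      'react-native-reanimated/plugin',
      {
        // makes the native frame processor callable from worklets
        globals: ['__poseDetection'],
      },
    ],
  ],
};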

Where __poseDetection is the name of the frame processor, preceded by two “_” characters.

Then, in the App.js file, let’s add a function to enable its use:
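A sketch of such a wrapper; it has to be marked as a worklet because it is called from the frame processor thread:

function poseDetection(frame) {
  'worklet';
  // Calls the native plugin registered as `poseDetection`.
  return __poseDetection(frame);
}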

To hold the calculated landmark positions, let’s use the useSharedValue hook from the react-native-reanimated library:
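A sketch, assuming a defaultPose object (a name introduced here) that lists the same landmarks the native side returns, each starting at 0,0 until the first detection arrives:

import {useSharedValue} from 'react-native-reanimated';

const defaultPose = {
  leftShoulder: {x: 0, y: 0},
  rightShoulder: {x: 0, y: 0},
  leftElbow: {x: 0, y: 0},
  rightElbow: {x: 0, y: 0},
  leftWrist: {x: 0, y: 0},
  rightWrist: {x: 0, y: 0},
};

// Inside the App component:
const pose = useSharedValue(defaultPose);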

Next, we need to calculate the coordinates of the lines between the landmarks:
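For each pair of connected landmarks we derive the end points of a line, for example (a sketch; the usePosition helper is defined right below):

// Animated line coordinates for the connections we want to draw.
const leftShoulderToElbow = usePosition(pose, 'leftShoulder', 'leftElbow');
const leftElbowToWrist = usePosition(pose, 'leftElbow', 'leftWrist');
const rightShoulderToElbow = usePosition(pose, 'rightShoulder', 'rightElbow');
const rightElbowToWrist = usePosition(pose, 'rightElbow', 'rightWrist');
const shoulderToShoulder = usePosition(pose, 'leftShoulder', 'rightShoulder');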

usePosition is a hook that allows you to create a style used by the reanimated library:
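A sketch of the hook; here it uses useAnimatedProps, Reanimated's mechanism for driving SVG props such as x1/y1/x2/y2 (the original implementation may differ in detail):

import {useAnimatedProps} from 'react-native-reanimated';

// Maps two landmarks from the shared pose value to the end points of an SVG line.
function usePosition(pose, valueName1, valueName2) {
  return useAnimatedProps(
    () => ({
      x1: pose.value[valueName1].x,
      y1: pose.value[valueName1].y,
      x2: pose.value[valueName2].x,
      y2: pose.value[valueName2].y,
    }),
    [pose],
  );
}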

This way, we can later use them to display the lines on the screen. But first, let's move on to calculating the landmarks themselves. The code below uses the native frame processor to obtain the landmark positions and scales them by the ratio between the camera frame and the user's screen (the so-called xFactor and yFactor), so that each landmark can be placed correctly on the screen:
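A sketch of the frame processor, reusing the poseDetection wrapper and defaultPose object from the earlier snippets; depending on the sensor orientation you may need to swap frame.width and frame.height when computing the factors:

import {Dimensions} from 'react-native';
import {useFrameProcessor} from 'react-native-vision-camera';

const dimensions = Dimensions.get('window');

// Inside the App component:
const frameProcessor = useFrameProcessor(
  frame => {
    'worklet';
    const poseObject = poseDetection(frame);

    // Scale factors between the camera frame and the user's screen.
    const xFactor = dimensions.width / frame.width;
    const yFactor = dimensions.height / frame.height;

    // Rescale every returned landmark to screen coordinates.
    const poseCopy = {...defaultPose};
    Object.keys(poseObject).forEach(key => {
      poseCopy[key] = {
        x: poseObject[key].x * xFactor,
        y: poseObject[key].y * yFactor,
      };
    });

    pose.value = poseCopy;
  },
  [pose],
);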

In the return statement of our App component, we render a <Camera /> component that uses our frameProcessor:
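A sketch of the render, assuming the camera device comes from the useCameraDevices hook and camera permission has already been granted (View and StyleSheet come from react-native, Camera from react-native-vision-camera):

return (
  <View style={StyleSheet.absoluteFill}>
    <Camera
      style={StyleSheet.absoluteFill}
      device={device}
      isActive={true}
      frameProcessor={frameProcessor}
      frameProcessorFps={25}
    />
    {/* the SVG overlay with the pose lines goes here, see below */}
  </View>
);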

To draw animated lines using the react-native-reanimated library, let’s use components from react-native-svg:
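A sketch of the overlay: each Line from react-native-svg is wrapped with createAnimatedComponent and fed the animated props computed by usePosition.

import Animated from 'react-native-reanimated';
import Svg, {Line} from 'react-native-svg';

// An SVG line whose props can be driven by Reanimated worklets.
const AnimatedLine = Animated.createAnimatedComponent(Line);

// Rendered on top of the <Camera /> component:
<Svg
  height={dimensions.height}
  width={dimensions.width}
  style={StyleSheet.absoluteFill}>
  <AnimatedLine animatedProps={leftShoulderToElbow} stroke="red" strokeWidth="2" />
  <AnimatedLine animatedProps={leftElbowToWrist} stroke="red" strokeWidth="2" />
  <AnimatedLine animatedProps={rightShoulderToElbow} stroke="red" strokeWidth="2" />
  <AnimatedLine animatedProps={rightElbowToWrist} stroke="red" strokeWidth="2" />
  <AnimatedLine animatedProps={shoulderToShoulder} stroke="red" strokeWidth="2" />
</Svg>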

Results

After all the steps, let’s check how our application works:

The result of the application

Summary

Human pose detection opens up remarkable possibilities in cross-platform mobile development, and the ability to use off-the-shelf tools like MLKit is a significant advantage.

The new Fabric architecture and libraries like Vision Camera and Reanimated enable fast communication between native code and JavaScript, which opens the door to many interesting applications and to significant optimization of application performance.

You can find the full code on my repository: https://github.com/dogtronic/blog-pose-detection

You can find the Polish version of the article here: https://dogtronic.io/detekcja-pozy-w-czasie-rzeczywistym/

Find us on our Dogtronic website.
