High-performance hand landmark detection in React Native using Vision Camera and Skia frame processor

Lukasz Kurant
7 min read · Aug 1, 2024


Developing a real-time application using the MediaPipe model.

Photo inside render added by Nur Andi Ravsanjani Gusma

The New Architecture brings a whole new set of capabilities to the development of React Native applications. These advantages are leveraged by the React Native Vision Camera library, which lets us write custom frame processors that process frames in real time. Additionally, one of the latest versions introduces the ability to draw on a frame using React Native Skia, which completely changes the way we interact with the camera in React Native applications.

Among the vision tasks of the MediaPipe library, we can find TensorFlow models that detect selected features in images. One of these tasks is Hand Landmarks detection, i.e. finding key points on human hands. We can use these points for gesture recognition or to render visual effects.

Hand landmarks list. Source

For this task, the MediaPipe library enables detection on single images but also on image streams, for example the frames of a video, which makes it possible to perform the required calculations very efficiently. The average detection time on a Pixel 6 is 17 ms on the CPU and only 12 ms on the GPU.

In this article I would like to show you how to create a custom frame processor for Vision Camera in React Native applications, using a native processor written in Swift that relies on the MediaPipe library to run TensorFlow models. You can find the repository with the code here: https://github.com/lukaszkurantdev/blog-hand-landmarks

Configuration

1. Let’s start by creating a project:

npx react-native init handlandmarks

2. Let’s also install the Vision Camera library:

npm i react-native-vision-camera
cd ios && pod install

In the ios/{{project}}/Info.plist file we add an entry:

<key>NSCameraUsageDescription</key>
<string>$(PRODUCT_NAME) needs access to your Camera.</string>

In the App.ts file, we add support for displaying the camera.


import React, {useEffect} from 'react';
import {StyleSheet, Text} from 'react-native';
import {
  Camera,
  useCameraDevice,
  useCameraPermission,
} from 'react-native-vision-camera';

function App(): React.JSX.Element {
  const device = useCameraDevice('front');
  const {hasPermission, requestPermission} = useCameraPermission();

  useEffect(() => {
    requestPermission();
  }, [requestPermission]);

  if (!hasPermission) {
    return <Text>No permission</Text>;
  }
  if (device == null) {
    return <Text>No device</Text>;
  }
  return (
    <Camera style={StyleSheet.absoluteFill} device={device} isActive={true} />
  );
}

export default App;

Once the camera is up and running and the permission has been granted, we can see this view:

Screenshots of the application after build.

Configuration of the frame processor

The first step is to install the worklets library that frame processors require:

npm install react-native-worklets-core
cd ios && pod install

In the babel.config.js file, we add:

module.exports = {
  plugins: [
    ["react-native-worklets-core/plugin"],
    // ...
  ],
  // ...
};

We should also remember to clear the metro bundler cache:

npm run start -- --reset-cache

Integration with MediaPipe

In the ios/Podfile we add the library:

pod 'MediaPipeTasksVision', '0.10.14'

After reinstalling the Pods, this gives us access to the MediaPipe models.

Adding a model

To use MediaPipe’s capabilities, it is necessary to add a task file in Xcode. We can download it from here: https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task

We add it to the root directory of the project:

Xcode main project menu.

Creation of the Frame Processor

For this, we will use a ready-made tool that will generate the processor boilerplate for us:

npx vision-camera-plugin-builder@latest ios

During creation, I chose the Swift language option.

In our TS code we can also add:

import {Frame, VisionCameraProxy} from 'react-native-vision-camera';

const plugin = VisionCameraProxy.initFrameProcessorPlugin('handLandmarks', {});

export function handLandmarks(frame: Frame) {
  'worklet';
  if (plugin == null) {
    throw new Error('Failed to load Frame Processor Plugin!');
  }
  return plugin.call(frame);
}

In Xcode, we can check that the necessary Swift file has been generated.

Generated frame processor code.
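The generated file contains an empty plugin class. It looks roughly like the sketch below (this is only an illustration; the exact class name and method signatures depend on the versions of vision-camera-plugin-builder and Vision Camera you use), and the builder also takes care of registering the plugin under the handLandmarks name.

import VisionCamera

@objc(HandLandmarksFrameProcessorPlugin)
public class HandLandmarksFrameProcessorPlugin: FrameProcessorPlugin {
  public override func callback(_ frame: Frame,
                                withArguments arguments: [AnyHashable: Any]?) -> Any? {
    // The detection logic from the next section goes here. The return value
    // must be something JSI can represent: numbers, strings, arrays,
    // dictionaries, or nil.
    return nil
  }
}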

Implementation

The next step will be to implement the logic in Swift. We will start with the configuration: the model will search for two hands in video mode. The confidence values are the thresholds the model works with; if you need higher accuracy, you can increase them.

let options = HandLandmarkerOptions()
options.baseOptions.modelAssetPath = "hand_landmarker.task"
options.runningMode = .video
options.minHandDetectionConfidence = 0.5
options.minHandPresenceConfidence = 0.5
options.minTrackingConfidence = 0.5
options.numHands = 2

The next step is to perform the detection and convert the results into values that JSI can handle and that are readable on the JS side.

do {
  let handLandmarker = try HandLandmarker(options: options)
  let image = try MPImage(sampleBuffer: frame.buffer)
  let result = try handLandmarker.detect(videoFrame: image,
                                         timestampInMilliseconds: Int(frame.timestamp))

  // Convert the result into plain arrays and dictionaries readable from JS.
  var landmarks: [Any] = []

  for hand in result.landmarks {
    var marks: [Any] = []

    for handmark in hand {
      marks.append([
        "x": handmark.x,
        "y": handmark.y
      ])
    }

    landmarks.append(marks)
  }

  return landmarks
} catch {
  return nil
}
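Note that the snippet above creates a new HandLandmarker on every call. As a possible optimization (a sketch of my own, not necessarily how the repository linked above does it), the landmarker can be created once as a lazy property of the generated plugin class and reused for every frame:

import MediaPipeTasksVision
import VisionCamera

@objc(HandLandmarksFrameProcessorPlugin)
public class HandLandmarksFrameProcessorPlugin: FrameProcessorPlugin {
  // Built once on first use and reused for every subsequent frame.
  private lazy var handLandmarker: HandLandmarker? = {
    let options = HandLandmarkerOptions()
    options.baseOptions.modelAssetPath = "hand_landmarker.task"
    options.runningMode = .video
    options.minHandDetectionConfidence = 0.5
    options.minHandPresenceConfidence = 0.5
    options.minTrackingConfidence = 0.5
    options.numHands = 2
    return try? HandLandmarker(options: options)
  }()

  public override func callback(_ frame: Frame,
                                withArguments arguments: [AnyHashable: Any]?) -> Any? {
    guard let landmarker = handLandmarker else {
      return nil
    }

    do {
      let image = try MPImage(sampleBuffer: frame.buffer)
      let result = try landmarker.detect(videoFrame: image,
                                         timestampInMilliseconds: Int(frame.timestamp))

      // Map the result to plain arrays and dictionaries so it can cross the JSI bridge.
      return result.landmarks.map { hand in
        hand.map { ["x": $0.x, "y": $0.y] }
      }
    } catch {
      return nil
    }
  }
}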

We can now check that it works on the JavaScript side. Let’s modify the App.ts file by adding a new frame processor.

const frameProcessor = useFrameProcessor(frame => {
  'worklet';
  const data = handLandmarks(frame);
  console.log(data);
}, []);

//...

return (
  <Camera
    style={StyleSheet.absoluteFill}
    device={device}
    isActive={true}
    frameProcessor={frameProcessor}
    fps={30}
    pixelFormat="rgb"
  />
);

By default, Vision Camera returns frames in YUV format, but the TensorFlow model that MediaPipe uses requires RGB. It is therefore necessary to add the pixelFormat="rgb" property to the Camera component.

As you can see, the values can be read.

Data from the MediaPipe model.

Integration with Skia

The next step will be to draw the necessary lines and points. To do this, we will use React Native Skia. To install it together with its Reanimated dependency, run the following command:

npm install @shopify/react-native-skia react-native-reanimated

Additionally, let’s add a plugin to the babel.config.js file:

module.exports = {
  //...
  plugins: [
    //...
    'react-native-reanimated/plugin',
  ],
};

And let’s clear the metro bundler cache:

npm run start -- --reset-cache

Drawing lines and points

To understand what the lines are supposed to look like, let’s take another look at the photo:

The indexes correspond to the points in our resulting array. With this knowledge, we can easily prepare an array of lines.

const lines = [
[0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8], [5, 9],
[9, 10], [10, 11], [11, 12], [9, 13], [13, 14], [14, 15], [15, 16],
[13, 17], [17, 18], [18, 19], [19, 20], [0, 17],
];

To use a frame processor with Skia’s drawing capability, let’s change the hook that creates our processor to useSkiaFrameProcessor. The points come out of the model in normalized form, in the interval [0, 1]. When a point’s coordinate equals, for example, 0.5, the point lies in the middle of the frame. Therefore, to display the values correctly, we need to multiply them by the frame size.

const paint = Skia.Paint();
paint.setStyle(PaintStyle.Fill);
paint.setStrokeWidth(2);
paint.setColor(Skia.Color('red'));

const linePaint = Skia.Paint();
linePaint.setStyle(PaintStyle.Fill);
linePaint.setStrokeWidth(4);
linePaint.setColor(Skia.Color('lime'));

const frameProcessor = useSkiaFrameProcessor(frame => {
  'worklet';
  const data = handLandmarks(frame);

  frame.render();

  const frameWidth = frame.width;
  const frameHeight = frame.height;

  for (const hand of data || []) {
    // Draw lines
    for (const [from, to] of lines) {
      frame.drawLine(
        hand[from].x * Number(frameWidth),
        hand[from].y * Number(frameHeight),
        hand[to].x * Number(frameWidth),
        hand[to].y * Number(frameHeight),
        linePaint,
      );
    }

    // Draw circles
    for (const mark of hand) {
      frame.drawCircle(
        mark.x * Number(frameWidth),
        mark.y * Number(frameHeight),
        6,
        paint,
      );
    }
  }
}, []);

Final result

Real-time detection.

Summary

We have created a method for detecting hand landmarks and drawing them in real time in our mobile app. Skia’s integration with frame processors in Vision Camera is a real game changer: it completely changes the way we can use the camera from JavaScript code in React Native.

You can find the code from the article here: https://github.com/lukaszkurantdev/blog-hand-landmarks

Sources

[1] https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker
[2] https://react-native-vision-camera.com/docs/guides
[3] https://mrousavy.com/blog/VisionCamera-Pose-Detection-TFLite

