High-performance hand landmark detection in React Native using Vision Camera and Skia frame processor
Developing a real-time application using the MediaPipe model.
The New Architecture brings a whole new set of capabilities to the development of React Native applications. The React Native Vision Camera library takes advantage of them, letting us write custom frame processors that process camera frames in real time. Additionally, one of its latest versions introduces the ability to draw directly on a frame using React Native Skia, which completely changes the way we can interact with the camera in React Native applications.
Among the vision tasks of the MediaPipe library we can find TensorFlow models that detect selected features in images. One of these tasks is Hand Landmark detection, i.e. locating key points on human hands. We can use these points for gesture recognition or to render visual effects.
For this task, MediaPipe supports detection on single images as well as on image streams, for example the frames of a video, which allows the required calculations to be performed very efficiently. On a Pixel 6 the average landmark detection time is 17 ms on the CPU and only 12 ms on the GPU.
In this article I would like to show you how to create a custom frame processor for Vision Camera in React Native applications, using a native processor written in Swift that uses the MediaPipe library to run TensorFlow models. You can find the repository with the code here: https://github.com/lukaszkurantdev/blog-hand-landmarks
Configuration
1. Let’s start by creating a project:
npx react-native init handlandmarks
2. Let’s also install the Vision Camera library:
npm i react-native-vision-camera
cd ios && pod install
In the ios/{{project}}/Info.plist file we add an entry:
<key>NSCameraUsageDescription</key>
<string>$(PRODUCT_NAME) needs access to your Camera.</string>
In the App.ts file, we add support for displaying the camera.
import React, {useEffect} from 'react';
import {StyleSheet, Text} from 'react-native';
import {
  Camera,
  useCameraDevice,
  useCameraPermission,
} from 'react-native-vision-camera';

function App(): React.JSX.Element {
  const device = useCameraDevice('front');
  const {hasPermission, requestPermission} = useCameraPermission();

  useEffect(() => {
    requestPermission();
  }, [requestPermission]);

  if (!hasPermission) {
    return <Text>No permission</Text>;
  }

  if (device == null) {
    return <Text>No device</Text>;
  }

  return (
    <Camera style={StyleSheet.absoluteFill} device={device} isActive={true} />
  );
}

export default App;
Once the camera is up and running and the permission has been granted, we can see this view:
Configuration of frame processor
The first step is to install the worklets library that frame processors require:
npm install react-native-worklets-core
cd ios && pod install
In the babel.config.js file, we add:
module.exports = {
  plugins: [
    ["react-native-worklets-core/plugin"],
    // ...
  ],
  // ...
};
We should also remember to clear the metro bundler cache:
npm start -- --reset-cache
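Before moving on, we can sanity-check that the worklets setup works. Below is a minimal sketch that only logs the frame size using Vision Camera’s useFrameProcessor hook; our custom plugin does not exist yet, so this is just a quick test inside the App component:

import {useFrameProcessor} from 'react-native-vision-camera';

// Runs on the frame processor thread for every camera frame.
const frameProcessor = useFrameProcessor(frame => {
  'worklet';
  console.log(`Frame: ${frame.width} x ${frame.height}`);
}, []);

We then pass it to the camera as frameProcessor={frameProcessor} to see the logs.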
Integration with MediaPipe
In the ios/Podfile we add the library:
pod 'MediaPipeTasksVision', '0.10.14'
This gives us access to the MediaPipe models once the Pods are reinstalled.
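The Pods are reinstalled with the same command as before:

cd ios && pod install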
Adding a model
To use MediaPipe’s capabilities, it is necessary to add a task file in Xcode. We can download it from here: https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task
We add it to the root directory of the project and make sure it is included in the app target, so that the file ends up in the app bundle:
Creation of the Frame Processor
For this we will use a ready-made tool that generates the plugin boilerplate for us:
npx vision-camera-plugin-builder@latest ios
In our TS code we can also add:
import {VisionCameraProxy, type Frame} from 'react-native-vision-camera';

const plugin = VisionCameraProxy.initFrameProcessorPlugin('handLandmarks', {});

export function handLandmarks(frame: Frame) {
  'worklet';
  if (plugin == null) {
    throw new Error('Failed to load Frame Processor Plugin!');
  }
  return plugin.call(frame);
}
In Xcode we can check that the necessary Swift file has been generated.
Implementation
The next step is to implement the logic in Swift. We start with the configuration: the model will run in video mode and look for up to 2 hands. The confidence values are the thresholds the model works with; if you need more accuracy, you can increase them.
let options = HandLandmarkerOptions()
options.baseOptions.modelAssetPath = "hand_landmarker.task"
options.runningMode = .video
options.minHandDetectionConfidence = 0.5
options.minHandPresenceConfidence = 0.5
options.minTrackingConfidence = 0.5
options.numHands = 2
The next step is to perform the detection and map the results into plain values that can be passed through JSI and read on the JS side.
do {
  let handLandmarker = try HandLandmarker(options: options)
  let image = try MPImage(sampleBuffer: buffer)
  let result = try handLandmarker.detect(
    videoFrame: image,
    timestampInMilliseconds: Int(frame.timestamp)
  )

  // Map the detected hands to plain dictionaries that can be passed through JSI.
  var landmarks: [[[String: Float]]] = []
  for hand in result.landmarks {
    var marks: [[String: Float]] = []
    for landmark in hand {
      marks.append([
        "x": landmark.x,
        "y": landmark.y
      ])
    }
    landmarks.append(marks)
  }
  return landmarks
} catch {
  return nil
}
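Since the plugin returns nested arrays of plain objects, it is convenient to describe that shape on the TypeScript side. Below is a minimal sketch; the Landmark and Hand type names are my own and not part of any library:

// Shape of the data returned by the native plugin.
type Landmark = {x: number; y: number}; // normalized to the [0, 1] range
type Hand = Landmark[]; // 21 landmarks per detected hand

The wrapper from earlier can then declare Hand[] as its return type, which makes the data easier to work with in the rest of the app.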
We can now check the results on the JavaScript side. Let’s modify the App.ts file by adding a new frame processor.
const frameProcessor = useFrameProcessor(frame => {
  'worklet';
  const data = handLandmarks(frame);
  console.log(data);
}, []);

//...

return (
  <Camera
    style={StyleSheet.absoluteFill}
    device={device}
    isActive={true}
    frameProcessor={frameProcessor}
    fps={30}
    pixelFormat="rgb"
  />
);
By default, Vision Camera returns frames in the YUV format, but the TensorFlow models that MediaPipe uses require RGB. It is therefore necessary to add the pixelFormat="rgb" property to the Camera component.
As you can see, the values can be read.
Integration with Skia
The next step will be to draw the necessary lines and points. To do this, we will use React Native Skia. To install it together with its Reanimated dependency, run the following command:
npm install @shopify/react-native-skia react-native-reanimated
Additionally, let’s add a plugin to the babel.config.js file:
module.exports = {
  //...
  plugins: [
    //...
    'react-native-reanimated/plugin', // has to be listed last
  ],
};
And let’s clear the metro bundler cache:
npm start -- --reset-cache
Drawing lines and points
To understand what the lines are supposed to look like, let’s take another look at the photo:
The indexes correspond to the points in our resulting array. With this knowledge, we can easily prepare an array of lines.
// Connections between the 21 landmarks (0 = wrist, 1-4 = thumb, 5-8 = index,
// 9-12 = middle, 13-16 = ring, 17-20 = pinky).
const lines = [
  [0, 1], [1, 2], [2, 3], [3, 4],
  [0, 5], [5, 6], [6, 7], [7, 8],
  [5, 9], [9, 10], [10, 11], [11, 12],
  [9, 13], [13, 14], [14, 15], [15, 16],
  [13, 17], [17, 18], [18, 19], [19, 20],
  [0, 17],
];
To use a frame processor with Skia’s drawing capability, we change the hook that creates our processor to useSkiaFrameProcessor. The points come out of the model in normalized form, in the interval [0, 1]: when a point’s coordinate equals 0.5, for example, the point lies in the middle of the frame. To display the values correctly, we therefore need to multiply them by the frame size.
import {Skia, PaintStyle} from '@shopify/react-native-skia';
import {useSkiaFrameProcessor} from 'react-native-vision-camera';

const paint = Skia.Paint();
paint.setStyle(PaintStyle.Fill);
paint.setStrokeWidth(2);
paint.setColor(Skia.Color('red'));

const linePaint = Skia.Paint();
linePaint.setStyle(PaintStyle.Fill);
linePaint.setStrokeWidth(4);
linePaint.setColor(Skia.Color('lime'));
const frameProcessor = useSkiaFrameProcessor(frame => {
  'worklet';
  const data = handLandmarks(frame);

  frame.render();

  const frameWidth = frame.width;
  const frameHeight = frame.height;

  for (const hand of data || []) {
    // Draw lines
    for (const [from, to] of lines) {
      frame.drawLine(
        hand[from].x * Number(frameWidth),
        hand[from].y * Number(frameHeight),
        hand[to].x * Number(frameWidth),
        hand[to].y * Number(frameHeight),
        linePaint,
      );
    }

    // Draw circles
    for (const mark of hand) {
      frame.drawCircle(
        mark.x * Number(frameWidth),
        mark.y * Number(frameHeight),
        6,
        paint,
      );
    }
  }
}, []);
Final result
Summary
We have created a method for detecting hand landmarks and drawing them in real time in our mobile app. Skia’s integration with frame processors in Vision Camera is a real game changer; it completely changes the way we can use the camera from JavaScript code in React Native.
You can find the code from the article here: https://github.com/lukaszkurantdev/blog-hand-landmarks
Sources
[1] https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker
[2] https://react-native-vision-camera.com/docs/guides
[3] https://mrousavy.com/blog/VisionCamera-Pose-Detection-TFLite
You can also check out my other articles.