High-performance hand landmark detection in React Native using Vision Camera and Skia frame processor
Developing a real-time application using the MediaPipe model.
The New Architecture brings a whole new set of capabilities to the development of React Native applications. The React Native Vision Camera library takes advantage of them, letting us write custom frame processors that process camera frames in real time. Additionally, one of its latest versions introduces the ability to draw directly on a frame using React Native Skia, which completely changes the way we can interact with the camera in React Native applications.
Among the vision tasks of the MediaPipe library we can find TensorFlow models that detect selected features in images. One of these tasks is Hand Landmark detection, i.e. locating key points on human hands. We can use these points for gesture recognition or to render visual effects.
For this task, MediaPipe supports detection on single images as well as on image streams, for example the frames of a video, which allows the required calculations to be performed very efficiently. On a Pixel 6 the average landmark detection time is 17 ms on the CPU and only 12 ms on the GPU.
In this article I would like to show you how to create a custom frame processor for Vision Camera in React Native applications, using a native processor written in Swift that uses the MediaPipe library to run TensorFlow models. You can find the repository with the code here: https://github.com/lukaszkurantdev/blog-hand-landmarks
Configuration
1. Let’s start by creating a project:
npx react-native init handlandmarks
2. Let’s also install the Vision Camera library:
npm i react-native-vision-camera
cd ios && pod install
In the ios/{{project}}/Info.plist file we add an entry:
<key>NSCameraUsageDescription</key>
<string>$(PRODUCT_NAME) needs access to your Camera.</string>
In the App.ts file, we add support for displaying the camera.
import React, {useEffect} from 'react';
import {StyleSheet, Text} from 'react-native';
import {
  Camera,
  useCameraDevice,
  useCameraPermission,
} from 'react-native-vision-camera';

function App(): React.JSX.Element {
  const device = useCameraDevice('front');
  const {hasPermission, requestPermission} = useCameraPermission();

  useEffect(() => {
    requestPermission();
  }, [requestPermission]);

  if (!hasPermission) {
    return <Text>No permission</Text>;
  }

  if (device == null) {
    return <Text>No device</Text>;
  }

  return (
    <Camera style={StyleSheet.absoluteFill} device={device} isActive={true} />
  );
}

export default App;
Once the camera is up and running and the permission has been granted, we can see this view:
Configuration of frame processor
The first step is to install the worklets library that frame processors require:
npm install react-native-worklets-core
cd ios && pod install
In the babel.config.js file, we add:
module.exports = {
  plugins: [
    ["react-native-worklets-core/plugin"],
    // ...
  ],
  // ...
};
We should also remember to clear the metro bundler cache:
npm start -- --reset-cache
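Before moving on, we can sanity-check that the worklets setup works. Below is a minimal sketch that only logs the frame size using Vision Camera’s useFrameProcessor hook; our custom plugin does not exist yet, so this is just a quick test inside the App component:

import {useFrameProcessor} from 'react-native-vision-camera';

// Runs on the frame processor thread for every camera frame.
const frameProcessor = useFrameProcessor(frame => {
  'worklet';
  console.log(`Frame: ${frame.width} x ${frame.height}`);
}, []);

We then pass it to the camera as frameProcessor={frameProcessor} to see the logs.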
Integration with MediaPipe
In the ios/Podfile we add the library:
pod 'MediaPipeTasksVision', '0.10.14'
This gives us access to the MediaPipe models once the Pods are reinstalled.
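The Pods are reinstalled with the same command as before:

cd ios && pod install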
Adding a model
To use MediaPipe’s capabilities, it is necessary to add a task file in Xcode. We can download it from here: https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task
We add it to the root directory of the project and make sure it is included in the app target, so that the file ends up in the app bundle:
Creation of the Frame Processor
For this we will use a ready-made tool that generates the plugin boilerplate for us:
npx vision-camera-plugin-builder@latest ios
In our TS code we can also add:
import {VisionCameraProxy, type Frame} from 'react-native-vision-camera';

const plugin = VisionCameraProxy.initFrameProcessorPlugin('handLandmarks', {});

export function handLandmarks(frame: Frame) {
  'worklet';
  if (plugin == null) {
    throw new Error('Failed to load Frame Processor Plugin!');
  }
  return plugin.call(frame);
}
In Xcode we can check that the necessary Swift file has been generated.
Implementation
The next step is to implement the logic in Swift. We start with the configuration: the model will run in video mode and look for up to 2 hands. The confidence values are the thresholds the model works with; if you need more accuracy, you can increase them.
let options = HandLandmarkerOptions()
options.baseOptions.modelAssetPath = "hand_landmarker.task"
options.runningMode = .video
options.minHandDetectionConfidence = 0.5
options.minHandPresenceConfidence = 0.5
options.minTrackingConfidence = 0.5
options.numHands = 2
The next step is to perform the detection and map the results into plain values that can be passed through JSI and read on the JS side.
do {
  let handLandmarker = try HandLandmarker(options: options)
  let image = try MPImage(sampleBuffer: buffer)
  let result = try handLandmarker.detect(
    videoFrame: image,
    timestampInMilliseconds: Int(frame.timestamp)
  )

  // Map the detected hands to plain dictionaries that can be passed through JSI.
  var landmarks: [[[String: Float]]] = []
  for hand in result.landmarks {
    var marks: [[String: Float]] = []
    for landmark in hand {
      marks.append([
        "x": landmark.x,
        "y": landmark.y
      ])
    }
    landmarks.append(marks)
  }
  return landmarks
} catch {
  return nil
}
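Since the plugin returns nested arrays of plain objects, it is convenient to describe that shape on the TypeScript side. Below is a minimal sketch; the Landmark and Hand type names are my own and not part of any library:

// Shape of the data returned by the native plugin.
type Landmark = {x: number; y: number}; // normalized to the [0, 1] range
type Hand = Landmark[]; // 21 landmarks per detected hand

The wrapper from earlier can then declare Hand[] as its return type, which makes the data easier to work with in the rest of the app.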
We can now check the results on the JavaScript side. Let’s modify the App.ts file by adding a new frame processor.
const frameProcessor = useFrameProcessor(frame => {
  'worklet';
  const data = handLandmarks(frame);
  console.log(data);
}, []);

//...

return (
  <Camera
    style={StyleSheet.absoluteFill}
    device={device}
    isActive={true}
    frameProcessor={frameProcessor}
    fps={30}
    pixelFormat="rgb"
  />
);
By default, Vision Camera returns frames in the YUV format, but the TensorFlow models that MediaPipe uses require RGB. It is therefore necessary to add the pixelFormat="rgb" property to the Camera component.
As you can see, the values can be read.
Integration with Skia
The next step will be to draw the necessary lines and points. To do this, we will use React Native Skia. To install it together with its Reanimated dependency, run the following command:
npm install @shopify/react-native-skia react-native-reanimated
Additionally, let’s add a plugin to the babel.config.js file:
module.exports = {
  //...
  plugins: [
    //...
    'react-native-reanimated/plugin', // has to be listed last
  ],
};
And let’s clear the metro bundler cache:
npm start -- --reset-cache
Drawing lines and points
To understand what the lines are supposed to look like, let’s take another look at the photo:
The indexes correspond to the points in our resulting array. With this knowledge, we can easily prepare an array of lines.
// Connections between the 21 landmarks (0 = wrist, 1-4 = thumb, 5-8 = index,
// 9-12 = middle, 13-16 = ring, 17-20 = pinky).
const lines = [
  [0, 1], [1, 2], [2, 3], [3, 4],
  [0, 5], [5, 6], [6, 7], [7, 8],
  [5, 9], [9, 10], [10, 11], [11, 12],
  [9, 13], [13, 14], [14, 15], [15, 16],
  [13, 17], [17, 18], [18, 19], [19, 20],
  [0, 17],
];
To use a frame processor with Skia’s drawing capability, we change the hook that creates our processor to useSkiaFrameProcessor. The points come out of the model in normalized form, in the interval [0, 1]: when a point’s coordinate equals 0.5, for example, the point lies in the middle of the frame. To display the values correctly, we therefore need to multiply them by the frame size.
import {Skia, PaintStyle} from '@shopify/react-native-skia';
import {useSkiaFrameProcessor} from 'react-native-vision-camera';

const paint = Skia.Paint();
paint.setStyle(PaintStyle.Fill);
paint.setStrokeWidth(2);
paint.setColor(Skia.Color('red'));

const linePaint = Skia.Paint();
linePaint.setStyle(PaintStyle.Fill);
linePaint.setStrokeWidth(4);
linePaint.setColor(Skia.Color('lime'));
const frameProcessor = useSkiaFrameProcessor(frame => {
  'worklet';
  const data = handLandmarks(frame);

  frame.render();

  const frameWidth = frame.width;
  const frameHeight = frame.height;

  for (const hand of data || []) {
    // Draw lines
    for (const [from, to] of lines) {
      frame.drawLine(
        hand[from].x * Number(frameWidth),
        hand[from].y * Number(frameHeight),
        hand[to].x * Number(frameWidth),
        hand[to].y * Number(frameHeight),
        linePaint,
      );
    }

    // Draw circles
    for (const mark of hand) {
      frame.drawCircle(
        mark.x * Number(frameWidth),
        mark.y * Number(frameHeight),
        6,
        paint,
      );
    }
  }
}, []);
Final result
Summary
We have created a method for detecting hand landmarks and drawing them in real time in our mobile app. Skia’s integration with frame processors in Vision Camera is a real game changer; it completely changes the way we can use the camera from JavaScript code in React Native.
You can find the code from the article here: https://github.com/lukaszkurantdev/blog-hand-landmarks
Sources
[1] https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker
[2] https://react-native-vision-camera.com/docs/guides
[3] https://mrousavy.com/blog/VisionCamera-Pose-Detection-TFLite
You can also check out my other articles.