Deploy any machine learning model for real-time frame processing with React Native Vision Camera and ONNX Runtime.

Shihara Dilshan · Published in Technoid Community · 9 min read · Jan 6, 2024


Please note that this article is based on React Native Vision Camera V2.

Have you ever come up with the idea of processing frames in real time with the help of an ML model and a React Native camera plugin? For the past couple of months I have been working on a project that involved exactly this functionality. So today I am going to explain how you can deploy any machine learning model for real-time frame processing with React Native Vision Camera, powered by ONNX Runtime. By the end of this article you will be able to reuse this logic to create amazing projects that shine on your resume.

Before getting into it, let's talk about ONNX Runtime.

ONNX Runtime

The Open Neural Network Exchange, or ONNX, is a freely available standard for representing machine learning models. Microsoft and other leading industry players developed ONNX with the goal of solving the model interoperability problem. By enabling users to convert models trained in one framework into a format compatible with several others, it eases the transition between different deep learning libraries and tools. As a bridge across deep learning frameworks, ONNX lowers the risk of vendor lock-in and fosters innovation by making it easy to share models within the community. I am not going to deep dive into ONNX Runtime here; you can explore it further on your own.

Why use ONNX with this particular example?

It does not have to be ONNX; you can use other approaches as well. But since ONNX is a widely adopted standard, we can plug in almost any ML model to achieve the functionality your requirements call for.

React Native Vision Camera

(https://github.com/mrousavy/react-native-vision-camera)

VisionCamera is a powerful, high-performance Camera library for React Native

Frame Processor Plugins

Frame Processor Plugins are native functions (written in Objective-C, Swift, C++, Java or Kotlin) that are injected into the VisionCamera JS-Runtime. They can be synchronously called from your JS Frame Processors (using JSI) without ever going over the bridge. Because VisionCamera provides an easy-to-use plugin API, you can easily create your own Frame Processor Plugins.

How this works

Frames captured by the RN Vision Camera can be sent to the frame processor module. The Frame Processor gets called with a "Frame" object, which is a JSI HostObject. It holds a reference to the native (C++) frame image buffer (~10 MB in size) and exposes properties such as "width", "height", "bytesPerRow" and more to JavaScript, so you can access them synchronously. The "Frame" object can be passed around in JS, as well as returned from and passed to a native Frame Processor Plugin.
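For example, here is a minimal JS Frame Processor (a sketch assuming Vision Camera V2's useFrameProcessor hook, with no native plugin involved yet) that reads those properties synchronously:

import { useFrameProcessor } from 'react-native-vision-camera';

// Inside your React component: a minimal frame processor that only reads
// the Frame HostObject's properties, synchronously, on every frame.
const frameProcessor = useFrameProcessor(frame => {
  'worklet';
  console.log(`Got a ${frame.width}x${frame.height} frame (${frame.bytesPerRow} bytes per row)`);
}, []);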

Okay, enough with the fancy words and descriptions, let's get into it. In this example we will create an app which identifies different kinds of objects.

1. Convert your model into ONNX

This step depends on the model you are trying to convert. I will add some references below, which will be helpful when you are converting your own models into ONNX.

Export a PyTorch model to ONNX

Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX

2. Create the frame processor plugin natively for Android (Java)

A. Open your project in Android Studio and copy your model (ONNX in this example) into the "raw" directory inside "res".

B. Create a Java source file; for the Object Detector plugin this will be called ObjectDetectPluginPackage.java.

Since this is not a fully native application, you have to access the resource through reactContext.getResources() and pass it into your frame processor.

import com.facebook.react.ReactPackage;
import com.facebook.react.bridge.NativeModule;
import com.facebook.react.bridge.ReactApplicationContext;
import com.facebook.react.uimanager.ViewManager;
import com.mrousavy.camera.frameprocessor.FrameProcessorPlugin;

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ObjectDetectPluginPackage implements ReactPackage {

  @Override
  public List<ViewManager> createViewManagers(ReactApplicationContext reactContext) {
    return Collections.emptyList();
  }

  @Override
  public List<NativeModule> createNativeModules(ReactApplicationContext reactContext) {
    // Read the model resource file into a byte array
    int modelID = R.raw.pre_validation;
    InputStream is = null;
    List<NativeModule> modules = new ArrayList<>();

    try {
      is = reactContext.getResources().openRawResource(modelID);
      byte[] yourModelByteArray = new byte[is.available()];
      is.read(yourModelByteArray);
      is.close();

      // A FrameProcessorPlugin is not a NativeModule, so register it with
      // VisionCamera instead of adding it to the modules list.
      FrameProcessorPlugin.register(new ObjectDetect(reactContext, yourModelByteArray));
    } catch (Exception e) {
      e.printStackTrace();
    }

    return modules;
  }

}

C. Register the package in MainApplication.java

@Override
protected List<ReactPackage> getPackages() {
  @SuppressWarnings("UnnecessaryLocalVariable")
  List<ReactPackage> packages = new PackageList(this).getPackages();
  // ...
  packages.add(new ObjectDetectPluginPackage()); // <- add
  return packages;
}

D. Now you can create the frame processor plugin itself.

import androidx.annotation.Nullable;
import androidx.camera.core.ImageProxy;
import com.mrousavy.camera.frameprocessor.FrameProcessorPlugin;

public class ObjectDetect extends FrameProcessorPlugin {
  ObjectDetect() {
    super("yourPlugin"); // the plugin is exposed to JS as __yourPlugin
  }

  @Nullable
  @Override
  public Object callback(ImageProxy image, Object[] params) {
    // code goes here
    return null;
  }
}

E. Let’s modify this to achieve our requirements.

ImageProxy is the image format that the Vision Camera library hands to the Android frame processor. You can convert it into a Mat object, which makes it easy to do further processing with an image processing library such as OpenCV.

Here is the function to convert an ImageProxy to a Mat in Java.

public static Mat imageProxyToMat(ImageProxy imageProxy) {
  ImageProxy.PlaneProxy[] plane = imageProxy.getPlanes();
  ByteBuffer yBuffer = plane[0].getBuffer();
  ByteBuffer uBuffer = plane[1].getBuffer();
  ByteBuffer vBuffer = plane[2].getBuffer();

  int ySize = yBuffer.remaining();
  int uSize = uBuffer.remaining();
  int vSize = vBuffer.remaining();

  byte[] nv21 = new byte[ySize + uSize + vSize];

  // Interleave the planes into an NV21 byte array
  yBuffer.get(nv21, 0, ySize);
  vBuffer.get(nv21, ySize, vSize);
  uBuffer.get(nv21, ySize + vSize, uSize);
  try {
    // Compress the NV21 frame to JPEG, then decode it into a Bitmap
    YuvImage yuvImage = new YuvImage(nv21, ImageFormat.NV21, imageProxy.getWidth(), imageProxy.getHeight(), null);
    ByteArrayOutputStream stream = new ByteArrayOutputStream(nv21.length);
    yuvImage.compressToJpeg(new android.graphics.Rect(0, 0, yuvImage.getWidth(), yuvImage.getHeight()), 90, stream);
    byte[] jpegData = stream.toByteArray();
    stream.close();

    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inPreferredConfig = Bitmap.Config.RGB_565;
    Bitmap bitmap = BitmapFactory.decodeByteArray(jpegData, 0, jpegData.length, options);

    // Rotate the frame into the expected orientation
    Matrix matrix = new Matrix();
    matrix.postRotate(90);
    Bitmap rotatedBitmap = Bitmap.createBitmap(bitmap, 0, 0, bitmap.getWidth(), bitmap.getHeight(), matrix, true);

    Mat mat = new Mat(rotatedBitmap.getHeight(), rotatedBitmap.getWidth(), CvType.CV_8UC3);
    Utils.bitmapToMat(rotatedBitmap, mat); // bitmapToMat produces a 4-channel RGBA Mat

    // Drop the alpha channel so the Mat is 3-channel RGB
    Imgproc.cvtColor(mat, mat, Imgproc.COLOR_RGBA2RGB);

    return mat;
  } catch (IOException e) {
    e.printStackTrace();
  }
  return null;
}

If there are arguments passed from the JS side, you can access them like this:

// Option 1: treat the first argument as a JSON string and parse it with Gson
Object param = params[0];
String paramString = String.valueOf(param);
Gson gson = new Gson();
String[] paramStringArray = gson.fromJson(paramString, String[].class);

// Option 2: cast the first argument directly to a ReadableNativeArray
ReadableNativeArray anyParameters = (ReadableNativeArray) params[0];

Here is the full code for the frame processor plugin. If you are wondering about the "buffer" value, it is just the input for the model; you have to build it from the Mat object you converted from the ImageProxy (for example by resizing, normalizing and packing the pixels into the shape your model expects).


import androidx.camera.core.ImageProxy;

import com.facebook.react.bridge.ReactApplicationContext;
import com.facebook.react.bridge.ReadableNativeArray;
import com.facebook.react.bridge.WritableNativeMap;
import com.google.gson.Gson;
import com.mrousavy.camera.frameprocessor.FrameProcessorPlugin;

import org.opencv.core.Mat;

import java.util.Collections;

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

// NOTE: frame processor plugins do not like executing the same heavy work twice
// (see the notes at the end of this article).
public class ObjectDetect extends FrameProcessorPlugin {
  private byte[] yourModelByteArray;
  private OrtSession ortSession;

  // Receive the model bytes passed from the package and create the ORT session once
  public ObjectDetect(ReactApplicationContext reactContext, byte[] yourModelByteArray) {
    super("yourPlugin"); // exposed to JS as __yourPlugin
    this.yourModelByteArray = yourModelByteArray;

    OrtEnvironment ortEnvironment = OrtEnvironment.getEnvironment();
    this.ortSession = this.createORTSession(ortEnvironment);
  }

  @Override
  public Object callback(ImageProxy image, Object[] params) {
    // You can pass arguments to the frame processor plugin from the JS side
    Object param = params[0];
    String paramString = String.valueOf(param);
    Gson gson = new Gson();
    String[] paramStringArray = gson.fromJson(paramString, String[].class);
    ReadableNativeArray anyParameters = (ReadableNativeArray) params[0];

    // Convert the ImageProxy into a Mat object so it can be pre-processed
    Mat mat = OpenCV.imageProxyToMat(image);

    // Final result which is sent back to the JS side
    WritableNativeMap result = new WritableNativeMap();

    try {
      // Use the ORT session to run your predictions
      OrtEnvironment env = OrtEnvironment.getEnvironment();
      String inputName = ortSession.getInputNames().iterator().next();

      // "buffer" is the input for your model (e.g. a FloatBuffer built from "mat");
      // its layout, "height" and "width" depend on your ONNX model and requirements
      OnnxTensor inputTensor = OnnxTensor.createTensor(env, buffer, new long[]{1, 3, height, width});
      OrtSession.Result output = ortSession.run(Collections.singletonMap(inputName, inputTensor));

      OnnxTensor outputTensor = (OnnxTensor) output.get(0);
      float[][] outputArray = (float[][]) outputTensor.getValue();

      // Now you can use outputArray to build the output for the JS side.
      // outputArray basically contains all the predictions from the model.
      result.putString("key", "value");
    } catch (Exception e) {
      result.putString("errorKey", "value");
    }

    return result;
  }

  public OrtSession createORTSession(OrtEnvironment ortEnvironment) {
    try {
      return ortEnvironment.createSession(this.yourModelByteArray);
    } catch (OrtException e) {
      throw new RuntimeException(e);
    }
  }
}

ImageProxy is the format that the Android frame processor receives from the Vision Camera library. To add support for it, we must add the following to the dependencies section of the app/build.gradle file:

implementation 'androidx.camera:camera-core:1.1.0-beta02'

3. Create the frame processor plugin natively for iOS (Objective-C)

A. First of all, add your model as a resource file using Xcode.

Go to "Build Phases" → "Copy Bundle Resources" → press "+".

B. Now create an Objective-C++ source file called YourPlugin.mm (the .mm extension is needed because the code below uses OpenCV's C++ types; you can use any name you like). Finally, add the following code:

#import <VisionCamera/FrameProcessorPlugin.h>
#import <VisionCamera/Frame.h>
#import <onnxruntime.h>

@interface YourPlugin : NSObject
@end

@implementation YourPlugin

static inline id yourPlugin(Frame* frame, NSArray* args) {
  CMSampleBufferRef buffer = frame.buffer;
  NSError* error = nil;

  // You might have to convert the frame into something usable, such as a Mat object
  UIImage *image = [OpenCV toUIImage:buffer];
  cv::Mat cvMat = [OpenCV cvMatFromUIImage:image];

  // You might have to resize the Mat based on your model's requirements
  cv::Mat resizedMat = [OpenCV resizeMat:cvMat width:864 height:480];

  // Access arguments passed from the JS side
  NSArray *innerArray = args[0];

  // Access the model bundled as a resource
  NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"pre_validation" ofType:@"onnx"];

  // Create the ORT environment and session
  ORTEnv* ortEnv = [[ORTEnv alloc] initWithLoggingLevel:ORTLoggingLevelWarning error:&error];
  if (!ortEnv || !modelPath) return nil;

  ORTSession* session = [[ORTSession alloc] initWithEnv:ortEnv modelPath:modelPath sessionOptions:nil error:&error];
  if (!session) return nil;

  // "floatArray" is the input for your model (e.g. a malloc'd float* filled from resizedMat);
  // its size and layout depend on your model and requirements
  NSMutableData *data = [[NSMutableData alloc] initWithBytes:floatArray length:480 * 864 * 3 * sizeof(float)];
  ORTValue *aInputValue = [[ORTValue alloc] initWithTensorData:data
                                                   elementType:ORTTensorElementDataTypeFloat
                                                         shape:@[@1, @3, @480, @864] // these dimensions depend on your ONNX model
                                                         error:&error];

  free(floatArray);

  if (!aInputValue) return nil;

  NSDictionary<NSString*, ORTValue*>* outputs = [session runWithInputs:@{@"images" : aInputValue}
                                                           outputNames:[NSSet setWithArray:@[ @"output" ]]
                                                            runOptions:nil
                                                                 error:&error];
  if (!outputs) return nil;

  // After running the model, read the output tensor
  ORTValue* combinedDict = outputs[@"output"];

  // Now you can extract the predictions from combinedDict and build the result
  // that is returned to the JS side
  return combinedDict;
}

VISION_EXPORT_FRAME_PROCESSOR(yourPlugin)

@end

Here is the equivalent Objective-C function to convert a UIImage to a Mat. On iOS, the frame processor receives a CMSampleBufferRef from the Vision Camera library, which is first converted to a UIImage and then to a Mat.

+ (cv::Mat) cvMatFromUIImage:(UIImage *)image
{
  CGColorSpaceRef colorSpace = CGImageGetColorSpace(image.CGImage);
  CGFloat cols = image.size.width;
  CGFloat rows = image.size.height;
  cv::Mat cvMat(rows, cols, CV_8UC4);
  CGContextRef contextRef = CGBitmapContextCreate(cvMat.data,
                                                  cols,
                                                  rows,
                                                  8,
                                                  cvMat.step[0],
                                                  colorSpace,
                                                  kCGImageAlphaNoneSkipLast |
                                                  kCGBitmapByteOrderDefault);
  CGContextDrawImage(contextRef, CGRectMake(0, 0, cols, rows), image.CGImage);
  CGContextRelease(contextRef);
  return cvMat;
}

The other steps are very similar to the Android implementation.

4. Expose your Frame Processor Plugin to JS

To make the Frame Processor Plugin available to your Frame Processors, create "yourPlugins.declaration.ts":

import { Frame } from 'react-native-vision-camera';

// NOTE: We can run any number of frame processors at the same time,
// but frame processors do not like executing the exact same calculations twice.
// For example, on Android we have to convert the ImageProxy to a Mat object;
// if we do that in 2 different frame processors, the app will crash.
export function yourPlugin(frame: Frame, args: any): any {
  'worklet';
  // @ts-expect-error __yourPlugin is injected into the worklet runtime by the native plugin
  return __yourPlugin(frame, args);
}

Frame Processors require react-native-worklets-core 0.2.0 or higher.

yarn add react-native-worklets-core

And add the plugin to your babel.config.js

module.exports = {
  plugins: [
    ['react-native-worklets-core/plugin'],
  ],
};

Use the frame processor with the RN Vision Camera component:

const frameProcessor = useFrameProcessor(
  frame => {
    'worklet';

    if (isPassedPreValidations || isAudioPlaying) return;

    const { results1, results2 } = yourPlugin(frame, [
      arg1,
      arg2,
    ]);
    // Do anything with the results
  },
  [deps],
);

<Camera
  ...
  frameProcessor={frameProcessor}
/>
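Putting it all together, here is a fuller sketch of how the pieces could fit inside a screen component. The device selection, the isActive flag and the frameProcessorFps value are assumptions for illustration (Vision Camera V2 APIs); adapt them to your own app:

import * as React from 'react';
import { Camera, useCameraDevices, useFrameProcessor } from 'react-native-vision-camera';
import { yourPlugin } from './yourPlugins.declaration';

// A minimal sketch, not the article's exact app: wire the native plugin into a screen.
// 'arg1'/'arg2' and the result handling are placeholders for whatever your model needs.
export function ObjectDetectScreen() {
  const devices = useCameraDevices();
  const device = devices.back;

  const frameProcessor = useFrameProcessor(frame => {
    'worklet';
    const results = yourPlugin(frame, ['arg1', 'arg2']);
    console.log(results);
  }, []);

  if (device == null) return null;

  return (
    <Camera
      style={{ flex: 1 }}
      device={device}
      isActive={true}
      frameProcessor={frameProcessor}
      // Cap how often the frame processor runs; VisionCamera drops the remaining frames for you
      frameProcessorFps={5}
    />
  );
}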

Special points to be noted:

1. We can run any number of frame processors at the same time, but frame processors do not seem to like executing the exact same calculations twice. For example, on Android we have to convert the ImageProxy to a Mat object; if we do that in 2 different frame processors, the app will crash.

2. The frame processor might not be able to process every frame the camera captures. For example, if your camera captures 240 fps video, it will not be able to process all of those frames. But you do not have to worry about dropping them; the frame processor does that for you by default.

3. Since Frame Processors run in Worklets, you can directly use JS values such as React state, which are copied read-only into the Frame Processor (see the first sketch after this list).

4. For longer-running processing, you can use runAsync(..) to run code asynchronously on a different thread (see the second sketch after this list).
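
A minimal sketch of point 3, assuming a hypothetical targetObject state value that is passed to the plugin as an argument:

import * as React from 'react';
import { useFrameProcessor } from 'react-native-vision-camera';
import { yourPlugin } from './yourPlugins.declaration';

// React state is copied read-only into the worklet, so it can be used directly.
export function useObjectDetectProcessor() {
  const [targetObject, setTargetObject] = React.useState('person');

  const frameProcessor = useFrameProcessor(frame => {
    'worklet';
    // targetObject is captured by value; updating the state re-creates the worklet
    const results = yourPlugin(frame, [targetObject]);
    console.log(targetObject, results);
  }, [targetObject]);

  return { frameProcessor, setTargetObject };
}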

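And a minimal sketch of point 4. Note that runAsync is exported by newer VisionCamera versions (V3+), so this goes beyond the V2 setup shown in this article:

import { runAsync, useFrameProcessor } from 'react-native-vision-camera';
import { yourPlugin } from './yourPlugins.declaration';

// The expensive model call runs on a separate thread, so the camera keeps streaming frames.
export function useAsyncObjectDetectProcessor() {
  return useFrameProcessor(frame => {
    'worklet';
    runAsync(frame, () => {
      'worklet';
      const results = yourPlugin(frame, []);
      console.log(results);
    });
  }, []);
}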