Live computer vision with OpenCV on mobiles

Sérgio Moura
Onfido Product and Tech
12 min read · Jan 3, 2018

Solving problems and making a solution immediately ready for our clients is a big part of our day to day work. In this blog post, written by me and Zhiyuan Shi, we share a quick story about how we discovered a problem, researched it, solved it with a prototype algorithm, and finally connected and deployed this algorithm to production devices across multiple platforms.

In general, the whole process involves four steps:

  1. Discovering the problem
  2. Researching and solving the problem by prototyping (in Python)
  3. Optimising and finalising the algorithm (in C++)
  4. Connecting and deploying the algorithm to devices.

Discovering the problem is what drives us to design a better solution. The following figure illustrates a common problem when a user takes a photo of their document: glare accidentally appears in the captured image, spoiling both face recognition and verification. Therefore, helping the user detect glare on the document would improve the capture experience and the overall pass rate of Identity Verification (IDV).

Glare on the document makes it unprocessable

Researching and solving the problem aims to propose an optimal and feasible solution to the task, while iterating over prototypes closely with the design team to gain usability insights (see https://medium.com/design-onfido/glare-detection-our-journey-to-help-users-take-higher-quality-photos-9bf656e6d304 by our product designers). Python is an extremely powerful language for rapid prototyping and proofs of concept. We adopted Python to implement candidate algorithms, benefiting from the ease of use of various statistical, machine learning and numerical libraries such as NumPy, Scikit-Learn, TensorFlow, SciPy and Matplotlib. After some preliminary exploration, we found two main paradigms for tackling this challenge: we could either treat it as a binary classification problem, where classifiers are trained with a Deep Neural Network, Support Vector Machine, Random Forest, etc., or solve it with low-level computer vision and image processing techniques. Since the first approach requires a massive amount of training data and human annotations, we adopted the second to build the initial solution.

In low-level computer vision, detecting glare is essentially equivalent to finding high-intensity regions, so the prototype boils down to a few basic image processing operations.
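Although the prototype itself was written in Python, the core idea can be sketched in just a few lines. The snippet below is purely illustrative — written in Kotlin with OpenCV's Java bindings to stay consistent with the Android code later in this post — and its thresholds (intensity 240, 2% of the area) are assumptions made for the example, not the parameters of our production algorithm.

import org.opencv.core.Core
import org.opencv.core.Mat
import org.opencv.imgproc.Imgproc

// Illustrative glare check: flag the image when a large enough fraction of pixels is near saturation
fun hasGlare(bgr: Mat): Boolean {
    // Work on pixel intensities only
    val gray = Mat()
    Imgproc.cvtColor(bgr, gray, Imgproc.COLOR_BGR2GRAY)

    // Keep only the near-white (high intensity) pixels
    val saturated = Mat()
    Imgproc.threshold(gray, saturated, 240.0, 255.0, Imgproc.THRESH_BINARY)

    // Report glare when the saturated area covers a meaningful fraction of the image
    val saturatedRatio = Core.countNonZero(saturated).toDouble() / gray.total()
    return saturatedRatio > 0.02
}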

Optimising and finalising the algorithm in C++ makes deployment tractable on mobile devices, especially across multiple platforms. We follow the same steps as in the Python version and reimplement the algorithm in C++. Most of the code can be transferred directly, with only slightly different declarations and parameter passing. To generate the executable, you compile the project using CMake with a CMakeLists.txt file.

Connecting and deploying the algorithm to devices

After the code is made available on a Git repository, it is ready to be consumed by the Android SDK. To make that happen, we chose to use Git submodules, which allow you to keep a Git repository as a subdirectory of another Git repository. This way, we can include the native code in our project while keeping the commits separate. We created a new jni folder under main, into which our native code repositories were cloned.

Project structure after submodules clone

Compiling the code

Both of these submodules are written in C++ and use CMake as their build tool. This is an advantage, since CMake is one of the two supported build tools for native libraries on Android (along with ndk-build) and is also Android Studio's default, making the integration as easy as possible.

To set up CMake and the other needed dependencies, you can open Android Studio's SDK Manager and, under the SDK Tools tab, download three packages: CMake, LLDB (the debugger) and the NDK (Native Development Kit).

CMake, LLDB and NDK download

At this point, you might be wondering why we need to compile the native code along with our Android code, instead of just including a pre-compiled library and using it out of the box 🤔. The answer is that the Android environment comprises a multitude of devices, with a wide range of CPUs, which in turn support different instruction sets 😵. For each combination of CPU and instruction set, there is one ABI (Application Binary Interface). The ABI defines, precisely, how an application's machine code should interoperate with the system at runtime. In your project, you must specify one ABI for each CPU architecture you want your code to run on; in our case this is done in the build.gradle file of the SDK's module. So, in order to support different ABIs, the native code must be compiled once for each target you want to support.

android {
    ...
    defaultConfig {
        ...

        externalNativeBuild {
            cmake {
                abiFilters 'armeabi-v7a', 'arm64-v8a', 'x86', 'x86_64'
            }
        }
    }
}

In this case, we want to support 4 different ABIs:

  • armeabi-v7a: Version 7+ of the ARM processor. Most recent Android phones use this;
  • arm64-v8a: 64-bit ARM processors. Found on high-end devices;
  • x86: Most tablets and emulators;
  • x86_64: Used by 64-bit tablets.

After defining our target ABIs, it's time to define our CMakeLists.txt, which is essentially CMake's equivalent of a Makefile. Once again, we should add a block to our build.gradle file (this time outside of the defaultConfig one) with its location.

android {
    ...

    externalNativeBuild {
        cmake {
            path 'src/main/jni/CMakeLists.txt'
        }
    }
}

An example of the CMakeLists.txt file is presented below. It should define the minimum CMake version to be used, along with the other properties required to link our submodules with the project.

  • project(NativeBridge) defines our integration module’s name;
  • add_subdirectory imports the sources from any submodule into the integration module we are defining;
  • Since our project heavily depends on OpenCV, an external library which offers great image processing functionality, we add the find_package instruction with the REQUIRED flag, which will cause our build to fail if the package is not found;
  • add_library defines the name and main file of a new library we create specifically to bridge our Android and submodule code (where the magic happens 🔮). Also, we use target_link_libraries to link the external submodule with this library, telling the compiler that the submodule is indeed a dependency of our integration code and should therefore be compiled first (CMake recursively looks for the CMakeLists.txt of each of these dependencies).

cmake_minimum_required(VERSION 3.6)
project(NativeBridge)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/glare_detector)
find_package(OpenCV REQUIRED)
add_library(GlareBridge SHARED GlareBridge.cpp)
target_link_libraries(GlareBridge GlareDetector)

So, after these steps, our project structure now contains the bridging code alongside the submodules under the jni folder, and it is time to use JNI (Java Native Interface) to implement the bridging code which allows the Java side to interact with the C++ algorithm.

Building the bridge

Our algorithm was made available by the research team and can be used through a detect() function, part of the GlareDetector class, which receives the binary data of an image coming from the mobile device's camera and returns a boolean indicating whether the image contains glare.

boolean hasGlare = glareDetector.detect(image);

So, to make the algorithm available to the Java code, we need to wrap this call inside a JNI function, in this case inside a file called GlareBridge.cpp (the library we declared before). Also, since the camera frames come in full size rather than containing only our region of interest (the document itself), we need to pass some additional parameters: the frame dimensions and both the position and dimensions of the region of interest (our document rectangle, relative to the origin of the frame).

Needed dimensions for accurate region of interest
JNI function header

The picture above represents the JNI header for the function we want to build, so let’s break it down:

  • JNIEXPORT and JNICALL are two required macros that ensure the function can be located in the shared library and called correctly at runtime;
  • jboolean is the JNI equivalent to boolean in Java, which will be our return type;
  • Java_com_onfido_android_sdk_capture_native_1detector_NativeDetector_detectGlare is the (long) function name, because JNI requires it to start with Java_, followed by the package and name of the class that declares the corresponding native Java method, and finally the method's own name (the _1 sequence is how JNI escapes the underscore in the native_detector package name);
  • The JNIEnv *env parameter is a JNI class instance which exposes the JNI functions as member functions. It is used for native code to access the virtual machine functionality. The jobject parameter indicates that the method we are calling is an instance method, and the parameter itself is the instance on which the method was called. It must be replaced by a jclass parameter if we want to define a native static method instead;
  • The rest of the parameters refer to the dimensions specified above.

Since our algorithm is written in C++, we need to pass C++ parameters to it. Fortunately, Java primitive types map directly to JNI primitives, which in turn map automatically to their C++ equivalents, greatly reducing the integration effort.

Note: A header file following the same naming pattern must be created to avoid C++ compiler-specific name mangling of the native methods. In this case, a com_onfido_android_sdk_capture_native_detector_NativeDetector.h was created with the following content:

extern "C" {JNIEXPORT jboolean JNICALL Java_com_onfido_android_sdk_capture_native_1detector_NativeDetector_detectGlare(JNIEnv *, jobject, jbyteArray, jint, jint, jint, jint, jint, jint);

}

OpenCV

Our algorithm, developed by the research team, uses OpenCV (https://opencv.org/), an open-source computer vision library for image manipulation, which is needed to perform the algorithm's steps on the camera frames. It also allows us to perform image decoding and cropping, which is useful for our bridge code. OpenCV has Java/Objective-C mobile SDKs, but for performance reasons we chose to use the C++ SDK.

This said, our bridge code implementation works as follows:

  • Decoding the camera frame: With the Android camera API, frames are delivered through a callback that returns images in the YUV420 color space (NV21, a semi-planar YUV420 format, by default). However, the algorithm expects the BGRA color space instead, which means we have to decode our image data and then convert it to BGRA.
// Decode and convert image from YUV420 to BGRA (the YUV420sp buffer holds 1.5 bytes per pixel, hence the height + height / 2 rows)
Mat yuv(height + height / 2, width, CV_8UC1, imageData);
Mat bgra(height, width, CV_8UC4);

cvtColor(yuv, bgra, CV_YUV420sp2BGRA, 4);

Since OpenCV uses Mat objects to represent images, we need to create two of them (one for decoding the data and another for the conversion). This way, the yuv matrix will be used to decode the imageData and the bgra to hold the final result after conversion, which is achieved using the cvtColor function from OpenCV.

  • Cropping the region of interest: After that, we need to crop the area we want to evaluate with the algorithm. To do that, we build a cv::Rect describing the region of interest and apply it to the image Mat.
// Apply region of interest
cv::Rect roi(rectLeft, rectTop, rectWidth, rectHeight);
Mat cropped = bgra(roi);
  • Applying the algorithm: Then, we call the detect() function provided by the research team, with the cropped image as argument.
// Execute algorithm
bool hasGlare = detect(cropped);
  • Releasing allocated memory and returning to Java: After executing the algorithm, we release the image byte array reference and return the glare result to be evaluated back on the Java side.
// Release byte array memory usage
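// 'array' is the jbyteArray parameter of our JNI function and '_yuv' the elements pointer previously obtained from env->GetByteArrayElements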
env->ReleaseByteArrayElements(array, _yuv, 0);

return hasGlare;

Back to Java

In order to call the native libraries' methods from Java, we need to load these libraries using System.loadLibrary() and mark the method declarations with the native keyword (external in Kotlin), stating that the implementation of each method is not present in Java because it will be provided by a library loaded at runtime.
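As a minimal sketch — assuming the shared library keeps the GlareBridge name from our CMakeLists.txt and the parameter list of the JNI signature shown earlier — such a wrapper class could look like this in Kotlin:

package com.onfido.android.sdk.capture.native_detector

class NativeDetector {

    companion object {
        init {
            // Loads libGlareBridge.so, the shared library declared in our CMakeLists.txt
            System.loadLibrary("GlareBridge")
        }
    }

    // Resolved at runtime to
    // Java_com_onfido_android_sdk_capture_native_1detector_NativeDetector_detectGlare in GlareBridge.cpp
    external fun detectGlare(
        frame: ByteArray,
        frameWidth: Int,
        frameHeight: Int,
        rectLeft: Int,
        rectTop: Int,
        rectWidth: Int,
        rectHeight: Int
    ): Boolean
}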

From now on, new NativeDetector().detectGlare(...) will execute the algorithm and return the boolean result we explained above.

Putting it in action

The end goal is to run the algorithm periodically and present the results to the user through some UI element. However, the Android camera API doesn't offer a frame callback with configurable periodicity; instead, it invokes a callback for every captured frame.

Theoretically, we could run the algorithm on every frame, but with even a simple 30 fps camera it would be impossible to present this information in an understandable and eye-friendly way, given the high sample rate. So we had to find a way to control the rate at which frames are processed.

RxJava: Subjects and backpressure

Since our use case can be modelled well as a sequence of observable frames, we chose to take advantage of reactive programming, concretely RxAndroid, a set of RxJava bindings for Android. This library implements and extends the observer pattern and offers a set of operators for composing asynchronous and event-based programs. Of all its features, we were really interested in two: backpressure handling, to solve the camera frame rate problem, and asynchronous computation, because in Android development the main thread should mainly focus on rendering what we see on the screen and be relieved of heavy processing that can be done on a worker thread. Given that, our choices were as follows:

  • PublishSubject: First of all, we needed an Observable<FrameData> to start with. Since our camera provides us with a callback for every frame coming from it, we just needed to transform that callback into an Observable. To achieve this, we chose to use a PublishSubject, which acts as a bridge between our Android frame callback and the stream in which we want to apply some backpressure and execute the algorithm asynchronously. Because it is an Observer, it can subscribe to one or more Observables, and because it is an Observable, it can pass through the items it observes by reemitting them. So, whenever a new frame is available, we call the subject's onNext(FrameData frame) method, making it reemit the frame.

After that, the subject can also be subscribed to, in order to receive the reemitted items.
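A minimal sketch of this bridge, assuming a hypothetical FrameData class that carries the YUV bytes together with the frame and document-rectangle dimensions the detector needs:

import io.reactivex.subjects.PublishSubject

// Hypothetical container for everything the native detector needs from a camera frame
class FrameData(
    val yuv: ByteArray,
    val width: Int,
    val height: Int,
    val rectLeft: Int,
    val rectTop: Int,
    val rectWidth: Int,
    val rectHeight: Int
)

// The subject bridges the camera callback and our Rx stream
val frameData: PublishSubject<FrameData> = PublishSubject.create()

// Called from the camera preview callback for every captured frame; the subject reemits it downstream
fun onFrameAvailable(frame: FrameData) {
    frameData.onNext(frame)
}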

  • The sample() operator: In our use case, we want to periodically evaluate the latest frame emitted by the camera, which means we can safely ignore all the other frames emitted in between. For that, we apply the sample operator to our Observable<FrameData> with the desired period and time unit.
frameData.sample(350L, TimeUnit.MILLISECONDS)
  • Executing the algorithm: In RxJava, the map() operator allows us to transform the type of our stream along the chain. In this case, we want to execute the algorithm on each received frame and pass the result downstream, as sketched below:
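Continuing the sketch above — frameData is the subject from the previous step and nativeDetector an instance of the hypothetical Kotlin wrapper shown earlier:

import java.util.concurrent.TimeUnit

import com.onfido.android.sdk.capture.native_detector.NativeDetector
import io.reactivex.Observable
import io.reactivex.schedulers.Schedulers

val nativeDetector = NativeDetector()

val glareResults: Observable<Boolean> = frameData
    .sample(350L, TimeUnit.MILLISECONDS)         // keep only the latest frame every 350 ms
    .observeOn(Schedulers.computation())         // run the native detection off the camera thread
    .map { frame ->
        nativeDetector.detectGlare(
            frame.yuv, frame.width, frame.height,
            frame.rectLeft, frame.rectTop, frame.rectWidth, frame.rectHeight
        )
    }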
  • Subscribing to it: After this, we end up with an Observable<Boolean> which tells us whether each frame has glare on it or not, and finally we just need to subscribe to it, choosing the Scheduler on which we want to perform the background work, where we want the results to be observed, and what we should do when each item is emitted. That said, the following code completes our frame processing subscription:
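A sketch of that final step, under the same assumptions as above (the view callback and log tag are hypothetical names, not the SDK's actual API):

import android.util.Log

import io.reactivex.android.schedulers.AndroidSchedulers

val disposable = glareResults
    .observeOn(AndroidSchedulers.mainThread())   // the UI can only be touched on the main thread
    .subscribe(
        { hasGlare -> view.showGlareWarning(hasGlare) },                         // pass the result to the view
        { error -> Log.e("GlareDetection", "Frame processing failed", error) }   // log the error cause
    )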

Since we want to update our UI on every item emitted (alerting the user when glare is detected, or hiding the alert when it isn't), results must be observed on AndroidSchedulers.mainThread(), the only thread allowed to touch the UI in the Android framework. Finally, we defined some lambdas (thanks, Kotlin!) to specify what should happen on every item emitted (passing the result to the view) and in case any error occurs along the stream (logging the error cause).

Ok, so cool, but what do I see? 🤔

In the end, we present the user with an inflation animation of a view that looks like a speech balloon, to catch the user's attention. This animation takes ~300 ms, the medium animation duration recommended by the Android guidelines. For the case where glare was detected and then resolved, the animation is reversed, making the balloon disappear. Anyway, nothing beats checking it with your own eyes.
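For illustration, a minimal version of such a show/hide animation could look like the sketch below — the scaling approach and method name are assumptions for the example, not necessarily what the SDK does internally:

import android.view.View

// Inflate or deflate the balloon view over ~300 ms, the recommended medium animation duration
fun setBalloonVisible(balloon: View, visible: Boolean) {
    val targetScale = if (visible) 1f else 0f
    balloon.animate()
        .scaleX(targetScale)
        .scaleY(targetScale)
        .setDuration(300L)
        .start()
}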

We hope you now have an idea of how we apply fast prototyping to solve challenges that combine computer vision research and mobile development. This post also intends to show how useful JNI can be for performing costly tasks like image manipulation in a mobile environment, and to serve as an invitation for every developer to give it a try.

That’s it. See you next time!
