How to Design a Language-Agnostic Cross-Platform Computer Vision SDK: A Hands-On Tutorial
I recently had the opportunity to present at the Venice Computer Vision meetup. If you are not familiar, it is an event sponsored by Trueface where computer vision developers and enthusiasts alike can showcase cutting-edge computer vision research, applications, and hands-on tutorials.
In this article, I’ll be going over my tutorial presentation on how to design a language-agnostic computer vision software development kit (SDK) for cross-platform deployment and maximum extensibility. If you would like to view the live recording of the presentation, you can do so here. I have also made the entire project open source, so feel free to use it as a template for your next computer vision project.
Why this Tutorial Matters
In my experience, I never found an all-encompassing guide that summarizes all the pertinent steps needed to create a language-agnostic, cross-platform SDK. I had to comb through disparate documentation for just the right bits of information, learn each component separately, and then piece it all together myself. It was frustrating. It took a lot of time. And now you, dear reader, get to benefit from all of my work. Ahead you’ll learn how to build a language-agnostic, cross-platform SDK. All the essentials are there. None of the fluff, save a few memes. Enjoy.
In this tutorial, you can expect to learn how to:
- Build a basic computer vision library in C++
- Compile and cross-compile the library for AMD64, ARM64, and ARM32
- Package the library and all the dependencies as a single static library
- Automate unit testing
- Set up a continuous integration (CI) pipeline
- Write python bindings for our library
- Generate documentation directly from our API
For the sake of this demo, we will build a face and landmark detection SDK using an open-source face detector called MTCNN.
Our API function will take an image path, then return the coordinates of the facial bounding box and the facial landmarks. The ability to detect faces is very useful in computer vision as it is the first step in many pipelines, including face recognition, age prediction, and automated face blurring.
Note: For this tutorial, I will be working on Ubuntu 18.04.
Why use C++ for our Library?
The majority of our library will be written in C++, a compiled and statically typed language. It’s no secret that C++ is a very fast programming language; it’s low-level enough to give us the speed we desire and has minimal added runtime overhead.
In computer vision applications, we generally manipulate lots of images, perform matrix operations, and run machine learning inference, all of which involve a massive amount of computation. Execution speed is therefore critical. This is especially important in real-time applications where you need to reduce latency in order to achieve a desired frame rate; often we only have milliseconds to run all our code.
Another advantage of C++ is that if we compile for a certain architecture and link all the dependencies statically, then we can run it on that hardware without requiring any additional interpreters or libraries. Believe it or not, we can even run on a bare metal embedded device with no operating system!
Project Structure
We will be using the following directory structure for our project.
3rdparty will contain the 3rd party dependency libraries required by our project.
dist will contain the files which get distributed to the end-users of the SDK. In our case, that will be the library itself, and the associated header file.
docker will contain the docker file which will be used to generate a docker image for the CI builds.
docs will contain the build scripts required to generate documentation directly from our header file.
include will contain any include files for the public API.
models will contain the face detection deep learning model files.
python will contain the code required to generate python bindings.
src will contain any cpp files that will be compiled, and also any header files which will not be distributed with the SDK (internal header files).
test will contain our unit tests.
tools will contain our CMake toolchain files required for cross-compiling.
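Sketched as a tree (the root directory name is illustrative):

```
cpp-sdk/
├── 3rdparty/   # dependency build scripts and libraries
├── dist/       # distributable library and public header
├── docker/     # Dockerfile for CI builds
├── docs/       # documentation build scripts
├── include/    # public API headers
├── models/     # face detection model files
├── python/     # python bindings code
├── src/        # implementation .cpp files and internal headers
├── test/       # unit tests
└── tools/      # CMake toolchain files for cross-compiling
```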
Installing the Dependency Libraries
For this project, the required 3rd party dependency libraries are: ncnn, a lightweight machine learning inference library; OpenCV, an image processing library; Catch2, a unit testing library; and finally pybind11, a library used for generating python bindings. The first two libraries will need to be compiled as standalone libraries, whereas the latter two are header-only, and therefore we only require their source.
One way to add these libraries to our project is via git submodules. Although this approach works, I’m personally a fan of using shell scripts which pull the source code then build for the desired platforms: in our case AMD64, ARM32, and ARM64.
Here’s an example of what one of these build scripts looks like:
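A sketch of such a script, using ncnn as the example dependency (the release tag, directory names, and toolchain file paths here are illustrative; the real scripts live in the open-source project):

```shell
#!/bin/bash
# Sketch of a 3rdparty build script. Assumes git, cmake, make, and the ARM
# cross compilers are installed on the host.
set -e

# Pull a pinned release of the dependency source (tag is illustrative)
git clone --depth 1 --branch 20200413 https://github.com/Tencent/ncnn.git

# Build one configuration: $1 = build directory, $2 = extra CMake arguments
build_ncnn() {
    mkdir -p "ncnn/$1"
    (cd "ncnn/$1" && cmake -DCMAKE_BUILD_TYPE=Release $2 .. && make -j"$(nproc)")
}

# Native AMD64 build
build_ncnn build_amd64 ""
# ARM builds point CMake at a cross-compilation toolchain file
build_ncnn build_arm64 "-DCMAKE_TOOLCHAIN_FILE=../../../tools/aarch64-toolchain.cmake"
build_ncnn build_arm32 "-DCMAKE_TOOLCHAIN_FILE=../../../tools/arm32-toolchain.cmake"
```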
The script is pretty straightforward. It starts by pulling the desired release source code from the git repository. Next, CMake is used to prepare the build, then make is invoked to drive the compiler to build the source code.
What you will notice is that the main difference between the AMD64 build and the ARM builds is that the ARM builds pass an additional CMake parameter called `CMAKE_TOOLCHAIN_FILE`. This argument tells CMake that the build target architecture (ARM32 or ARM64) is different from the host architecture (AMD64 / x86_64), and instructs it to use the cross compiler specified within the selected toolchain file to build the library (more on toolchain files later in this tutorial). In order for this shell script to work, you will need the appropriate cross compilers installed on your Ubuntu machine. These can be installed easily using `apt-get`, and instructions on how to do so are shown here.
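On Ubuntu, the GNU cross compilers used later in this tutorial can be installed like so:

```shell
# Install the AArch64 and AArch32 (hard-float) GNU C++ cross compilers
sudo apt-get update
sudo apt-get install -y g++-aarch64-linux-gnu g++-arm-linux-gnueabihf
```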
Our Library API
Our library API looks like this:
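Here is a condensed sketch of what such a header can look like. The type and member names follow the text, but the exact signature in the real project may differ; a stub implementation is appended only so the sketch compiles stand-alone (the real Impl wraps OpenCV image loading and ncnn inference).

```cpp
// include/my_sdk.h — sketch of the public API (names illustrative)
#include <memory>
#include <string>
#include <vector>

namespace mysdk {

// Status codes returned by API calls
enum class ErrorCode { NO_ERROR, INVALID_IMAGE };

// A single 2D coordinate (bounding box corner or facial landmark)
struct Point {
    float x;
    float y;
};

class MySDK {
public:
    MySDK();
    ~MySDK();

    // Detect a face in the image at imagePath; on success, fills in the
    // bounding box corners followed by the facial landmark points.
    ErrorCode getFaceBoxAndLandmarks(const std::string& imagePath,
                                     bool& faceDetected,
                                     std::vector<Point>& fbAndLandmarks);

private:
    // pImpl: implementation details live in Impl, which is only forward
    // declared here, so the public header pulls in no dependency headers.
    class Impl;
    std::unique_ptr<Impl> m_impl;
};

// ---- stub definitions, for illustration only ----
class MySDK::Impl {};

MySDK::MySDK() : m_impl(std::make_unique<Impl>()) {}
MySDK::~MySDK() = default;

ErrorCode MySDK::getFaceBoxAndLandmarks(const std::string& imagePath,
                                        bool& faceDetected,
                                        std::vector<Point>& fbAndLandmarks) {
    (void)fbAndLandmarks;
    faceDetected = false;  // the real code runs MTCNN inference here
    return imagePath.empty() ? ErrorCode::INVALID_IMAGE : ErrorCode::NO_ERROR;
}

} // namespace mysdk
```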
Since I’m super creative, I decided to name my SDK `MySDK`. In our API, we have an enum called `ErrorCode`, a struct called `Point`, and finally one public member function called `getFaceBoxAndLandmarks`. For the scope of this tutorial, I will not be going into the details of the implementation of the SDK. The gist is that we read the image into memory using OpenCV, then perform machine learning inference using ncnn with open-source models to detect the face bounding box and landmarks. If you would like to dive into the implementation, you can do so here.
What I want you to pay attention to, though, is the design pattern we are using. We are using a technique called pointer to implementation, or pImpl for short, which removes the implementation details of a class by placing them in a separate class. In the code above, this is achieved by forward declaring the `Impl` class, then holding a `unique_ptr` to this class as a private member variable. In doing so, we not only hide the implementation from the prying eyes of the end-user (which can be quite important in a commercial SDK), but also reduce the number of headers our API header depends on (and thus prevent our API header from `#include`ing dependency library headers).
A Note on Model Files
I said we weren’t going to go over the details of the implementation, but there is something I think is worth mentioning. By default, the open-source face detector we are using, called MTCNN, loads the machine learning model files at runtime. This isn’t ideal because it means we will need to distribute the models to the end-user. This issue is even more significant with commercial models, where you don’t want users to have free access to the model files (think of the countless hours that went into training these models). One solution is to encrypt the model files, which I absolutely advise doing. However, this still means we need to ship the model files along with the SDK. Ultimately, we want to reduce the number of files we send a user to make it easier for them to use our software (fewer files equals fewer places to go wrong). We can therefore use the method shown below to convert the model files into header files and actually embed them into the SDK itself.
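The idea, sketched with an illustrative model file name (a placeholder file stands in for the real MTCNN model so the example is self-contained):

```shell
# In the real project this would be one of the MTCNN model files; here a
# placeholder stands in so the example can be run anywhere.
printf 'placeholder model bytes' > det1.param

# xxd -i emits a C header embedding the file's bytes as an array:
xxd -i det1.param > det1_param.h
cat det1_param.h
# unsigned char det1_param[] = { 0x70, 0x6c, ... };
# unsigned int det1_param_len = 23;
```

The generated header can then be `#include`d like any other, and the model loaded straight from memory.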
The `xxd` bash command is used for generating hex dumps, and can also generate a header file from a binary file. We can therefore include the model files in our code like normal header files and load them directly from memory. A limitation of this approach is that it isn’t practical with very large model files, as it consumes too much memory at compile time. Instead, you can use a tool such as `ld` to convert these large model files directly to object files.
CMake and Compiling our Library
We can now use CMake to generate the build files for our project. In case you are not familiar, CMake is a build system generator used to manage the build process. Below, you’ll see what part of the root `CMakeLists.txt` (CMake file) looks like.
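A sketch of the relevant portion; the target name follows the text, but the source file list and include paths are illustrative:

```cmake
add_library(my_sdk_static STATIC
    src/my_sdk.cpp      # illustrative name for the API implementation file
    src/mtcnn.cpp)

target_include_directories(my_sdk_static
    PUBLIC  ${CMAKE_CURRENT_SOURCE_DIR}/include
    PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src
            ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/ncnn/include
            ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/opencv/include)
```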
Basically, we create a static library called `my_sdk_static` from the source files which contain our implementation, including `mtcnn.cpp`. The reason we are creating a static library is that, in my experience, it is easier to distribute a static library to users, and it is more friendly towards embedded devices. As I mentioned above, if an executable is linked against a static library, it can be run on an embedded device that doesn’t even have an operating system. This simply is not possible with a dynamic library. Additionally, with dynamic libraries, we have to worry about dependency versions, and we might even need a manifest file associated with our library. Statically linked libraries also have a slightly better performance profile than their dynamic counterparts.
The next thing we do in our CMake script is tell CMake where to find the necessary include header files that our source files require. Something to note: although our library will compile at this point, when we try to link against it (with an executable, for example), we will get a ton of undefined reference to symbol errors. This is because we have not linked any of our dependency libraries. So if we did want to successfully link an executable against `libmy_sdk_static.a`, we would have to track down and link all the dependency libraries too (OpenCV modules, ncnn, etc.). Unlike dynamic libraries, static libraries can’t resolve their own dependencies; they are basically just a collection of object files packaged into an archive.
Later in this tutorial, I’ll demonstrate how we can bundle all the dependency libraries into our static library so the user won’t need to worry about linking against any of the dependency libraries.
Cross-compiling our Library and Toolchain Files
Many computer vision applications are deployed at the edge. This generally involves running the code on low-power embedded devices which usually have ARM CPUs. Since C++ is a compiled language, we must compile our code for the CPU architecture on which the application will be run (each architecture uses different assembly instructions).
Before we dive in, let’s also touch on the difference between ARM32 and ARM64, also called AArch32 and AArch64. AArch64 refers to the 64-bit extension of the ARM architecture, and whether you need an AArch32 or AArch64 binary depends on both the CPU and the operating system. For example, even though the Raspberry Pi 4 has a 64-bit ARM CPU, the default operating system, Raspbian, is 32-bit; such a device therefore requires an AArch32 compiled binary. If we were to run a 64-bit operating system such as Gentoo on the same Pi, then we would require an AArch64 compiled binary. Another example of a popular embedded device is the NVIDIA Jetson, which has an onboard GPU and runs AArch64.
In order to cross-compile, we need to specify to CMake that we are not compiling for the architecture of the machine we are currently building on, and therefore which cross compiler CMake should use. For AArch64, we use the `aarch64-linux-gnu-g++` compiler, and for AArch32 we use the `arm-linux-gnueabihf-g++` compiler (hf stands for hard float).
The following is an example of a toolchain file. As you can see, we are specifying to use the AArch64 cross compiler.
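A minimal sketch of such a toolchain file (the file path and sysroot settings are illustrative):

```cmake
# tools/aarch64-toolchain.cmake — example AArch64 toolchain file
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

set(CMAKE_C_COMPILER   aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

# Search for programs on the host, but for libraries and headers
# only in the target environment
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
```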
Back in our root `CMakeLists.txt`, we can add the following code to the top of the file.
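A sketch of what those options can look like (toolchain file paths are illustrative; note that `CMAKE_TOOLCHAIN_FILE` must be set before the `project()` call for it to take effect):

```cmake
option(BUILD_ARM32 "Cross-compile for ARM32 (AArch32)" OFF)
option(BUILD_ARM64 "Cross-compile for ARM64 (AArch64)" OFF)

if(BUILD_ARM64)
    set(CMAKE_TOOLCHAIN_FILE ${CMAKE_CURRENT_SOURCE_DIR}/tools/aarch64-toolchain.cmake)
elseif(BUILD_ARM32)
    set(CMAKE_TOOLCHAIN_FILE ${CMAKE_CURRENT_SOURCE_DIR}/tools/arm32-toolchain.cmake)
endif()
```

A cross-compilation build is then configured with, for example, `cmake -DBUILD_ARM64=ON ..`.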
Basically, we are adding CMake options which can be enabled from the command line in order to cross-compile. Enabling either the `BUILD_ARM32` or `BUILD_ARM64` option will select the appropriate toolchain file and configure the build for cross-compilation.
Packaging our SDK with Dependency Libraries
As mentioned earlier, if a developer wants to link against our library at this point, they will also need to link against all the dependency libraries in order to resolve their symbols. Even though our app is pretty simple, we already have eight dependency libraries! The first is ncnn, then we have three OpenCV module libraries, then four utility libraries that were built with OpenCV (libjpeg, libpng, zlib, libtiff). We could require the user to build the dependency libraries themselves, or even ship them alongside our library, but ultimately that means more work for the user, and we are all about lowering the barrier to use. The ideal situation is to ship the user a single library that contains our library along with all of the 3rd party dependency libraries (other than the standard system libraries). It turns out we can achieve this using some CMake magic.
We first add a custom target to our `CMakeLists.txt`, then execute what is called an MRI script. This MRI script gets passed to the `ar -M` bash command, which combines all the static libraries into a single archive. What’s cool about this method is that it gracefully handles overlapping member names from the original archives, so we don’t need to worry about conflicts there. Building this custom target will produce `libmy_sdk.a`, which will contain our SDK along with all the dependency archives.
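As a sketch, the MRI script fed to `ar -M` can look something like this (the archive names are illustrative, matching the dependencies listed earlier):

```
create libmy_sdk.a
addlib libmy_sdk_static.a
addlib libncnn.a
addlib libopencv_core.a
addlib libopencv_imgproc.a
addlib libopencv_imgcodecs.a
addlib liblibjpeg.a
addlib liblibpng.a
addlib libzlib.a
addlib liblibtiff.a
save
end
```

The CMake custom target simply runs `ar -M < combine.mri` (with a script like the above generated or checked in) after `my_sdk_static` has been built.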
Hold up for a second: Let’s take stock of what we’ve done so far.
At this point, we have a static library called `libmy_sdk.a` which contains our SDK and all the dependency libraries, packaged into a single archive. We also have the ability to compile and cross-compile (using command-line arguments) for all our target platforms.
Unit Testing
I probably don’t need to explain why unit tests are important, but in short, they are a crucial part of SDK design that allows the developer to ensure the SDK is working as intended. Additionally, if any breaking changes are made down the line, unit tests help track them down so that fixes can be pushed out faster.
In this specific case, creating a unit test executable also gives us an opportunity to link against the combined library we just created, to ensure that we can link correctly as intended (and that we don’t get any of those nasty undefined reference to symbol errors).
We are using Catch2 as our unit testing framework. The syntax is outlined below:
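A sketch of what the test file can look like; the section names and SDK identifiers follow the text, but the image paths and exact assertions are illustrative:

```cpp
// test/test.cpp — sketch of the Catch2 unit tests
#define CATCH_CONFIG_MAIN  // ask Catch2 to provide main() for us
#include "catch.hpp"
#include "my_sdk.h"

#include <vector>

TEST_CASE("Face detection") {
    // This setup runs from the top for every SECTION below,
    // so each section gets a fresh MySDK instance
    mysdk::MySDK mySdk;
    bool faceDetected = false;
    std::vector<mysdk::Point> fbAndLandmarks;

    SECTION("Non face image") {
        const auto code = mySdk.getFaceBoxAndLandmarks(
            "../images/no_face.jpg", faceDetected, fbAndLandmarks);
        REQUIRE(code == mysdk::ErrorCode::NO_ERROR);
        REQUIRE(faceDetected == false);
    }

    SECTION("Faces in image") {
        const auto code = mySdk.getFaceBoxAndLandmarks(
            "../images/face.jpg", faceDetected, fbAndLandmarks);
        REQUIRE(code == mysdk::ErrorCode::NO_ERROR);
        REQUIRE(faceDetected == true);
    }
}
```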
How Catch2 works is that we have a macro called `TEST_CASE` and another macro called `SECTION`. For each `SECTION`, the `TEST_CASE` is executed from the start. So in our example, `mySdk` will first be initialized, then the first section, named “Non face image”, will be run. Next, `mySdk` will be destroyed and reconstructed, then the second section, named “Faces in image”, will run. This is great because it ensures that we have a fresh `MySDK` object to operate on for each section. We can then use macros such as `REQUIRE` to make our assertions.
We can use CMake to build a unit testing executable called `run_tests`. As we can see in the call to `target_link_libraries` on line 3 below, the only library we need to link against is our `libmy_sdk.a`, and no other dependency libraries.
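A sketch of that snippet, with the `target_link_libraries` call kept on line 3 to match the text (file paths are illustrative):

```cmake
add_executable(run_tests test/test.cpp)
# Our combined archive resolves everything; no other dependencies needed
target_link_libraries(run_tests ${CMAKE_SOURCE_DIR}/dist/libmy_sdk.a)
```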
Generating Documentation
We will use doxygen to generate documentation directly from our header file. We can go ahead and document all our methods and data types in our public header using the syntax shown in the code snippet below. Be sure to specify all input and output parameters for any functions.
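For example, the API function from earlier might be documented like this (the parameter descriptions are illustrative):

```cpp
/**
 * @brief Detect the largest face in an image and return its bounding box
 *        and facial landmarks.
 *
 * @param[in]  imagePath      Path to the image on disk.
 * @param[out] faceDetected   Set to true if a face was found in the image.
 * @param[out] fbAndLandmarks Bounding box corners followed by the landmarks.
 *
 * @return Error code indicating whether the call succeeded.
 */
ErrorCode getFaceBoxAndLandmarks(const std::string& imagePath,
                                 bool& faceDetected,
                                 std::vector<Point>& fbAndLandmarks);
```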
In order to actually generate documentation, we need something called a Doxyfile, which is basically a blueprint instructing doxygen how to generate the documentation. We can generate a generic Doxyfile by running `doxygen -g` in our terminal, assuming doxygen is installed on the system. Next, we can edit the Doxyfile; at a minimum, we need to specify the output directory and the input files.
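The relevant Doxyfile entries might look like this (values are illustrative):

```
# Doxyfile (relevant lines only)
PROJECT_NAME     = "MySDK"
OUTPUT_DIRECTORY = docs/build
INPUT            = include
```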
In our case, we only want to generate documentation from our API header file, which is why we have specified the include directory. Finally, you use CMake to actually build the documentation, which can be done like so.
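One way to sketch that is a custom `docs` target that invokes doxygen (paths are illustrative):

```cmake
# docs/CMakeLists.txt — run doxygen as a build target
find_package(Doxygen REQUIRED)
add_custom_target(docs
    COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/Doxyfile
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
    COMMENT "Generating API documentation with doxygen")
```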
Python Bindings
Let’s be honest: C++ isn’t the easiest or most friendly language to develop in. We therefore want to extend our library with language bindings to make it easier for developers to use. I’ll demonstrate this using python, as it’s a popular computer vision prototyping language, but other language bindings are just as easy to write. We are using pybind11 to achieve this:
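A sketch of the bindings file; the module name and identifiers follow the text, but the exact enum values, method names, and wrapper details are illustrative (the real file is in the open-source repo):

```cpp
// python/bindings.cpp — sketch of the pybind11 bindings
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // automatic std::vector <-> list conversion
#include "my_sdk.h"

namespace py = pybind11;

PYBIND11_MODULE(mysdk, m) {
    py::enum_<mysdk::ErrorCode>(m, "ErrorCode")
        .value("NO_ERROR", mysdk::ErrorCode::NO_ERROR)
        .value("INVALID_IMAGE", mysdk::ErrorCode::INVALID_IMAGE);

    py::class_<mysdk::Point>(m, "Point")
        .def_readwrite("x", &mysdk::Point::x)
        .def_readwrite("y", &mysdk::Point::y);

    py::class_<mysdk::MySDK>(m, "MySDK")
        .def(py::init<>())
        // bool is immutable in python, so wrap the call in a lambda that
        // owns the bool and returns it as part of a tuple instead
        .def("get_face_box_and_landmarks",
             [](mysdk::MySDK& self, const std::string& imagePath) {
                 bool faceDetected = false;
                 std::vector<mysdk::Point> fbAndLandmarks;
                 const auto code = self.getFaceBoxAndLandmarks(
                     imagePath, faceDetected, fbAndLandmarks);
                 return py::make_tuple(code, faceDetected, fbAndLandmarks);
             });
}
```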
We start by using the `PYBIND11_MODULE` macro, which creates a function that will be called when an import statement is issued from within python. So in the example above, the python module name is `mysdk`. Next, we define our classes and their members using pybind11 syntax.
Here’s something to note: in C++, it’s pretty common to pass variables by mutable reference, which allows both read and write access. This is exactly what we have done with our API member function and the `fbAndLandmarks` parameter. In python, all arguments are passed by reference; however, certain basic python types, including `bool`, are immutable. Coincidentally, our `faceDetected` parameter is a bool that is passed by mutable reference. We must therefore use a workaround in our python wrapper function: define the bool locally, pass it to our C++ function, then return the variable as part of a tuple.
Once we have built the python bindings library, we can easily utilize it using the code below:
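A hypothetical usage example, assuming the compiled bindings module is on the python path and that the wrapped call returns an (error code, face detected, points) tuple; the method and attribute names here are illustrative:

```python
import mysdk  # the compiled pybind11 module (e.g. mysdk.so)

sdk = mysdk.MySDK()
code, face_detected, points = sdk.get_face_box_and_landmarks("face.jpg")

if code == mysdk.ErrorCode.NO_ERROR and face_detected:
    # points holds the bounding box corners followed by the landmarks
    for p in points:
        print(p.x, p.y)
```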
Continuous Integration
For our continuous integration pipeline, we will be using a tool called CircleCI, which I really like because it integrates directly with GitHub: a new build is automatically triggered every time you push a commit. To get started, go to the CircleCI website and connect it to your GitHub account, then select the project you want to add. Once added, you will need to create a `.circleci` directory at the root of your project and a file called `config.yml` within that directory.
For anyone who is not familiar, YAML is a serialization language commonly used for configuration files. We can use it to specify the operations we want CircleCI to perform. In the YAML snippet below, you can see how we first build one of the dependency libraries, then build the SDK itself, and finally build and run the unit tests.
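A condensed sketch of such a config; the keys follow the CircleCI 2.x schema, but the docker image name, paths, and step details are illustrative rather than the project’s actual file:

```yaml
# .circleci/config.yml — condensed sketch
version: 2
jobs:
  build:
    docker:
      - image: myaccount/cpp-sdk-build:latest   # custom image, hypothetical name
    steps:
      - checkout
      # Restore the cached OpenCV build, keyed on the build script's hash
      - restore_cache:
          key: opencv-{{ checksum "3rdparty/build_opencv.sh" }}
      - run: ./3rdparty/build_opencv.sh
      - save_cache:
          key: opencv-{{ checksum "3rdparty/build_opencv.sh" }}
          paths:
            - 3rdparty/opencv
      # Build the SDK, then build and run the unit tests
      - run: mkdir -p build && cd build && cmake .. && make
      - run: cd build && ./run_tests
```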
If we are intelligent (and I assume you are if you’ve made it this far), we can use caching to significantly reduce the build times. For example, in the YAML above, we cache the OpenCV build using the hash of the build script as the cache key. This way, the OpenCV library will only be rebuilt if the build script has been modified — otherwise, the cached build will be used. Another thing to note is that we are running the build inside of a docker image of our choice. I’ve selected a custom docker image (here is the Dockerfile) in which I’ve installed all the system dependencies.
And there you have it. Like any well-designed product, we want to support the most in-demand platforms and make it easy to use for the largest number of developers. Using the tutorial above, we have built an SDK that is accessible in several languages and is deployable across multiple platforms. And you didn’t even have to read the documentation for pybind11 yourself. I hope you’ve found this tutorial useful and entertaining. Happy building.