Trueface Tutorials: How to Cross Compile Popular Computer Vision C++ Frameworks for Raspberry Pi 4 (ARM32/ARM64)

Cyrus Behroozi · Trueface · Dec 6, 2019 · 7 min read

Whether it be a robot mapping its surroundings, a turnstile gate running facial recognition for access control, or a gaming machine in Las Vegas performing age verification, we here at Trueface have deployed many computer vision solutions at the edge. As opposed to cloud deployment, where the data is sent over to servers for processing, edge deployment involves processing the data directly onboard the hardware that generates it. This is advantageous as it reduces latency, requires no network connection, and keeps potentially sensitive data localized to the device. Generally, these applications use lightweight hardware due to its lower cost, smaller form factor, and lower power consumption. ARM is one of the most popular processors used in these portable devices due to its low power consumption and reasonable performance.

Despite its advantages, developing for ARM processors can prove to be a challenge. An executable or a library compiled on an x86 device will not run on an ARM processor. This is because code is compiled to machine code for the specific host architecture, and machine code differs greatly between processor families. One workaround is to compile the code directly on the ARM device; however, this proves inefficient with large codebases due to the device's limited hardware resources (OpenCV 4 can take four hours to compile on a Raspberry Pi 3). Additionally, some devices, such as bare-metal embedded systems, do not even have an operating system, so compiling on the device is simply not an option. The solution? Cross compile the code. With cross-compiling, we use a toolchain targeting the destination architecture (in our case, ARM) to compile the code on a more powerful device. This produces executables and libraries that can run on the target architecture.
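To see the idea in action, you can cross compile a trivial program on an x86 host and confirm that the resulting binary targets ARM. This sketch assumes the toolchain packages installed in the next section are already present:

```shell
# Cross compile a trivial program and inspect the result with `file`.
# Assumes the gcc-arm-linux-gnueabihf / gcc-aarch64-linux-gnu packages
# from the next section are installed.
cat > hello.c << 'EOF'
#include <stdio.h>
int main(void) { printf("Hello from ARM\n"); return 0; }
EOF
arm-linux-gnueabihf-gcc hello.c -o hello_aarch32
aarch64-linux-gnu-gcc hello.c -o hello_aarch64
# Should report ARM / aarch64 targets rather than x86-64.
file hello_aarch32 hello_aarch64
```

Running either binary on the x86 host will fail, but both will run on the corresponding ARM execution state.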

In this tutorial, we will be cross-compiling for the Raspberry Pi 3/4, which use an ARM processor. We will build a computer vision application that detects facial bounding boxes and landmarks, using OpenCV for image loading and augmentation and NCNN for machine learning inference. We will also link the dependency libraries statically. The following tutorial is written using Ubuntu 18.04.3 LTS.

Just get to the point already!

This tutorial is written for beginners with minimal experience with cross-compiling and thus goes through each step in detail. If you want to download the completed code, you can do so here; if you simply want the final executable tarballs to test, go here.

A note about operating systems:

Both the Raspberry Pi 3 (1.4GHz 64-bit quad-core ARM Cortex-A53) and Raspberry Pi 4 (1.5GHz 64-bit quad-core ARM Cortex-A72) contain 64-bit ARM processors. The processors can run in both a 32-bit execution state (AArch32) and a 64-bit execution state (AArch64), with performance being the main differentiator. In a benchmark measuring inference speed using a ResNet 100 architecture, AArch64 outperformed AArch32 by 40%.

Raspbian, the official Debian-based operating system optimized for Raspberry Pi, is a 32-bit Linux distribution. To run a 64-bit operating system on the Raspberry Pi, a modified version of Gentoo can be used. This Github page demonstrates how to install the 64-bit operating system on Raspberry Pi.

This tutorial will have instructions for cross-compiling for both AArch32 and AArch64, so you can follow along no matter what operating system you have installed on your Raspberry Pi device.

Let’s get started

Start by creating a directory for your project. Within that directory, create the following directory structure:

.
├── 3rd_party_libs
├── images
├── models
└── src
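The skeleton can be created in a single command (the project root name here is just a placeholder):

```shell
# Create the project skeleton; "landmark_project" is a placeholder name.
mkdir -p landmark_project/{3rd_party_libs,images,models,src}
ls landmark_project
```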

Installing the ARM toolchains

Begin by installing the ARM toolchains.

If cross-compiling for AArch32, run:

sudo apt-get install gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf

If cross-compiling for AArch64, run:

sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
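Either way, you can confirm the compilers are on your PATH and report the expected target triple:

```shell
# Print the target triple of each cross compiler (requires the
# packages above to be installed).
arm-linux-gnueabihf-gcc -dumpmachine    # arm-linux-gnueabihf
aarch64-linux-gnu-gcc -dumpmachine      # aarch64-linux-gnu
```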

Cross-compiling OpenCV 4

Go to the OpenCV Github page and download the latest version of the source to the 3rd_party_libs directory. Once the download is complete, navigate into the opencv root source directory and run the following cmake commands corresponding to your target architecture:
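The original commands were embedded as a gist, which is not reproduced here. A reconstruction along these lines should work; the toolchain files ship with OpenCV under platforms/linux/, but treat the exact option list as an approximation:

```shell
# From the opencv root source directory. For AArch32:
mkdir build_aarch32 && cd build_aarch32
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/arm-gnueabi.toolchain.cmake \
      -D BUILD_SHARED_LIBS=OFF \
      -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF \
      -D BUILD_opencv_apps=OFF ..
make -j"$(nproc)"
make install DESTDIR=install_aarch32

# For AArch64, substitute the toolchain file and directory names:
#   ../platforms/linux/aarch64-gnu.toolchain.cmake
#   build_aarch64 / install_aarch64
```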

A few things to notice: the CMAKE_TOOLCHAIN_FILE option points to the appropriate toolchain cmake file (notice how a different file is used for AArch32 and AArch64). Additionally, BUILD_SHARED_LIBS is set to OFF because we will link to the static libraries. Finally, many of the other options are set to OFF because they are not required for our use case, which also speeds up the build process.

Once the source code has built, the libraries will be installed to:

/PATH/TO/OPENCV/ROOT/build_aarch32/install_aarch32/usr/local/lib

or:

/PATH/TO/OPENCV/ROOT/build_aarch64/install_aarch64/usr/local/lib

The include files will be installed to:

/PATH/TO/OPENCV/ROOT/build_aarch32/install_aarch32/usr/local/include/opencv4

or:

/PATH/TO/OPENCV/ROOT/build_aarch64/install_aarch64/usr/local/include/opencv4

Cross-compiling NCNN

Go to the NCNN Github page and download the source code to the 3rd_party_libs directory. Once the download is complete, navigate to the ncnn root source directory and run the following cmake commands corresponding to your target architecture:
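As with OpenCV, the original commands were embedded as a gist; a reconstruction along these lines should work. The toolchain files live in the ncnn toolchains/ directory, but treat the exact option list as an approximation:

```shell
# From the ncnn root source directory. For AArch32:
mkdir build_aarch32 && cd build_aarch32
cmake -D CMAKE_TOOLCHAIN_FILE=../toolchains/arm-linux-gnueabihf.toolchain.cmake \
      -D NCNN_DISABLE_RTTI=OFF \
      -D CMAKE_INSTALL_PREFIX="$(pwd)/install" ..
make -j"$(nproc)"
make install

# For AArch64, use ../toolchains/aarch64-linux-gnu.toolchain.cmake
# and a build_aarch64 directory instead.
```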

Once again, the CMAKE_TOOLCHAIN_FILE option is used to select the appropriate cmake toolchain file. Additionally, the NCNN_DISABLE_RTTI option needs to be set to OFF for the reasons discussed in this Github issue.

Once the source code has built, libncnn.a will be installed to:

/PATH/TO/NCNN/ROOT/build_aarch32/install/lib/

or:

/PATH/TO/NCNN/ROOT/build_aarch64/install/lib/

The include files will be installed to:

/PATH/TO/NCNN/ROOT/build_aarch32/install/include/ncnn

or:

/PATH/TO/NCNN/ROOT/build_aarch64/install/include/ncnn

Writing the landmark detection code

To detect the facial landmarks, we will use an open-source implementation of Multi-task Cascaded Convolutional Networks (MTCNN). Go ahead and download this Github project. Only a few of the files from the project will be required, in particular: src/mtcnn/mtcnn.cpp, include/mtcnn.h, models/det1.param, models/det2.param, models/det3.param, models/det1.bin, models/det2.bin, and models/det3.bin. Move mtcnn.cpp and mtcnn.h into the src directory of your project, and the det1/2/3.param and det1/2/3.bin files into the models directory of your project.

Take the time to also download a few sample images containing faces and place them in the images directory of your project. Additionally, create a CMakeLists.txt file in the root directory of your project, and a main.cpp in the src directory.

At this point, your project directory should look something like this:

.
├── 3rd_party_libs
│   ├── ncnn
│   └── opencv
├── CMakeLists.txt
├── images
│   ├── face.jpg
│   └── face.png
├── models
│   ├── det1.bin
│   ├── det1.param
│   ├── det2.bin
│   ├── det2.param
│   ├── det3.bin
│   └── det3.param
└── src
    ├── main.cpp
    ├── mtcnn.cpp
    └── mtcnn.h

Next, open main.cpp in your favorite text editor and copy the following code:
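The original code was embedded as a gist, which is not reproduced here. Below is a minimal sketch of what such a main.cpp could look like; the MTCNN interface (a constructor taking the model directory, a detect() method filling a vector of Bbox structs with corner coordinates and five landmark points) is assumed from the linked project and may differ in detail:

```cpp
// Hypothetical sketch of src/main.cpp. The MTCNN / Bbox API is an
// assumption based on the linked project; adjust to match mtcnn.h.
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include "mtcnn.h"

int main(int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " image1 [image2 ...]\n";
        return 1;
    }
    // The det1/2/3 model files are loaded from ../models at runtime.
    MTCNN detector("../models");
    for (int i = 1; i < argc; ++i) {
        cv::Mat img = cv::imread(argv[i]);
        if (img.empty()) {
            std::cerr << "Unable to load " << argv[i] << "\n";
            continue;
        }
        // Convert the OpenCV BGR image into an ncnn input tensor.
        ncnn::Mat in = ncnn::Mat::from_pixels(
            img.data, ncnn::Mat::PIXEL_BGR2RGB, img.cols, img.rows);
        std::vector<Bbox> boxes;
        detector.detect(in, boxes);
        for (const auto &b : boxes) {
            // Draw the bounding box, then the 5 landmark points
            // (x coordinates in ppoint[0..4], y in ppoint[5..9]).
            cv::rectangle(img, cv::Point(b.x1, b.y1), cv::Point(b.x2, b.y2),
                          cv::Scalar(0, 255, 0), 2);
            for (int p = 0; p < 5; ++p)
                cv::circle(img, cv::Point(b.ppoint[p], b.ppoint[p + 5]),
                           2, cv::Scalar(0, 0, 255), -1);
        }
        cv::imwrite("annotated_" + std::to_string(i) + ".jpg", img);
    }
    return 0;
}
```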

The sample code above loads images from the specified paths and then uses MTCNN to determine the facial bounding boxes and landmarks. The bounding boxes and landmarks are drawn on the image, and the image is then saved to disk.

The following code is a sample CMakeLists.txt when building for AArch32:
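The gist is not reproduced here; a reconstruction of such a CMakeLists.txt might look like the following. The library names and paths are assumptions and may need adjusting to your tree:

```cmake
# Hypothetical reconstruction of the original AArch32 CMakeLists.txt.
cmake_minimum_required(VERSION 3.5)

# Use the AArch32 cross compilers installed earlier (must be set
# before project()).
set(CMAKE_C_COMPILER arm-linux-gnueabihf-gcc)
set(CMAKE_CXX_COMPILER arm-linux-gnueabihf-g++)

project(landmark_detection)
set(CMAKE_CXX_STANDARD 11)

set(OPENCV_DIR "${CMAKE_SOURCE_DIR}/3rd_party_libs/opencv/build_aarch32/install_aarch32/usr/local")
set(NCNN_DIR "${CMAKE_SOURCE_DIR}/3rd_party_libs/ncnn/build_aarch32/install")

include_directories("${OPENCV_DIR}/include/opencv4" "${NCNN_DIR}/include/ncnn")
link_directories("${OPENCV_DIR}/lib" "${OPENCV_DIR}/lib/opencv4/3rdparty" "${NCNN_DIR}/lib")

add_executable(landmark_detection_aarch32 src/main.cpp src/mtcnn.cpp)

# Static linking: OpenCV modules first, then their 3rd party
# dependencies, then ncnn and the usual system libraries.
target_link_libraries(landmark_detection_aarch32
    opencv_imgcodecs opencv_imgproc opencv_core
    libjpeg-turbo libpng zlib ncnn pthread dl -fopenmp)

# Stage the executable, models, and images into dist/ after the build.
add_custom_command(TARGET landmark_detection_aarch32 POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy "$<TARGET_FILE:landmark_detection_aarch32>" "${CMAKE_SOURCE_DIR}/dist/bin/"
    COMMAND ${CMAKE_COMMAND} -E copy_directory "${CMAKE_SOURCE_DIR}/models" "${CMAKE_SOURCE_DIR}/dist/models"
    COMMAND ${CMAKE_COMMAND} -E copy_directory "${CMAKE_SOURCE_DIR}/images" "${CMAKE_SOURCE_DIR}/dist/images")
```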

If building for AArch64, use the following CMakeLists.txt:
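Again, the gist is not reproduced here. The AArch64 file differs from the AArch32 one only in the compiler prefix, the directory names, and the target name; a sketch of the lines that change:

```cmake
# Hypothetical AArch64 variant: only these lines differ from the
# AArch32 CMakeLists.txt.
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

set(OPENCV_DIR "${CMAKE_SOURCE_DIR}/3rd_party_libs/opencv/build_aarch64/install_aarch64/usr/local")
set(NCNN_DIR "${CMAKE_SOURCE_DIR}/3rd_party_libs/ncnn/build_aarch64/install")

add_executable(landmark_detection_aarch64 src/main.cpp src/mtcnn.cpp)
```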

Finally, from the root project directory, run the following commands:

mkdir build
cd build
cmake ..
make -j$(nproc)

Deployment

To deploy the software, we will need to copy a few files over to the device. In addition to the generated executable, we will require the model files, which are loaded at runtime, as well as some sample images on which we want to draw the landmarks.

The CMake setup in the step above copies all the required files post-build to a dist directory in the project root:

dist
├── bin
│   └── landmark_detection_aarch32/64
├── images
│   ├── face.jpg
│   └── face.png
└── models
    ├── det1.bin
    ├── det1.param
    ├── det2.bin
    ├── det2.param
    ├── det3.bin
    └── det3.param

Now we can package the dist directory and transfer it to the Raspberry Pi. I suggest creating a tarball with tar -czvf dist.tgz ./dist/ and then copying it over with scp.
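For example (the user and hostname below are placeholders for your own Pi):

```shell
# Package the dist directory and copy it to the Pi over SSH;
# pi@raspberrypi.local is a placeholder for your device.
tar -czvf dist.tgz ./dist/
scp dist.tgz pi@raspberrypi.local:~
```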

Once the tarball has been extracted on the Raspberry Pi, navigate to the dist/bin directory and run the executable, passing the image path(s) as command line arguments. For example: ./landmark_detection_aarch64 ../images/face.jpg ../images/face.png (use the aarch32 binary if you built for AArch32). The generated output images will have the facial bounding boxes and landmarks drawn on them.

Our application is now ready to be deployed on an ARM device. As an extension, you can add these cross-compile steps as part of your CI/CD pipeline to ensure you always have an ARM build ready for when you need it!

If you have any questions or comments, please email us at sales@trueface.ai.
