Trueface Tutorials: How to Cross Compile Popular Computer Vision C++ Frameworks for Raspberry Pi 4 (ARM32/ARM64)
Whether it be a robot mapping its surroundings, a turnstile gate running facial recognition for access control or a gaming machine in Las Vegas performing age verification, we here at Trueface have deployed many computer vision solutions at the edge. As opposed to cloud deployment where the data is sent over to servers for processing, edge deployment involves processing the data directly onboard the very hardware which is generating the data. This is advantageous as it reduces latency, requires no network connection, and keeps the potentially sensitive data localized to the device. Generally, these applications utilize lightweight hardware due to the cheaper cost, smaller form factor, and lower power consumption. ARM is one of the most popular processors used in these portable devices due to its low power consumption and reasonable performance.
Despite its advantages, developing for ARM processors can prove to be a challenge. An executable or a library compiled on an x86 device will not run on an ARM processor. This is because the code is compiled to machine code for that specific host architecture, and machine code is very different between most processor families. One workaround is to compile the code directly on the ARM device; however, this proves to be inefficient with large codebases due to the limited hardware resources of the device (OpenCV 4 can take four hours to compile on an RPi 3). Additionally, some devices such as bare metal embedded systems do not even have an operating system and thus compiling on the device is simply not an option. The solution? Cross compile the code. With cross-compiling, we use the toolchain of the target architecture (in our case, ARM) to compile on a more powerful device. This will produce executables and libraries which can be run on the target architecture.
In this tutorial, we will be cross-compiling for the Raspberry Pi 3/4, which utilize an ARM processor. We will build a computer vision application for detecting facial bounding boxes and landmarks, making use of OpenCV for image loading and augmentation and NCNN for machine learning inference. We will also link the dependency libraries statically. The following tutorial is written using Ubuntu 18.04.3 LTS.
Just get to the point already!
This tutorial is written for beginners with minimal cross-compiling experience and therefore goes through each step in detail. If you want to download the completed code, you can do so here; if you simply want the final executable tarballs to test, go here.
A note about operating systems:
Both the Raspberry Pi 3 (1.4GHz 64-bit quad-core Broadcom ARM Cortex-A53) and the Raspberry Pi 4 (1.5GHz 64-bit quad-core ARM Cortex-A72) contain 64-bit ARM processors. The processors can run in both a 32-bit execution state (AArch32) and a 64-bit execution state (AArch64), with performance being the main differentiator. In a benchmark measuring inference speed using a ResNet 100 architecture, AArch64 outperformed AArch32 by 40%.
Raspbian, the official Debian-based operating system optimized for Raspberry Pi, is a 32-bit Linux distribution. To run a 64-bit operating system on the Raspberry Pi, a modified version of Gentoo can be used. This Github page demonstrates how to install the 64-bit operating system on Raspberry Pi.
This tutorial will have instructions for cross-compiling for both AArch32 and AArch64, so you can follow along no matter which operating system you have installed on your Raspberry Pi device.
Let’s get started
Start by creating a directory for your project. Within that directory, create the following directory structure:
.
├── 3rd_party_libs
├── images
├── models
└── src
Installing the ARM toolchains
Begin by downloading the ARM toolchains.
If cross-compiling for AArch32, run:
sudo apt-get install gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf
If cross-compiling for AArch64, run:
sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
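Before continuing, it is worth confirming that the cross compilers are on your path:

```shell
# Confirm the cross compilers installed correctly and print their versions.
arm-linux-gnueabihf-g++ --version   # AArch32 toolchain
aarch64-linux-gnu-g++ --version     # AArch64 toolchain
```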
Cross-compiling OpenCV 4
Go to the OpenCV Github page and download the latest version of the source to the 3rd_party_libs directory. Once the download is complete, navigate into the opencv root source directory and run the following cmake commands corresponding to your target architecture:
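The original post embeds these commands as a gist, which is not reproduced here. As a sketch, using the toolchain files that ship with OpenCV under platforms/linux (the exact set of disabled options in the original may differ):

```shell
# AArch32 build (run from the opencv root source directory):
mkdir build_aarch32 && cd build_aarch32
cmake -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/arm-gnueabi.toolchain.cmake \
      -D CMAKE_BUILD_TYPE=Release \
      -D BUILD_SHARED_LIBS=OFF \
      -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF ..
make -j$(nproc)
# DESTDIR places the files under build_aarch32/install_aarch32/usr/local/...
make install DESTDIR=$(pwd)/install_aarch32

# For AArch64, use a build_aarch64 directory and the
# ../platforms/linux/aarch64-gnu.toolchain.cmake toolchain file instead,
# installing with DESTDIR=$(pwd)/install_aarch64.
```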
A few things to notice: the CMAKE_TOOLCHAIN_FILE option is used to point to the appropriate toolchain cmake file (notice how a different file is used for AArch32 and AArch64). Additionally, BUILD_SHARED_LIBS is set to OFF as we will link against the static libraries. Finally, many of the other options are set to OFF as they are not required for our use case, which also speeds up the build process.
Once the source code has built, the libraries will be installed to:
/PATH/TO/OPENCV/ROOT/build_aarch32/install_aarch32/usr/local/lib
or:
/PATH/TO/OPENCV/ROOT/build_aarch64/install_aarch64/usr/local/lib
The include files will be installed to:
/PATH/TO/OPENCV/ROOT/build_aarch32/install_aarch32/usr/local/include/opencv4
or:
/PATH/TO/OPENCV/ROOT/build_aarch64/install_aarch64/usr/local/include/opencv4
Cross-compiling NCNN
Go to the NCNN Github page and download the source code to the 3rd_party_libs
directory. Once the download is complete, navigate to the ncnn
root source directory and run the following cmake
commands corresponding to your target architecture:
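Again, the embedded gist is omitted here. A sketch using the toolchain files bundled with NCNN under toolchains/ (check the exact filenames in your copy of the source) might look like:

```shell
# AArch32 build (run from the ncnn root source directory):
mkdir build_aarch32 && cd build_aarch32
cmake -D CMAKE_TOOLCHAIN_FILE=../toolchains/arm-linux-gnueabihf.toolchain.cmake \
      -D NCNN_DISABLE_RTTI=OFF \
      -D CMAKE_INSTALL_PREFIX=$(pwd)/install ..
make -j$(nproc)
make install

# For AArch64, use a build_aarch64 directory and the
# ../toolchains/aarch64-linux-gnu.toolchain.cmake toolchain file instead.
```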
Once again, the CMAKE_TOOLCHAIN_FILE option is used to select the appropriate cmake toolchain file. Additionally, the NCNN_DISABLE_RTTI option needs to be set to OFF for the reasons discussed in this Github issue.
Once the source code has built, libncnn.a will be installed to:
/PATH/TO/NCNN/ROOT/build_aarch32/install/lib/
or:
/PATH/TO/NCNN/ROOT/build_aarch64/install/lib/
The include files will be installed to:
/PATH/TO/NCNN/ROOT/build_aarch32/install/include/ncnn
or:
/PATH/TO/NCNN/ROOT/build_aarch64/install/include/ncnn
Writing the landmark detection code
To detect the facial landmarks, we will use an open-source implementation of Multi-task Cascaded Convolutional Networks (MTCNN). Go ahead and download this Github project. Only a few of the files from the project will be required, in particular:
src/mtcnn/mtcnn.cpp
include/mtcnn.h
models/det1.param
models/det2.param
models/det3.param
models/det1.bin
models/det2.bin
models/det3.bin
Move mtcnn.cpp and mtcnn.h into the src directory of your project, and the det1/2/3.param and det1/2/3.bin files into the models directory of your project.
Take the time to also download a few sample images containing faces and place them in the images directory of your project. Additionally, create a CMakeLists.txt file in the root directory of your project, and a main.cpp in the src directory.
At this point, your project directory should look something like this:
.
├── 3rd_party_libs
│ ├── ncnn
│ └── opencv
├── CMakeLists.txt
├── images
│ ├── face.jpg
│ └── face.png
├── models
│ ├── det1.bin
│ ├── det1.param
│ ├── det2.bin
│ ├── det2.param
│ ├── det3.bin
│ └── det3.param
└── src
├── main.cpp
├── mtcnn.cpp
└── mtcnn.h
Next, open main.cpp in your favorite text editor and copy the following code:
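The full main.cpp is embedded as a gist in the original post and is not reproduced here. A minimal sketch of what it contains, assuming the MTCNN class from the downloaded project exposes a constructor taking the model directory and a detect method returning Bbox structs with corner coordinates and five landmark points (the exact names and signatures may differ in your copy of mtcnn.h), might look like:

```cpp
#include <iostream>
#include <string>
#include <vector>

#include <opencv2/opencv.hpp>

#include "mtcnn.h"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <image path> [more image paths...]" << std::endl;
        return 1;
    }

    // Load the three cascaded networks (det1/2/3.param and .bin)
    // from the models directory.
    MTCNN detector("../models");

    for (int i = 1; i < argc; ++i) {
        cv::Mat img = cv::imread(argv[i]);
        if (img.empty()) {
            std::cerr << "Unable to load image: " << argv[i] << std::endl;
            continue;
        }

        // NCNN uses its own matrix type, so convert the OpenCV BGR image.
        ncnn::Mat in = ncnn::Mat::from_pixels(img.data, ncnn::Mat::PIXEL_BGR2RGB,
                                              img.cols, img.rows);

        std::vector<Bbox> boxes;
        detector.detect(in, boxes);

        for (const auto& box : boxes) {
            // Draw the face bounding box...
            cv::rectangle(img, cv::Point(box.x1, box.y1), cv::Point(box.x2, box.y2),
                          cv::Scalar(0, 255, 0), 2);
            // ...and the five landmark points (eyes, nose, mouth corners).
            for (int j = 0; j < 5; ++j) {
                cv::circle(img, cv::Point(box.ppoint[j], box.ppoint[j + 5]), 2,
                           cv::Scalar(0, 0, 255), cv::FILLED);
            }
        }

        // Save the annotated image to disk next to the original.
        cv::imwrite(std::string(argv[i]) + "_landmarks.jpg", img);
    }
    return 0;
}
```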
The sample code above loads images from the specified path(s) and then uses MTCNN to determine the facial bounding boxes and landmarks. The bounding boxes and landmarks are drawn on the image, and the image is then saved to disk.
The following code is a sample CMakeLists.txt when building for AArch32:
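The embedded gist is not reproduced here. A sketch of an AArch32 CMakeLists.txt consistent with the install paths above (the library list and directory names are assumptions — adjust them to match your builds) could look like:

```cmake
cmake_minimum_required(VERSION 3.10)
project(landmark_detection)

set(CMAKE_CXX_STANDARD 11)

# Cross compilers installed by gcc/g++-arm-linux-gnueabihf.
set(CMAKE_C_COMPILER arm-linux-gnueabihf-gcc)
set(CMAKE_CXX_COMPILER arm-linux-gnueabihf-g++)

# Paths to the cross-compiled static libraries built earlier.
set(OPENCV_ROOT ${CMAKE_SOURCE_DIR}/3rd_party_libs/opencv/build_aarch32/install_aarch32/usr/local)
set(NCNN_ROOT ${CMAKE_SOURCE_DIR}/3rd_party_libs/ncnn/build_aarch32/install)

include_directories(${OPENCV_ROOT}/include/opencv4 ${NCNN_ROOT}/include/ncnn)
link_directories(${OPENCV_ROOT}/lib ${NCNN_ROOT}/lib)

add_executable(landmark_detection_aarch32 src/main.cpp src/mtcnn.cpp)
target_link_libraries(landmark_detection_aarch32
    opencv_imgcodecs opencv_imgproc opencv_core ncnn pthread)

# Copy the executable, models, and images into a dist directory post-build.
add_custom_command(TARGET landmark_detection_aarch32 POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_SOURCE_DIR}/dist/bin
    COMMAND ${CMAKE_COMMAND} -E copy $<TARGET_FILE:landmark_detection_aarch32> ${CMAKE_SOURCE_DIR}/dist/bin/
    COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_SOURCE_DIR}/models ${CMAKE_SOURCE_DIR}/dist/models
    COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_SOURCE_DIR}/images ${CMAKE_SOURCE_DIR}/dist/images)
```

Note that a static OpenCV build may also require linking its bundled third-party codec libraries (libjpeg, libpng, zlib, and so on) from the build's 3rdparty/lib directory, depending on which options were enabled.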
If building for AArch64, use the following CMakeLists.txt:
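As above, the gist itself is omitted. Relative to the AArch32 file, only the compiler triplet, library paths, and target name need to change; for example:

```cmake
# AArch64 cross compilers installed by gcc/g++-aarch64-linux-gnu.
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

# Library and include paths from the AArch64 builds of OpenCV and NCNN.
set(OPENCV_ROOT ${CMAKE_SOURCE_DIR}/3rd_party_libs/opencv/build_aarch64/install_aarch64/usr/local)
set(NCNN_ROOT ${CMAKE_SOURCE_DIR}/3rd_party_libs/ncnn/build_aarch64/install)

add_executable(landmark_detection_aarch64 src/main.cpp src/mtcnn.cpp)
```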
Finally, from the root project directory, run the following commands:
mkdir build
cd build
cmake ..
make -j$(nproc)
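Since the binary is cross-compiled, it will not run on your x86 build machine; you can sanity-check the target architecture with the file utility:

```shell
# Inspect the produced binary; the output should report an ARM ELF,
# e.g. "ELF 32-bit LSB executable, ARM" for AArch32 or
# "ELF 64-bit LSB executable, ARM aarch64" for AArch64.
file ../dist/bin/landmark_detection_aarch32
```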
Deployment
In order to deploy the software, we will need to copy over a few files to the device. In addition to the generated executable, we will require the model files as they are loaded at runtime, as well as some sample images on which we want to draw the landmarks.
The CMakeLists.txt from the step above copies all of the required files after each build to a dist directory in the project root:
dist
├── bin
│ └── landmark_detection_aarch32/64
├── images
│ ├── face.jpg
│ └── face.png
└── models
├── det1.bin
├── det1.param
├── det2.bin
├── det2.param
├── det3.bin
└── det3.param
Now, we can go ahead and package the dist directory, then transfer it to the Raspberry Pi. I suggest using the tar command: tar -czvf dist.tgz ./dist/ then using scp to copy over the tarball.
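For example (the username and address below are placeholders for your own Pi's credentials):

```shell
# Package the dist directory and copy the tarball to the Raspberry Pi.
tar -czvf dist.tgz ./dist/
scp dist.tgz pi@192.168.0.20:/home/pi/
```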
Once the tarball has been extracted on the Raspberry Pi, navigate to the dist/bin directory and run the executable. Use the command line arguments to specify the image path(s). Here is an example: ./landmark_detection_aarch64 (or ./landmark_detection_aarch32) ../images/face.jpg ../images/face.png. The generated output images will have the facial bounding boxes and landmarks drawn on them.
Our application is now ready to be deployed on an ARM device. As an extension, you can add these cross-compile steps as part of your CI/CD pipeline to ensure you always have an ARM build ready for when you need it!
If you have any questions or comments, please email us at sales@trueface.ai.