Building ARM64-based Docker containers for NVIDIA Jetson devices on an x86-based host.

SmartCow
Nov 19, 2021

While the Jetson range of AI-enabled edge devices from NVIDIA is immensely powerful for its intended purpose, it is often faster and more expedient to perform development tasks like compilation and the creation of Docker containers on a fully-fledged workstation, and only transfer the resulting artifacts onto the device when they are ready to be tested or deployed.

The NVIDIA Jetson NX we are using for this tutorial (Photo: SmartCow)

Unfortunately, this can be tricky.

There are two main problems to circumvent. The first is that a development workstation running Ubuntu, in the overwhelming majority of situations, is based on an Intel x86 architecture. This is radically different from the ARM64 architecture of the Jetson range of devices and, without taking some extra steps, software built on the x86 machine will not be able to run on an ARM64 one.

The second problem is that a standard Ubuntu installation lacks crucial system components supplied with Linux4Tegra (L4T), the custom Ubuntu distribution running on the Jetson, making it impossible to build software that needs to make use of such components.

The good news is that it is possible to get around both hurdles by leveraging the flexibility of the Linux kernel together with the magic of Docker containers!

In this tutorial we will set up an Ubuntu x86 host to run ARM64-based Docker containers. We will then compile a Jetson Deepstream sample application, package it into a Docker container, transfer the container to the target Jetson device, and verify that our sample application runs correctly on the Jetson.

Running ARM64 containers on x86

ARM microprocessors have a very different instruction set than x86 processors. This means that a program compiled for one processor cannot run on the other.

We can test this by trying to run a container made for the ARM64-based Jetson on our x86 workstation:

etienne@Workstation:~$ sudo docker run --rm -it nvcr.io/nvidia/deepstream-l4t:6.0-samples /bin/bash
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
standard_init_linux.go:228: exec user process caused: exec format error

The nvcr.io/nvidia/deepstream-l4t:6.0-samples image is provided by NVIDIA on their NGC catalog of AI resources and is meant to run on Jetson devices, containing software compiled for ARM64. As expected, the /bin/bash binary inside the container is an ARM64 executable and cannot be executed on our x86 workstation, which is why we get the exec format error message when we try to run it.
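Note the naming mismatch in the warning between the kernel and Docker: uname -m reports x86_64 or aarch64, while Docker calls the same platforms linux/amd64 and linux/arm64. A minimal helper to translate between the two conventions (our own sketch; it covers only the two architectures in this tutorial):

```shell
# Translate a `uname -m` machine string into the platform name Docker uses.
# Only the two architectures relevant to this tutorial are covered.
arch_to_platform() {
    case "$1" in
        x86_64)  echo "linux/amd64" ;;
        aarch64) echo "linux/arm64" ;;
        *)       echo "unknown" ;;
    esac
}

# On the workstation this prints linux/amd64; on the Jetson, linux/arm64.
arch_to_platform "$(uname -m)"
```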

In order to run this container, we need to get the x86 machine to understand ARM machine code. Luckily, with today’s powerful computers, we have the option of using emulator software to do exactly that, in real time and almost completely transparently.

QEMU is an open source, generic machine emulator and virtualizer. This means that it can take programs compiled for ARM64 and run them on x86. We can also make this process transparent on a Linux system, so that when we attempt to run an ARM executable the system detects it automatically and invokes it through QEMU. We do this through the Linux kernel binfmt_misc mechanism, which allows us to register new executable formats with the kernel and tell the system which interpreter to use for each format. QEMU and binfmt support can be installed using apt-get:

sudo apt-get install qemu binfmt-support qemu-user-static
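For reference, a binfmt_misc registration is a single line written to /proc/sys/fs/binfmt_misc/register, in the colon-separated form :name:type:offset:magic:mask:interpreter:flags. The abbreviated entry below sketches what gets registered for ARM64 ELF binaries: the magic bytes match the ELF header with machine type 0xb7 (AArch64), the mask blanks out the header fields that vary, and the interpreter is the statically linked QEMU binary (the elided byte runs and the exact flags differ between versions):

```
:qemu-aarch64:M::\x7fELF\x02\x01\x01\x00...\x02\x00\xb7\x00:\xff\xff\xff\xff...\xfe\xff\xff\xff:/usr/bin/qemu-aarch64-static:F
```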

Registering the ARM64 binary format can be done easily by leveraging the open source multiarch/qemu-user-static repository on GitHub:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

This container automatically registers foreign file formats with the kernel using binfmt_misc, simplifying the execution of multi-architecture binaries and Docker containers.
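We can also ask the kernel directly which handlers are registered. The sketch below uses the standard binfmt_misc paths; on a machine where registration succeeded it prints the qemu-aarch64 entry, including the interpreter path and the ELF magic being matched:

```shell
# Show the ARM64 binfmt_misc handler if one is registered.
if [ -f /proc/sys/fs/binfmt_misc/qemu-aarch64 ]; then
    cat /proc/sys/fs/binfmt_misc/qemu-aarch64
else
    echo "qemu-aarch64 handler not registered"
fi
```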

In order to test if this has worked, we can try to re-run the Jetson container we tried earlier:

etienne@Workstation:~$ sudo docker run --rm -it nvcr.io/nvidia/deepstream-l4t:6.0-samples /bin/bash
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
root@be2f9818f628:/opt/nvidia/deepstream/deepstream-6.0#

This time, while Docker still gives us a warning about the platform mismatch, we successfully run the ARM64 bash executable which gives us a shell into the Jetson container.

We can visualize this setup as in the diagram below:

The tech stack running ARM64 binaries inside docker on x86 hardware

Creating a compilation container

The next step is to use our x86 workstation to compile a Deepstream sample app into an ARM64 binary that can run natively on a Jetson device. We do this by creating a Docker image that contains the necessary compilation tools. Since we want both Deepstream and Cuda to be available in our container, we will use the nvcr.io/nvidia/deepstream-l4t:6.0-samples and nvcr.io/nvidia/l4t-cuda:10.2.460-runtime images, kindly provided by NVIDIA on their NGC catalog of AI resources, as our base images. We use a Docker multi-stage build to take Deepstream from one image and transplant it into our custom image based on the other. The Dockerfile we used is as follows:

# for Deepstream
FROM nvcr.io/nvidia/deepstream-l4t:6.0-samples as deepstream_img
# use as base image for Cuda
FROM nvcr.io/nvidia/l4t-cuda:10.2.460-runtime
# copy deepstream from the first image
COPY --from=deepstream_img /opt/nvidia/deepstream/deepstream-6.0/ /opt/nvidia/deepstream/deepstream-6.0/
# install libraries necessary for compilation
RUN apt-get update
RUN apt-get install -y libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgstrtspserver-1.0-dev libx11-dev
# create some soft links to make sure Deepstream samples compilation links correctly to Cuda libraries
RUN ln -s /usr/local/cuda-10.2/targets/aarch64-linux/lib/stubs/libcuda.so /usr/local/cuda-10.2/lib64/libcuda.so
RUN ln -s /usr/local/cuda-10.2/lib64/libcudart.so.10.2 /usr/local/cuda-10.2/lib64/libcudart.so
WORKDIR /opt/nvidia/deepstream/deepstream-6.0
CMD ["/bin/bash"]

We build our custom image…

etienne@Workstation:~$ sudo docker build . -f Dockerfile.build -t ds_cont_arm
[sudo] password for etienne:
Sending build context to Docker daemon 1.853GB
Step 1/9 : FROM nvcr.io/nvidia/deepstream-l4t:6.0-samples as deepstream_img
---> 1e08ebd4f227
Step 2/9 : FROM nvcr.io/nvidia/l4t-cuda:10.2.460-runtime
---> cd9f683f0d3f
Step 3/9 : COPY --from=deepstream_img /opt/nvidia/deepstream/deepstream-6.0/ /opt/nvidia/deepstream/deepstream-6.0/
---> ea6313cae934
Step 4/9 : RUN apt-get update
---> [Warning] The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
---> Running in 998c954604b0
...
...
---> Running in d8b8489d4db4
Removing intermediate container d8b8489d4db4
---> 218a23f0e63d
Step 9/9 : CMD ["/bin/bash"]
---> [Warning] The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
---> Running in 503280b51707
Removing intermediate container 503280b51707
---> 5cbf5f28b3bd
Successfully built 5cbf5f28b3bd
Successfully tagged ds_cont_arm:latest

… and run it (note that we map a ./data directory into the container as /data. We will use this directory to move our build artifacts out into the host.)

etienne@Workstation:~$ sudo docker run --rm -it -v`pwd`/data:/data ds_cont_arm /bin/bash
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
root@e224763ff4e6:/opt/nvidia/deepstream/deepstream-6.0# export NVDS_VERSION=6.0
root@e224763ff4e6:/opt/nvidia/deepstream/deepstream-6.0# export CUDA_VER=10.2
root@e224763ff4e6:/opt/nvidia/deepstream/deepstream-6.0# cd sources/apps/sample_apps/deepstream-test1
root@e224763ff4e6:/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1# make
cc -c -o deepstream_test1_app.o -DPLATFORM_TEGRA -I../../../includes -I /usr/local/cuda-10.2/include -pthread -I/usr/include/gstreamer-1.0 -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include deepstream_test1_app.c
cc -o deepstream-test1-app deepstream_test1_app.o -lgstreamer-1.0 -lgobject-2.0 -lglib-2.0 -L/usr/local/cuda-10.2/lib64/ -lcudart -L/opt/nvidia/deepstream/deepstream-6.0/lib/ -lnvdsgst_meta -lnvds_meta -lcuda -Wl,-rpath,/opt/nvidia/deepstream/deepstream-6.0/lib/

We verify that the binary was created correctly and copy it to /data.

root@539bb0ca4956:/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1# ls
Makefile README deepstream-test1-app deepstream_test1_app.c deepstream_test1_app.o dstest1_pgie_config.txt
root@539bb0ca4956:/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1# cp deepstream-test1-app /data
root@539bb0ca4956:/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1#

Now that we have built our binary and collected the build artifacts, we want to package them into an image that we can transfer to the Jetson. We could simply export our custom container and import it back on the Jetson, and in this particular instance that would be fine. In a real-world scenario, however, our runtime container might need to be set up differently and/or be more streamlined than our build container.

In order to simulate this, we will create a runtime Docker image with our pre-packaged build artifacts ready to run using the below Dockerfile:

FROM nvcr.io/nvidia/deepstream-l4t:6.0-samples
WORKDIR /opt/nvidia/deepstream/deepstream-6.0
# copy our build artifact into the image
COPY data/* /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1
CMD ["/bin/bash"]

We can now create the image:

etienne@Workstation:~$ sudo docker build . -f Dockerfile.run  -t ds_cont_arm_run
Sending build context to Docker daemon 1.853GB
Step 1/4 : FROM nvcr.io/nvidia/deepstream-l4t:6.0-samples
---> 1e08ebd4f227
Step 2/4 : WORKDIR /opt/nvidia/deepstream/deepstream-6.0
---> [Warning] The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
---> Running in 7b6e24be18dc
Removing intermediate container 7b6e24be18dc
---> 9e0f7074129a
Step 3/4 : COPY data/* /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1
---> 4e16811dfbe0
Step 4/4 : CMD ["/bin/bash"]
---> [Warning] The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
---> Running in 8bee1b88b7c8
Removing intermediate container 8bee1b88b7c8
---> f8b90c8184e5
Successfully built f8b90c8184e5
Successfully tagged ds_cont_arm_run:latest

In order to transfer the image to the Jetson device, we could either push it to a Docker repository, or else directly transfer the image to the Jetson using docker save and docker load. For the purposes of this tutorial, we choose the latter approach for simplicity:

etienne@Workstation:~$ sudo docker save -o ds_cont_arm_run.img ds_cont_arm_run:latest
etienne@Workstation:~$ ls
data Data Dockerfile2.build Dockerfile.build Dockerfile.run ds_cont_arm_run.img
etienne@Workstation:~$ sudo chown etienne:etienne ds_cont_arm_run.img
etienne@Workstation:~$ scp ds_cont_arm_run.img etienne@nx.jetson:~
ds_cont_arm_run.img 100% 1767MB 51.5MB/s 00:34

And on the Jetson, we load the archived image back into Docker, switch off access control in the windowing system to make sure it accepts connections from the Docker container, run a shell inside the container and start the deepstream-test1 app we compiled on the x86 machine:

etienne@nx_jetson:~$ sudo docker load -i ds_cont_arm_run.img
[sudo] password for etienne:
Loaded image: ds_cont_arm_run:latest
etienne@nx_jetson:~$ xhost +
etienne@nx_jetson:~$ sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.0 -v /tmp/.X11-unix/:/tmp/.X11-unix ds_cont_arm_run:latest /bin/bash
root@nx_jetson:/opt/nvidia/deepstream/deepstream-6.0# cd sources/apps/sample_apps/deepstream-test1/
root@nx_jetson:/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1# ls
Makefile README deepstream-test1-app deepstream_test1_app.c dstest1_pgie_config.txt
root@nx_jetson:/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-test1# ./deepstream-test1-app ../../../../samples/streams/sample_720p.h264

…at which point we should be rewarded with our Jetson performing object detection on NVIDIA’s sample video…

Of course this is only a toy example, however the approach can easily be extended to a real-life production environment and/or integrated into a CI pipeline.

All of us at SmartCow hope this short tutorial helps with getting you on your way to a faster, more streamlined development pipeline for your Jetson device!

Etienne Bonanno, Sr. Software Engineer-AI at SmartCow

Etienne has over 20 years of experience in software engineering, having worked in a wide range of sectors, including financial systems, mobile telecommunications, and fraud prevention. He has a keen interest in high-performance computing and AI technologies. His hobbies include cycling, fine art, and woodworking.


SmartCow

SmartCow is an AI engineering company that specializes in advanced video analytics, applied artificial intelligence & electronics manufacturing.