Singularity Containers, TensorFlow and the NVIDIA Jetson Nano: an experiment

Geoffroy Vallee
Published in SingularityApp
Nov 19, 2019

(This is a technical article that presents a specific aspect of using Singularity in the context of high performance computing. As a result, the goal is not a quick read but rather to present an experience, a story that I think can be valuable to others. I tried to organize the document so readers can skip sections that are not of interest to them.)

Disclaimer: This article uses the NVIDIA software stack. At the time the work was done and this article was written, everything presented here was, to the best of our knowledge, compliant with the NVIDIA user agreement. If you are trying to duplicate this work or do something similar, please ensure that what is presented here is still compliant with the current user agreement. If it is not, please respect the user agreement and do not use the information presented in this article.

Why Use Singularity Containers to Run TensorFlow on the NVIDIA Jetson Nano?

The NVIDIA Jetson Nano is a fairly new type of device: designed to run autonomously at the edge while still offering a capable GPU for high performance computing. In my case, that means a lot of testing, experiments, and applications running side-by-side. Because of this, putting TensorFlow in its own container is extremely useful: it can have its own Python environment that does not interfere with the environments required by other applications, regardless of the AI model you are trying to run (I am referring to the nightmare situation where your AI model requires a specific version of TensorFlow, which itself requires a specific version of Python, which itself is not available via your Linux distribution, and all of that running on an architecture not supported by solutions such as Conda for Python).

Once you are convinced that running TensorFlow in a container is a good way to go, two options are possible: use the excellent NGC service from NVIDIA (https://ngc.nvidia.com/), or, if you really want full control over your containers and do not mind digging into the dark technical details of the NVIDIA software stack, create your own Singularity container from scratch. This article is about the latter case.
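
For the first option, pulling a prebuilt image is a one-liner. As a quick illustration (the exact image name and tag are assumptions; check NGC for what is published for your L4T release):

singularity pull docker://nvcr.io/nvidia/l4t-base:r32.2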

What are the Requirements Before Starting?

You have to make sure that you know which version of TensorFlow you need. This question is not related to using containers but depends on the target platform, here, the Jetson Nano. Practically, this means that you need to know which TensorFlow wheel has to be used, even if you were to use the system without containers.
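
The TensorFlow wheel that NVIDIA publishes is tied to the L4T/JetPack release running on your Nano, so start by checking which release you have. A small sketch (the file below is specific to NVIDIA's L4T images):

# On the Jetson Nano host, print the L4T release string
head -n 1 /etc/nv_tegra_release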

In this article, we will use the following requirements:

  • TensorFlow-gpu 1.13.1, the version built to run on NVIDIA GPUs
  • HDF5 to deal with input data
  • TensorFlow-estimator 1.13.0 and TensorBoard 1.13.0

What Are the Difficulties Related to the NVIDIA Jetson Nano?

The main problem, in our specific context, is to understand the NVIDIA software stack used on the Nano and make sure that it is correctly set up in the container in addition to TensorFlow. This can be especially challenging when we do not rely on prebuilt NVIDIA images and try to create a container from scratch. Fortunately, this article presents all the necessary steps. Also remember that the NVIDIA stack comes with a very specific user agreement. This does not prevent you from creating your containers from scratch, but make sure you do not distribute or share your images in a way that would not be compliant with the user agreement.

Building your Definition File

The bootstrap section:

Bootstrap: docker
From: arm64v8/ubuntu:bionic

This is pretty self-explanatory, i.e., we bootstrap an image from Docker Hub for the ARM64v8 architecture (the architecture of the Jetson Nano) based on Ubuntu Bionic. This ensures that we can copy packages, binaries and libraries from the Jetson Nano to the image without incompatibility problems.

The files section:

%files
# Copy all the Debian packages from NVIDIA that are present by default on the Nano
/var/cuda-repo-10-0-local-10.0.166/ /debs
# Copy some local NVIDIA libraries that are not easy to get from the internet/packages
/usr/lib/aarch64-linux-gnu/libcudnn.so.7.3.1 /usr/lib/aarch64-linux-gnu/libcudnn.so.7.3.1
/usr/lib/aarch64-linux-gnu/libcudnn_static_v7.a /usr/lib/aarch64-linux-gnu/libcudnn_static_v7.a

This section is critical: it copies all the required Debian packages and libraries from the host to the container image.
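
Note that the CUDA repository path encodes a specific CUDA version and may differ on your system. A quick way to locate the right directory on your host (a sketch, assuming NVIDIA keeps the same naming scheme):

# On the Jetson Nano host, locate the local CUDA package repository
ls -d /var/cuda-repo-*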

The post section:

We will actually decompose the post section since it contains quite a few steps…

%post
# Create config file(s) required for the install
echo "TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0" > /etc/nv_boot_control.conf
echo "TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1" >> /etc/nv_boot_control.conf
echo "TEGRA_CHIPID 0x21" >> /etc/nv_boot_control.conf
echo "TNSPEC p2371-2180-devkit.default" >> /etc/nv_boot_control.conf

This first sub-section of post creates the configuration file that is required by the NVIDIA software stack. Note that the content added to that configuration file is specific to the hardware available on the Jetson Nano. It is therefore possible that the values will change over time as NVIDIA updates the Jetson Nano. In case of problems, simply check the content of the /etc/nv_boot_control.conf file on your host system.
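
If you need to double-check the values, you can simply display the host's version of the file and copy its content into the echo commands above:

# On the Jetson Nano host, display the reference configuration
cat /etc/nv_boot_control.conf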

export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get -y install apt-utils # Install first to avoid warnings
apt-get -y install gnupg dpkg-dev wget tar
# Get the nvidia drivers which are hidden in the kit used to
# image the nano system
mkdir /debs2
mkdir /download
cd /download; wget https://developer.nvidia.com/embedded/dlc/Jetson-210_Linux_R32.2.0-0
mv /download/Jetson-210_Linux_R32.2.0-0 /download/Jetson-210_Linux_R32.2.0_aarch64.tbz2
cd /download; tar xjf Jetson-210_Linux_R32.2.0_aarch64.tbz2
cp -rf /download/Linux_for_Tegra/nv_tegra/l4t_deb_packages/*.deb /debs2
# Make sure we can use all the nvidia debs with apt
cd /debs2; dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz
echo "deb file:/debs ./" >> /etc/apt/sources.list
echo "deb [trusted=yes] file:/debs2 ./" >> /etc/apt/sources.list
apt-key add /debs/7fa2af80.pub
apt-get update
# Install the nvidia drivers and nvidia specific packages
apt-get install -y nvidia-l4t-cuda cuda-nvtx-10-0 cuda-libraries-dev-10-0 nvidia-l4t-core
# Mimic all the commands the nvidia install scripts does
ln -s /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1 /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
ln -s /usr/lib/aarch64-linux-gnu/libcudnn.so.7.3.1 /usr/lib/aarch64-linux-gnu/libcudnn.so

This prepares the image to install the Debian packages that we copied into it earlier, as well as those extracted from the downloaded kit. It creates repositories directly inside the image and sets up apt to use them. Once done, the Debian packages are installed. Finally, we manually create a few symbolic links that the NVIDIA stack usually creates when imaging the Jetson Nano for the first time.
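
If the local repositories are set up correctly, apt should resolve the NVIDIA packages before you install them. An optional sanity check you can add while debugging the build (not part of the original recipe):

# Verify that the NVIDIA packages are visible through the local repositories
apt-cache policy nvidia-l4t-cuda nvidia-l4t-core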

    # Install the app level packages
# Rely mainly on pip to install Python packages to avoid
# conflicts that appear when using debs.
apt-get -y install \
libwebcam0-dev libwebcam0 libv4l-dev \
libgtk3-nocsd0 gtk3-nocsd python3.6 python3.6-dev \
libpython3.6-dev python3-pip python3-opencv \
python3-matplotlib libhdf5-dev
mkdir -p /src # Working directory for your application sources
cd /src
pip3 install -U setuptools
CFLAGS="-I/usr/include/hdf5/serial -L/usr/lib/aarch64-linux-gnu/hdf5/serial/" CPPFLAGS="-I/usr/include/hdf5/serial -L/usr/lib/aarch64-linux-gnu/hdf5/serial/" HDF5_DIR=/usr/lib/aarch64-linux-gnu/hdf5/serial/ pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta
pip3 install tensorflow-estimator==1.13.0
pip3 install tensorboard==1.13.0
CFLAGS="-I/usr/include/hdf5/serial" CPPFLAGS="-I/usr/include/hdf5/serial" HDF5_DIR=/usr/lib/aarch64-linux-gnu/hdf5/serial/ pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1
<command to setup your tensorflow application>
rm -rf /src
rm -rf /debs
rm -rf /debs2
rm -rf /download

This sub-section installs TensorFlow and some dependencies using pip. First, we install a few Debian packages required by the application. Some of these packages are there for illustration (libwebcam0-dev, libwebcam0, and libv4l-dev can be required if you try to do object detection using a webcam connected to your Jetson Nano), while the Python and HDF5 packages are required to install TensorFlow with HDF5 support. Then, tensorflow-estimator, tensorboard and the tensorflow-gpu wheel from NVIDIA are installed. At this point, you can set up your own TensorFlow application. Finally, we do some cleanup to minimize the size of the image.
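
Once the image is built, a quick way to check that the containerized TensorFlow actually sees the GPU is a one-liner run inside the container. This is a sketch: the image name tf-nano.sif is an assumption, and tf.test.is_gpu_available() is the TensorFlow 1.x API used here:

# Verify GPU visibility from inside the container (TensorFlow 1.x API)
singularity exec tf-nano.sif /bin/bash -c 'LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"'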

The runscript and startscript sections:

%runscript
/bin/bash -c 'LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra stdbuf -oL -eL <yourapp.exe>'
%startscript
/bin/bash -c 'LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra stdbuf -oL -eL detect_objects'

These two sections start your application: the runscript runs when you invoke singularity run on your container, while the startscript runs when you start the container as an instance.
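
To put it all together, you can build and run the image directly on the Nano, since the %files section copies files from the host. A minimal sketch, assuming the definition file above is saved as tf-nano.def:

# Build the image (the build itself requires root)
sudo singularity build tf-nano.sif tf-nano.def
# Launch the application defined in %runscript
singularity run tf-nano.sif
# Or run it in the background via %startscript
singularity instance start tf-nano.sif tf-nano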

Conclusion

This example shows that it is entirely possible to build your own TensorFlow containers for the NVIDIA Jetson Nano from scratch. Please, if you consider doing something similar, have a look at the user agreement and check that, at the time you are doing it, what you plan is actually compliant with the agreement.

The main problem is really to figure out how the system stack in the container must be set up. Once that is done, it is a fairly straightforward process. In fact, I personally find the end result to be a very interesting alternative to solutions such as Conda for Python (with Singularity, you can, for instance, encrypt your container).

Ultimately, once your container is ready, you know for sure that you can run your TensorFlow model at any time, even if NVIDIA updates the software stack of the Jetson Nano. It also lets you keep the system on the Jetson Nano clean and quickly switch from one application to another, all without heavyweight daemons running on the host or having to run the container as root.

In conclusion, I personally believe the Jetson Nano is a neat ARM64v8 platform with a lot of potential, so why not use it for as many workloads as possible!
