ML TIPS & TRICKS / DALI

How to install NVIDIA DALI TRITON backend on Jetson devices

Let’s improve NVIDIA’s documentation step by step

Ivan Ralašić
Published in forsight.ai
Feb 24, 2023 · 7 min read


Photo by BoliviaInteligente on Unsplash

If you’re reading this, you already know that installing NVIDIA software can be a daunting task even for the most experienced devs.

While NVIDIA Jetson devices offer incredible performance for deep learning applications, getting started with them can be challenging due to limited documentation and a lack of end-to-end tutorials.

In this blog, we’ll focus on getting NVIDIA DALI and NVIDIA TRITON server working together on Jetson AGX devices.

The NVIDIA Data Loading Library (DALI) is a powerful tool for accelerating image and video preprocessing on GPUs, providing a significant performance boost for deep learning workflows.

NVIDIA TRITON server, on the other hand, is a popular solution for deploying, managing and scaling machine learning models. However, despite their complementary capabilities, getting the two to work together can be a challenge.

NVIDIA Triton Inference Server and NVIDIA DALI logos

One particular issue that deep learning practitioners may face is integrating DALI backend support into their TRITON server on NVIDIA Jetson AGX devices. Jetson devices offer powerful GPU acceleration, making them ideal for deep learning applications. However, adding DALI backend support to TRITON on Jetson AGX devices can be tricky due to the lack of comprehensive documentation and end-to-end tutorials.

The benefits of using DALI and TRITON together are significant. DALI accelerates image and video preprocessing on GPUs, freeing up valuable CPU resources for other tasks. TRITON, on the other hand, provides a simple and efficient way to deploy, manage, and scale deep learning models. By combining the two, deep learning practitioners can take advantage of the unique capabilities of Jetson AGX devices and push the boundaries of what’s possible in deep learning.

So, let’s start!

Cross-compiling NVIDIA DALI for aarch64 Jetson Linux using Docker

Cross-compiling NVIDIA DALI for aarch64 Jetson Linux using Docker is the first step in getting the DALI backend to work on the AGX. Cross-compiling DALI on a host machine produces the wheel and library files we need in order to build the backend for the AGX. The process requires Docker and some additional tools, but it provides an efficient way to compile code for a different architecture. Once we have the necessary files, we can move on to building the backend itself and integrating it with the TRITON server.

  1. Clone the DALI repository from GitHub by running the following command in the terminal:
 git clone https://github.com/NVIDIA/DALI.git

Note: You can check out the exact branch that you need from the DALI repository on GitHub using the command git clone -b <branch_name> https://github.com/NVIDIA/DALI.git. For example, to get the release_v1.23 branch, you can use git clone -b release_v1.23 https://github.com/NVIDIA/DALI.git.

2. Navigate to the root directory of the cloned DALI repository.

3. Build the aarch64 Jetson Linux Build Container by running the following command in the terminal:

sudo docker build -t nvidia/dali:builder_aarch64-linux -f docker/Dockerfile.build.aarch64-linux . 

4. Compile the DALI source code by running the following command in the terminal:

sudo docker run -v $(pwd):/dali nvidia/dali:builder_aarch64-linux

5. Wait for the compilation process to complete. This may take some time (be patient!).

6. Once the compilation process is complete, the relevant Python wheel will be located in the dali_root_dir/wheelhouse directory.

* If you encounter any issues with the linter check or have previously built DALI, run the following commands to remove the relevant build directories before attempting to build again:

sudo rm -rf dali_tf_plugin/dali_tf_sdist_build 
sudo rm -rf build_aarch64_linux

That’s it! You have successfully cross-compiled DALI for aarch64 Jetson Linux on a host device.

Next, we have to move the built wheels to the AGX device. Copy the nvidia_dali_cuda110-1.24.0.dev0-0-py3-none-manylinux2014_aarch64.whl and nvidia-dali-tf-plugin-cuda110-1.24.0.dev0.tar.gz files to the AGX device using the following commands:

scp -r DALI/wheelhouse/nvidia-dali-tf-plugin-cuda110-1.24.0.dev0.tar.gz <username>@<AGX device IP address>:/DALI
scp -r DALI/wheelhouse/nvidia_dali_cuda110-1.24.0.dev0-0-py3-none-manylinux2014_aarch64.whl <username>@<AGX device IP address>:/DALI

Building DALI Backend using fresh DALI release on bare metal Jetson AGX

The second step is to build the DALI backend from the freshly built DALI release on a bare-metal Jetson AGX device. To achieve this, we first make sure that the necessary prerequisites, such as CMake 3.17+ and Triton Server, are installed on the device. We then clone the dali_backend repository with all its submodules, create a build directory, and run cmake and make to build the DALI backend.

  1. Clone the dali_backend repository by running the following command:
git clone --recursive https://github.com/triton-inference-server/dali_backend.git

2. Navigate to the dali_backend directory:

cd dali_backend

3. Install the wheel file that we cross-compiled on the host system and transferred to the AGX using:

pip3 install nvidia_dali_cuda110-1.24.0.dev0-0-py3-none-manylinux2014_aarch64.whl

4. When building the DALI backend, make sure that the TRITON_SKIP_DALI_DOWNLOAD flag is set to ON in the CMakeLists.txt file. This flag tells CMake not to download and build DALI from source, since we have already built it on the host device and copied the files over to the AGX. Additionally, make sure that the TRITON_BACKEND_API_VERSION specified in the CMakeLists.txt file corresponds to the version of the Triton Inference Server you are using, so that the backend and the server stay compatible. For example, if you are using Triton Inference Server version 22.10, you would set TRITON_BACKEND_API_VERSION to "r22.10"; for a different version, adjust it accordingly.

5. Build and install the dali_backend by running the following commands:

mkdir build
cd build
cmake ..
make
sudo make install   # with the default install prefix, this should place the backend under /usr/local/backends/dali/

In the tritonserver/backends/ directory, you can find several preinstalled backends such as identity, onnxruntime, python, pytorch, tensorflow1, tensorflow2, and tensorrt. However, it is worth noting that DALI is not present in this directory.

To copy the built and installed dali_backend files from /usr/local/backends/dali/ to this directory, you can use the following command:

sudo cp -r /usr/local/backends/dali/ /tritonserver/backends/

It is also crucial to copy the DALI libraries installed with the Python wheel into the dali backend directory so that the TRITON server can load them. Without this step, the TRITON server won’t start. The command to copy the DALI libraries is:

sudo cp -r ~/.local/lib/python3.8/site-packages/nvidia/dali/ /tritonserver/backends/dali/

Running TRITON server with DALI backend support

After successfully cross-compiling NVIDIA DALI and building the DALI backend on Jetson AGX, the next step is to run the TRITON server with DALI backend support. This involves configuring the TRITON server to load the DALI backend and using it to perform inference on models that require DALI preprocessing.

The tritonserver binaries can be found in the tritonserver/bin/ directory. To start the server, you need to run the tritonserver executable and specify the paths to the backend directory and model repository.

Here is an example command with explanations of the parameters:

tritonserver/bin/tritonserver --backend-directory "PATH_TO_BACKENDS" \ 
--model-repository "PATH_TO_MODEL_REPO"
  • --backend-directory "PATH_TO_BACKENDS": specifies the path to the directory containing the backend shared libraries. In our case, this should be set to tritonserver/backends/.
  • --model-repository "PATH_TO_MODEL_REPO": specifies the path to the directory containing the model configuration files and model artifacts, organized according to the Triton Server's model repository layout (a minimal sketch of a DALI model entry follows below).
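For a model served through the DALI backend, the repository entry typically contains a config.pbtxt that declares backend: "dali" and a serialized DALI pipeline file. The snippet below is only a minimal sketch of how such a pipeline could be defined and serialized with the DALI Python API; it is not taken from the original setup, and the model name dali_preprocess, the tensor names, the batch size, and the image resolution are hypothetical placeholders that must match your own config.pbtxt.

import nvidia.dali as dali
import nvidia.dali.fn as fn

# Hypothetical model repository layout for a DALI preprocessing model:
#   model_repository/
#     dali_preprocess/
#       config.pbtxt        (backend: "dali", inputs/outputs matching the names below)
#       1/
#         model.dali        (the serialized pipeline produced by this script)

@dali.pipeline_def(batch_size=64, num_threads=4, device_id=0)
def preprocessing_pipeline():
    # Triton feeds encoded images into the pipeline through an external source;
    # the name must match the input name declared in config.pbtxt
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    # Decode on the GPU and resize to a (hypothetical) 224x224 network input
    images = fn.decoders.image(images, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images

# Run this where the DALI wheel is installed (e.g. on the AGX) and make sure the
# target directory exists; TRITON loads the resulting model.dali at startup.
preprocessing_pipeline().serialize(filename="model_repository/dali_preprocess/1/model.dali")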

If you followed the instructions closely, the TRITON server startup log should include a backend table like this:

I0224 09:55:32.394467 85914 server.cc:590] 
+----------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+----------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| tensorrt | /tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"5.300000","backend-directory":" |
| | | /tritonserver/backends/","default-max-batch-size":"4"}} |
| dali | /tritonserver/backends/dali/libtriton_dali.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"5.300000","backend-directory":" |
| | | /tritonserver/backends/","default-max-batch-size":"4"}} |
| python | /tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"5.300000","backend-directory":" |
| | | /tritonserver/backends/","default-max-batch-size":"4"}} |
+----------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+

Once the server is running, you can use a client to send it inference requests.
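As a rough illustration only, here is a minimal Python client sketch that assumes the tritonclient package is installed (pip install tritonclient[http]) and that a DALI model named dali_preprocess is loaded; the tensor names DALI_INPUT_0 and DALI_OUTPUT_0 and the test.jpg file are hypothetical and must match your model configuration.

import numpy as np
import tritonclient.http as httpclient

# Connect to the local TRITON server (the HTTP endpoint listens on port 8000 by default)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Read an encoded JPEG as raw bytes; decoding happens inside the DALI pipeline
with open("test.jpg", "rb") as f:
    encoded = np.frombuffer(f.read(), dtype=np.uint8)

# Build a batch of one encoded image; name, shape and dtype must match config.pbtxt
infer_input = httpclient.InferInput("DALI_INPUT_0", [1, encoded.size], "UINT8")
infer_input.set_data_from_numpy(np.expand_dims(encoded, axis=0))

result = client.infer(model_name="dali_preprocess", inputs=[infer_input])
print(result.as_numpy("DALI_OUTPUT_0").shape)

In a real deployment, the DALI model would usually be the preprocessing stage of a Triton ensemble that feeds an inference model, but a standalone request like this is enough to confirm that the DALI backend is serving requests.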

In this blog post, we went through the process of building the DALI backend for the NVIDIA Triton Inference Server on an NVIDIA Jetson AGX Xavier device. We started by cross-compiling the DALI library for aarch64 on a host machine using Docker. Then, we cloned the dali_backend repository and built the backend using CMake.

Next, we copied the built DALI backend files and the DALI libraries into the Triton Inference Server’s backends directory so that the server could find them. Finally, we verified our setup by starting the Triton Inference Server and checking that the DALI backend was loaded.

By following these steps, we were able to successfully build and install the DALI backend on the Jetson AGX Xavier device, enabling us to accelerate image data preprocessing for machine learning applications. We hope that this guide will help other users overcome the pain points of building and installing the DALI backend for the Triton Inference Server on the Jetson AGX Xavier device.
