How To Connect Jetson Nano To Kubernetes Using K3s And K3sup

Jakub Czapliński · Published in Icetek · May 14, 2020
Icetek Team working hard on connecting Jetson Nano to the cluster.

In this article I will show how to connect a Jetson Nano developer board to a Kubernetes cluster to act as a GPU node. I will cover the setup of NVIDIA Docker, which is needed to run containers that use the GPU, and the process of joining the Jetson to the Kubernetes cluster. After successfully connecting the node to the cluster, I will also show how to run a simple TensorFlow 2 training session using the GPU on the Jetson Nano.

If you are interested in setting up a K3s cluster, you can follow my other tutorial explaining how to build a K3s cluster on Raspberry Pi using Ubuntu Server 18.04. Most of the information provided there is not unique to the Raspberry Pi.

K3s or Kubernetes?

K3s is a lightweight version of Kubernetes that is optimized for smaller installations, which, in my opinion, makes it ideal for single-board computers as it uses significantly fewer resources. You can read more about it here. K3sup, on the other hand, is a great open-source tool built by Alex Ellis that simplifies the installation of K3s clusters. You can find more information about it in the GitHub repository.
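If you don't have K3sup on your workstation yet, it can be installed with a one-liner. The commands below are a sketch based on the project's README at the time of writing; check the GitHub repository for the current instructions.

curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin/
k3sup --help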

What do we need?

  • A K3s cluster — only a properly configured master node is required
  • NVIDIA Jetson Nano developer board with the Developer Kit installed. For more information on how to install the Developer Kit on the board, follow the instructions in the documentation found here.
  • K3sup
  • 15 minutes

Plan

  • Set up NVIDIA Docker
  • Add the Jetson Nano to the K3s cluster
  • Run a simple MNIST example to showcase GPU usage inside a Kubernetes pod

Setting up NVIDIA Docker

Before we configure Docker to use nvidia-docker as the default runtime, I would like to spend a moment explaining why this is needed. By default, containers on the Jetson Nano run the same way as on any other hardware, and you can't access the GPU from inside the container, at least not without some hacking. If you want to test this yourself, you can run the following command and should see similar results:

root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:10:23.370761: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:23.370859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-05-14 00:10:25.946896: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947219: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters

If you now run the same command but add the --runtime=nvidia parameter to the docker command, you should see something like this:

root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run --runtime=nvidia -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:12:16.767624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 00:12:19.386354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 00:12:19.388700: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters

The nvidia-docker runtime is already configured, but it is not enabled by default. To make Docker use the nvidia runtime by default, add "default-runtime": "nvidia" to the /etc/docker/daemon.json config file so it looks like this:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
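For the new default runtime to take effect, the Docker daemon has to be restarted. A minimal sketch of the commands on Ubuntu is shown below; the grep simply checks that docker info now reports nvidia as the default runtime.

sudo systemctl restart docker
docker info | grep -i "default runtime"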

Now you can skip the --runtime=nvidia argument in the docker run command and the GPU will be initialized by default. This is needed so that K3s will use Docker with the nvidia-docker runtime, allowing pods to use the GPU without any hassle or special configuration.

Connecting Jetson as a Kubernetes node

Connecting the Jetson as a Kubernetes node using K3sup takes only one command. However, for it to work, we need to be able to connect to both the Jetson and the master node over SSH without a password, and either run sudo without a password or connect as the root user.

If you need to generate SSH keys and copy them over, you can run something like this:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/rpi -P ""
ssh-copy-id -i ~/.ssh/rpi user@host

By default, Ubuntu installations require users to type a password for the sudo command. Because of that, the easier way is to use K3sup with the root account. To make this work, copy your ~/.ssh/authorized_keys file to the /root/.ssh/ directory on both machines, for example as shown below.
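A minimal sketch of how this could be done, assuming your current user already has the key in ~/.ssh/authorized_keys; run it on both the Jetson and the master node:

sudo mkdir -p /root/.ssh
sudo cp ~/.ssh/authorized_keys /root/.ssh/authorized_keys
sudo chown -R root:root /root/.ssh
sudo chmod 700 /root/.ssh
sudo chmod 600 /root/.ssh/authorized_keys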

Before connecting the Jetson, let's look at the cluster we want to connect it to:

upgrade@ZeroOne:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
nexus Ready master 32d v1.17.2+k3s1 192.168.0.12 <none> Ubuntu 18.04.4 LTS 4.15.0-96-generic containerd://1.3.3-k3s1
rpi3-32 Ready <none> 32d v1.17.2+k3s1 192.168.0.30 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
rpi3-64 Ready <none> 32d v1.17.2+k3s1 192.168.0.32 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1

As you may notice, the master node is the nexus host at IP 192.168.0.12 and it is running containerd. K3s runs containerd by default, but that can be changed. The containerd runtime is a bit problematic here, because we set up nvidia-docker to work with Docker, and that is what we need for the GPU. Fortunately, to switch from containerd to Docker we just need to pass one additional parameter to the k3sup command. So, finally, to connect our Jetson to the cluster we can run:

k3sup join --ssh-key ~/.ssh/rpi --server-ip 192.168.0.12 --ip 192.168.0.40 --k3s-extra-args '--docker'

The IP 192.168.0.40 is my Jetson Nano. As you can see, we passed the --k3s-extra-args '--docker' flag, which passes the --docker flag to the k3s agent during installation. Thanks to that, the agent uses Docker with the nvidia-docker setup rather than containerd.

To check if the node connected correctly, we can run kubectl get node -o wide again:

upgrade@ZeroOne:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
nexus Ready master 32d v1.17.2+k3s1 192.168.0.12 <none> Ubuntu 18.04.4 LTS 4.15.0-96-generic containerd://1.3.3-k3s1
rpi3-32 Ready <none> 32d v1.17.2+k3s1 192.168.0.30 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
rpi3-64 Ready <none> 32d v1.17.2+k3s1 192.168.0.32 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
jetson Ready <none> 11s v1.17.2+k3s1 192.168.0.40 <none> Ubuntu 18.04.4 LTS 4.9.140-tegra docker://19.3.6

Simple validation

We can now run a pod using the same Docker image and command, to check whether we get the same results as when we ran Docker directly on the Jetson Nano at the beginning of this article.

To do this, we can apply this pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: gpu-test
    command:
    - "/bin/bash"
    - "-c"
    - "echo 'import tensorflow' | python3"
  restartPolicy: Never
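Assuming you saved the manifest above as gpu-test.yaml (the filename is just an example), applying it and watching the pod is a standard kubectl workflow:

kubectl apply -f gpu-test.yaml
kubectl get pod gpu-test -w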

Wait for the Docker image to be pulled and then view the logs by running:

upgrade@ZeroOne:~$ kubectl logs gpu-test 
2020-05-14 10:01:51.341661: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 10:01:53.996300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 10:01:53.998563: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters

As you can see, we get similar log messages as when running Docker directly on the Jetson!

Running MNIST training

We have a running node with GPU support, so now we can test the "Hello world" of machine learning and run a TensorFlow 2 example that trains a model on the MNIST dataset.

To run a simple training session that demonstrates usage of the GPU, apply the manifest below. It clones a repository containing a simple MNIST training script in an init container and then runs the training in the main container.

apiVersion: v1
kind: Pod
metadata:
  name: mnist-training
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  initContainers:
  - name: git-clone
    image: iceci/utils
    command:
    - "git"
    - "clone"
    - "https://github.com/IceCI/example-mnist-training.git"
    - "/workspace"
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  containers:
  - image: icetekio/jetson-nano-tensorflow
    name: mnist
    command:
    - "python3"
    - "/workspace/mnist.py"
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  restartPolicy: Never
  volumes:
  - name: workspace
    emptyDir: {}
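As before, this can be applied with kubectl. The filename below is just an example, and the container name is given explicitly because the pod also has an init container:

kubectl apply -f mnist-training.yaml
kubectl logs -f mnist-training -c mnist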

As you can see in the log below, the GPU is being used.

...
2020-05-14 11:30:02.846289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-14 11:30:02.846434: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
....

If you are logged in on the node, you can check CPU and GPU usage by running the tegrastats command:

upgrade@jetson:~$ tegrastats --interval 5000
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,41%@1479,43%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@25C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1355/1355
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [53%@1479,42%@1479,45%@1479,35%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@24.75C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1353/1354
RAM 2461/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,38%@1479,43%@1479,33%@1479] EMC_FREQ 0% GR3D_FREQ 10% PLL@24C CPU@26C PMIC@100C GPU@24C AO@29C thermal@25.25C POM_5V_IN 3410/3410 POM_5V_GPU 493/465 POM_5V_CPU 1314/1340

Summary

As you can see, hooking up a Jetson Nano to a Kubernetes cluster is a pretty simple and straightforward process. In just a couple of minutes, you’ll be able to leverage Kubernetes to run machine learning workloads — using the power of NVIDIA’s pocket-sized GPU as well. You’ll be able to run any GPU containers designed for Jetson Nano on Kubernetes, which can simplify your development and testing.
