How To Connect Jetson Nano To Kubernetes Using K3s And K3sup
In this article I will show how to connect a Jetson Nano Developer board to a Kubernetes cluster to act as a GPU node. I will cover the setup of the NVIDIA docker runtime needed to run containers with GPU access, and the process of connecting the Jetson to a Kubernetes cluster. After successfully connecting the node to the cluster, I will also show how to run a simple TensorFlow 2 training session using the GPU on the Jetson Nano.
If you are interested in setting up a K3s cluster, you can follow my other tutorial explaining how to build a K3s cluster on Raspberry Pi using Ubuntu Server 18.04. Most of the information provided there is not unique to the Raspberry Pi.
K3s or Kubernetes?
K3s is a lightweight Kubernetes distribution optimized for smaller installations, which in my opinion makes it ideal for single-board computers, as it uses significantly fewer resources. You can read more about it here. K3sup, on the other hand, is a great open-source tool built by Alex Ellis that simplifies the installation of K3s clusters. You can find more information about it in its GitHub repository.
What do we need?
- A K3s cluster — only a properly configured master node is required
- NVIDIA Jetson Nano Developer board with the Developer Kit installed. For more information on how to install the developer kit on the board, follow the instructions in the documentation found here.
- K3sup
- 15 minutes
Plan
- Setup NVIDIA docker
- Add Jetson Nano to the K3s cluster
- Run a simple MNIST example to showcase the usage of GPU inside Kubernetes pod
Setting up NVIDIA docker
Before we configure Docker to use nvidia-docker as the default runtime, I would like to spend a moment explaining why this is needed. By default, containers on the Jetson Nano run the same way as on any other hardware, and you can't access the GPU from inside them, at least not without some hacking. If you want to test this yourself, you can run the following command and should see similar results:
root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:10:23.370761: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:23.370859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-05-14 00:10:25.946896: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947219: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/targets/aarch64-linux/lib:
2020-05-14 00:10:25.947273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
If you now run the same command but add the --runtime=nvidia parameter to the docker command, you should see something like this:
root@jetson:~# echo "python3 -c 'import tensorflow'" | docker run --runtime=nvidia -i icetekio/jetson-nano-tensorflow /bin/bash
2020-05-14 00:12:16.767624: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 00:12:19.386354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 00:12:19.388700: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
The nvidia-docker runtime is already configured, but it is not enabled by default. To make Docker use the nvidia runtime by default, add "default-runtime": "nvidia" to the /etc/docker/daemon.json config file so it looks like this:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
Now you can skip the --runtime=nvidia argument in the docker run command, and the GPU will be initialized by default. This is needed so that K3s, running Docker with the nvidia-docker runtime, can give pods access to the GPU without any hassle or special configuration.
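If you are setting up several boards, hand-editing /etc/docker/daemon.json on each one gets tedious. Below is a minimal sketch of patching the file programmatically; the path and JSON keys match the config above, but treat it as an illustration rather than an official tool (run it as root, and remember to restart Docker afterwards, e.g. with systemctl restart docker):

```python
import json
from pathlib import Path

def enable_nvidia_runtime(config_path: str) -> dict:
    """Set "default-runtime": "nvidia" in a Docker daemon.json, keeping any existing keys."""
    path = Path(config_path)
    # Start from the existing config if the file is present, otherwise from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("runtimes", {})["nvidia"] = {
        "path": "nvidia-container-runtime",
        "runtimeArgs": [],
    }
    config["default-runtime"] = "nvidia"
    path.write_text(json.dumps(config, indent=4))
    return config

# Example usage on a node (as root):
# enable_nvidia_runtime("/etc/docker/daemon.json")
```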
Connecting Jetson as a Kubernetes node
Connecting the Jetson as a Kubernetes node using K3sup takes only one command. However, for it to work we need to be able to SSH into both the Jetson and the master node without a password, and to run sudo without a password, or alternatively to connect as the root user.
If you need to generate SSH keys and copy them over, you can run something like this:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/rpi -P ""
ssh-copy-id -i ~/.ssh/rpi user@host
By default, Ubuntu installations require users to enter a password for the sudo command. Because of that, the easier way is to use K3sup with the root account. To make this work, copy your ~/.ssh/authorized_keys file to the /root/.ssh/ directory on each machine.
Before connecting the Jetson, let's look at the cluster we want to connect it to:
upgrade@ZeroOne:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
nexus Ready master 32d v1.17.2+k3s1 192.168.0.12 <none> Ubuntu 18.04.4 LTS 4.15.0-96-generic containerd://1.3.3-k3s1
rpi3-32 Ready <none> 32d v1.17.2+k3s1 192.168.0.30 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
rpi3-64 Ready <none> 32d v1.17.2+k3s1 192.168.0.32 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
As you may notice, the master node is the nexus host on IP 192.168.0.12, and all nodes are running containerd. K3s runs containerd by default, but that can be changed. Containerd is a problem here because we set up nvidia-docker to work with Docker, and that is needed for the GPU. Fortunately, to switch from containerd to Docker we just need to pass one additional parameter to the k3sup command. So, finally, to connect our Jetson to the cluster we can run:
k3sup join --ssh-key ~/.ssh/rpi --server-ip 192.168.0.12 --ip 192.168.0.40 --k3s-extra-args '--docker'
The IP 192.168.0.40 is my Jetson Nano. As you can see, we passed --k3s-extra-args '--docker', which hands the --docker flag to the k3s agent during installation. Thanks to that, the agent uses Docker with the nvidia-docker setup rather than containerd.
To check if the node connected correctly, we can run kubectl get node -o wide:
upgrade@ZeroOne:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
nexus Ready master 32d v1.17.2+k3s1 192.168.0.12 <none> Ubuntu 18.04.4 LTS 4.15.0-96-generic containerd://1.3.3-k3s1
rpi3-32 Ready <none> 32d v1.17.2+k3s1 192.168.0.30 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
rpi3-64 Ready <none> 32d v1.17.2+k3s1 192.168.0.32 <none> Ubuntu 18.04.4 LTS 5.3.0-1022-raspi2 containerd://1.3.3-k3s1
jetson Ready <none> 11s v1.17.2+k3s1 192.168.0.40 <none> Ubuntu 18.04.4 LTS 4.9.140-tegra docker://19.3.6
Simple validation
We can now run a pod using the same Docker image and command, to check whether we get the same results as when running Docker directly on the Jetson Nano at the beginning of this article.
To do this, we can apply this pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  containers:
    - image: icetekio/jetson-nano-tensorflow
      name: gpu-test
      command:
        - "/bin/bash"
        - "-c"
        - "echo 'import tensorflow' | python3"
  restartPolicy: Never
Apply the spec (for example with kubectl apply -f gpu-test.yaml, if that is the file name you saved it under), wait for the Docker image to pull, and then view the logs by running:
upgrade@ZeroOne:~$ kubectl logs gpu-test
2020-05-14 10:01:51.341661: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-14 10:01:53.996300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-05-14 10:01:53.998563: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
/usr/lib/python3/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
As you can see, we get the same log messages as when we ran Docker directly on the Jetson!
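If you ever want to automate this check (for example in a smoke test), a crude but effective signal is whether TensorFlow's library loader warnings appear in the pod logs. Below is a minimal sketch built on the log lines from the two runs above; the function name is my own, not part of any library:

```python
def gpu_libraries_loaded(log_text: str) -> bool:
    """Return True if the CUDA runtime library loaded successfully,
    False if TensorFlow fell back to CPU because libraries were missing."""
    # A failed GPU setup logs "Could not load dynamic library 'libcudart...'".
    if "Could not load dynamic library 'libcudart" in log_text:
        return False
    # A working GPU setup logs "Successfully opened dynamic library libcudart...".
    return "Successfully opened dynamic library libcudart" in log_text

# Feed it the output of `kubectl logs gpu-test`:
logs = "... Successfully opened dynamic library libcudart.so.10.2"
print(gpu_libraries_loaded(logs))  # → True
```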
Running MNIST training
We now have a running node with GPU support, so we can try the “Hello world” of machine learning and run a TensorFlow 2 example model on the MNIST dataset.
To run a simple training session that demonstrates GPU usage, apply the manifest below:
apiVersion: v1
kind: Pod
metadata:
  name: mnist-training
spec:
  nodeSelector:
    kubernetes.io/hostname: jetson
  initContainers:
    - name: git-clone
      image: iceci/utils
      command:
        - "git"
        - "clone"
        - "https://github.com/IceCI/example-mnist-training.git"
        - "/workspace"
      volumeMounts:
        - mountPath: /workspace
          name: workspace
  containers:
    - image: icetekio/jetson-nano-tensorflow
      name: mnist
      command:
        - "python3"
        - "/workspace/mnist.py"
      volumeMounts:
        - mountPath: /workspace
          name: workspace
  restartPolicy: Never
  volumes:
    - name: workspace
      emptyDir: {}
As you can see in the logs below, the GPU is being used:
...
2020-05-14 11:30:02.846289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-14 11:30:02.846434: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
....
If you are on the node itself, you can check CPU and GPU usage by running the tegrastats command:
upgrade@jetson:~$ tegrastats --interval 5000
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,41%@1479,43%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@25C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1355/1355
RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [53%@1479,42%@1479,45%@1479,35%@1479] EMC_FREQ 0% GR3D_FREQ 9% PLL@23.5C CPU@26C PMIC@100C GPU@24C AO@28.5C thermal@24.75C POM_5V_IN 3410/3410 POM_5V_GPU 451/451 POM_5V_CPU 1353/1354
RAM 2461/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) CPU [52%@1479,38%@1479,43%@1479,33%@1479] EMC_FREQ 0% GR3D_FREQ 10% PLL@24C CPU@26C PMIC@100C GPU@24C AO@29C thermal@25.25C POM_5V_IN 3410/3410 POM_5V_GPU 493/465 POM_5V_CPU 1314/1340
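The tegrastats output is dense. If you only want to track GPU load over time, the GR3D_FREQ field is the GPU utilization percentage on Tegra boards. Here is a small parsing sketch; the field format is taken from the output above, and the function name is mine:

```python
import re

def gr3d_utilization(tegrastats_line: str) -> int:
    """Extract GPU (GR3D) utilization in percent from one tegrastats output line."""
    match = re.search(r"GR3D_FREQ (\d+)%", tegrastats_line)
    if match is None:
        raise ValueError("no GR3D_FREQ field in line")
    return int(match.group(1))

# One line from the tegrastats output shown above:
sample = ("RAM 2462/3964MB (lfb 2x4MB) SWAP 362/1982MB (cached 6MB) "
          "CPU [52%@1479,41%@1479,43%@1479,34%@1479] EMC_FREQ 0% GR3D_FREQ 9%")
print(gr3d_utilization(sample))  # → 9
```

Piping tegrastats through a script like this makes it easy to log or graph GPU utilization during a training run.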
Summary
As you can see, hooking up a Jetson Nano to a Kubernetes cluster is a pretty simple and straightforward process. In just a couple of minutes, you’ll be able to leverage Kubernetes to run machine learning workloads — using the power of NVIDIA’s pocket-sized GPU as well. You’ll be able to run any GPU containers designed for Jetson Nano on Kubernetes, which can simplify your development and testing.
Further reading
- Jetson Nano Developer Kit documentation: https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit#intro
- NVIDIA docker repository including the overview of NVIDIA docker: https://github.com/NVIDIA/nvidia-docker
- K3s website: https://k3s.io/
- K3sup: https://github.com/alexellis/k3sup
- MNIST model code used from TensorFlow 2 documentation: https://www.tensorflow.org/overview