Running Kubernetes on GPU Nodes
Jetson Nano is a small, powerful computer designed to power entry-level edge AI applications and devices. It has a GPU, which can be used for resource-intensive workloads such as running ML models or video streaming. GPU-enabled machines in the cloud cost more than ordinary instances, so I decided to use my Jetson Nano and link it with cloud machines.
Recently I created a two-node Kubernetes cluster using K3s and an Nvidia Jetson Nano. In this short post, I describe the steps I used to create a K3s cluster with the control plane in the cloud and the Jetson Nano, which sits in my home, as the worker node. So let's get started.
1. Install K3s on the VM that acts as the control plane. By default, K3s uses containerd as the container runtime. If you want to use the Docker runtime instead, use the second command.
#Installs with the containerd runtime
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --tls-san $(curl ifconfig.me) --write-kubeconfig-mode 644 --cluster-cidr=188.8.131.52/16" sh -
#Utilize the Docker runtime (install Docker first)
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --tls-san $(curl ifconfig.me) --write-kubeconfig-mode 644 --cluster-cidr=184.108.40.206/16" sh -s - --docker
2. On the worker node (the Jetson Nano), a few more requirements must be met if you want to use the GPU. According to the official Kubernetes documentation, the minimum requirements are:
- Kubernetes nodes have to be pre-installed with NVIDIA drivers.
- Kubernetes nodes have to be pre-installed with nvidia-docker 2.0.
- Kubelet must use Docker as its container runtime.
- nvidia-container-runtime must be configured as the default runtime for Docker, instead of runc.
- The version of the NVIDIA drivers must match the constraint ~= 384.81.
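One way to check the default-runtime requirement before joining the node is to look at Docker's daemon configuration. The sketch below assumes the configuration lives at /etc/docker/daemon.json and that the NVIDIA runtime is registered under the name nvidia, which is what an nvidia-docker2 setup normally uses. The function name check_default_runtime is made up for this example.

```shell
#!/bin/sh
# check_default_runtime [FILE]
# Succeeds when FILE (a Docker daemon.json) sets "default-runtime" to "nvidia".
# Returns 2 when the file cannot be read.
# Assumptions: the default path below, and the runtime name "nvidia".
check_default_runtime() {
    config_file="${1:-/etc/docker/daemon.json}"
    [ -r "$config_file" ] || return 2
    # A plain text match; this does not fully parse the JSON.
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$config_file"
}
```

On the Jetson Nano you could run `check_default_runtime && echo "nvidia is the default runtime"`. The "Default Runtime" line in the output of `docker info` is another place to confirm the same setting.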
3. Once the above requirements are met, we can join the worker node to the control plane:
curl -sfL https://get.k3s.io | K3S_URL=https://[ControlPlaneIP]:6443 K3S_TOKEN=[TOKEN_ID] sh -s - --docker
#ControlPlaneIP: the IP of the VM
#TOKEN_ID: to get this ID, run the command in the control plane
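The step above does not show the command that prints the token. On a default K3s server install the join token is normally stored in /var/lib/rancher/k3s/server/node-token, so reading that file on the control plane should give you the value. Treat the path as an assumption about your setup. The helper below, print_join_command, is a hypothetical name used here only to show how the two placeholders are filled in.

```shell
#!/bin/sh
# Assumed default location of the join token on a K3s server.
K3S_TOKEN_FILE="/var/lib/rancher/k3s/server/node-token"

# print_join_command CONTROL_PLANE_IP TOKEN
# Prints the agent install command with both placeholders filled in,
# so it can be reviewed before it is run on the Jetson Nano.
print_join_command() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: print_join_command CONTROL_PLANE_IP TOKEN" >&2
        return 1
    fi
    printf 'curl -sfL https://get.k3s.io | K3S_URL=https://%s:6443 K3S_TOKEN=%s sh -s - --docker\n' "$1" "$2"
}

# On the control plane (needs root):
#   sudo cat "$K3S_TOKEN_FILE"
# Then, with an example IP from the documentation range:
#   print_join_command 203.0.113.10 "<token printed by the previous command>"
```

Printing the command instead of running it keeps the token handling visible and lets you check the URL before anything is installed on the worker.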
To deploy the NVIDIA device plugin once your cluster is running and the above requirements are satisfied:
# Enable GPU support on the worker node by deploying the following DaemonSet:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.10.0/nvidia-device-plugin.yml
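After the device plugin is running, the worker node should advertise a nvidia.com/gpu resource. A minimal sketch for checking this, assuming the usual layout of `kubectl describe node` output (a Capacity section with one `name: value` pair per line), could look like the following. The function name gpu_count_from_describe is invented for this example.

```shell
#!/bin/sh
# gpu_count_from_describe
# Reads `kubectl describe node` output on stdin and prints the first
# nvidia.com/gpu count it finds (the Capacity entry), or 0 if there is none.
gpu_count_from_describe() {
    awk '$1 == "nvidia.com/gpu:" { print $2; found = 1; exit }
         END { if (!found) print 0 }'
}

# Example (replace the placeholder with your Jetson node's name):
#   kubectl describe node <jetson-node-name> | gpu_count_from_describe
```

If this prints 0, the plugin pod's logs on the worker node are a reasonable first place to look.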
4. Now we can test whether a pod can use the GPU. Apply the pod manifest below:
- name: nvidia
command: [ "./deviceQuery" ]
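Only a fragment of the manifest is shown above. As a rough guide to what a complete one could look like, the sketch below prints a Pod manifest built around that fragment. The pod name devicequery (taken from the `kubectl logs` command used later), the container name nvidia, and the command come from this post. The restart policy and the nvidia.com/gpu limit are assumptions, and the image must be supplied by you, since the post does not name one.

```shell
#!/bin/sh
# devicequery_manifest IMAGE
# Prints a Pod manifest that runs ./deviceQuery inside IMAGE.
# Assumptions: restartPolicy Never (one-shot test pod) and a limit of one
# nvidia.com/gpu, the resource name exposed by the NVIDIA device plugin.
devicequery_manifest() {
    if [ -z "$1" ]; then
        echo "usage: devicequery_manifest IMAGE" >&2
        return 1
    fi
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  restartPolicy: Never
  containers:
    - name: nvidia
      image: $1
      command: [ "./deviceQuery" ]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Example (the image name is a placeholder, not a real image):
#   devicequery_manifest registry.example.com/devicequery:latest | kubectl apply -f -
#   kubectl logs devicequery
```

The image needs to contain a deviceQuery binary built for the Jetson's ARM64 CPU and CUDA version, with its working directory set so that ./deviceQuery resolves.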
If everything works, you should see output like the following:
# kubectl logs devicequery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3964 MBytes (4156682240 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Please try it out and share your feedback.