Running Kubernetes on GPU Nodes

Jetson Nano is a small, powerful computer designed to power entry-level edge AI applications and devices. It has a GPU, which can be used for resource-intensive workloads such as running ML models or video streaming. A GPU-enabled machine in the cloud costs more than a conventional instance, so I decided to put my Jetson Nano to work and link it with my cloud machines.

I recently created a two-node Kubernetes cluster using K3s and an NVIDIA Jetson Nano. In this short post, I describe the steps to create a K3s cluster with the control plane in the cloud and the Jetson Nano, which sits in my home, as the worker node. So let's get started.


  1. Install K3s on the VM that acts as the control plane. By default, K3s uses containerd as the container runtime. If you want to use the Docker runtime instead, use the second command.
# Install with the default containerd runtime
curl -sfL [INSTALL_SCRIPT_URL] | INSTALL_K3S_EXEC="server --tls-san $(curl [URL]) --write-kubeconfig-mode 644 --cluster-cidr=[CLUSTER_CIDR]" sh -
# Use the Docker runtime (install Docker first)
curl -sfL [INSTALL_SCRIPT_URL] | INSTALL_K3S_EXEC="server --tls-san $(curl [URL]) --write-kubeconfig-mode 644 --cluster-cidr=[CLUSTER_CIDR]" sh -s - --docker
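
The server arguments in step 1 are hard to read on a single line. As a minimal sketch, they could be assembled by a small shell helper and then passed through `INSTALL_K3S_EXEC`. The function name `build_k3s_server_args`, the example IP address, and the fallback CIDR (10.42.0.0/16, the K3s default pod network) are assumptions for illustration and are not taken from the original commands.

```shell
#!/bin/sh
# Hypothetical helper: assemble the K3s server arguments used in step 1.
# $1 = public IP of the control-plane VM (added as a TLS SAN)
# $2 = pod network CIDR (assumption: fall back to the K3s default 10.42.0.0/16)
build_k3s_server_args() {
  public_ip="$1"
  cluster_cidr="${2:-10.42.0.0/16}"
  printf 'server --tls-san %s --write-kubeconfig-mode 644 --cluster-cidr=%s' \
    "$public_ip" "$cluster_cidr"
}

# Example (203.0.113.10 is an illustrative address from the documentation range):
# INSTALL_K3S_EXEC="$(build_k3s_server_args 203.0.113.10)"
```

Keeping the arguments in one place makes it easier to reuse the same values for both the containerd and the Docker variant of the install command.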

2. On the worker node (the Jetson Nano), a few more requirements must be met before you can use the GPU. According to the official Kubernetes documentation, these are the minimum requirements:

  • Kubernetes nodes have to be pre-installed with NVIDIA drivers.
  • Kubernetes nodes have to be pre-installed with nvidia-docker 2.0.
  • Kubelet must use Docker as its container runtime.
  • nvidia-container-runtime must be configured as the default runtime for Docker, instead of runc.
  • The version of the NVIDIA drivers must match the constraint ~= 384.81.
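
The default-runtime requirement is the one that is easiest to miss. As a rough sketch, the check below looks for `"default-runtime": "nvidia"` in Docker's daemon configuration. The helper name `has_nvidia_default_runtime` is hypothetical, and the sketch assumes the configuration lives in `/etc/docker/daemon.json`, Docker's default location on Linux.

```shell
#!/bin/sh
# Hypothetical check: does a Docker daemon.json set nvidia as the default runtime?
# $1 = path to the config file (assumption: defaults to /etc/docker/daemon.json)
has_nvidia_default_runtime() {
  config_file="${1:-/etc/docker/daemon.json}"
  # An unreadable or missing file counts as "not configured".
  [ -r "$config_file" ] || return 1
  # Match the key/value pair regardless of spacing around the colon.
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$config_file"
}

# Example:
# has_nvidia_default_runtime && echo "nvidia is the default Docker runtime"
```

This is only a text match, not a JSON parse, so treat a positive result as a hint and restart Docker after any change to the file.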

3. Once the above requirements are met, join the worker node to the control plane:

curl -sfL [INSTALL_SCRIPT_URL] | K3S_URL=https://[ControlPlaneIP]:6443 K3S_TOKEN=[TOKEN_ID] sh -s - --docker
#ControlPlaneIP: the IP of the VM
#TOKEN_ID: to get this ID, run the following command on the control plane
cat /var/lib/rancher/k3s/server/token
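
After the join command finishes, it helps to confirm that the worker actually registered. The following is a minimal sketch, assuming `kubectl` is already configured against the K3s control plane; the helper name `node_is_ready` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named node is listed with STATUS "Ready".
# Assumes kubectl points at the K3s control plane.
node_is_ready() {
  node_name="$1"
  # Columns of `kubectl get nodes` are NAME STATUS ROLES AGE VERSION.
  kubectl get nodes --no-headers |
    awk -v n="$node_name" '$1 == n && $2 == "Ready" { found = 1 } END { exit !found }'
}

# Example (node name taken from the pod manifest in step 4):
# node_is_ready edgeblazer-desktop && echo "worker joined"
```

A node that is cordoned shows a status such as `Ready,SchedulingDisabled`, which this simple comparison deliberately treats as not ready.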

To deploy the NVIDIA device plugin once your cluster is running and the above requirements are satisfied:

# Enable GPU support on the worker node by deploying the following DaemonSet:
kubectl create -f [DEVICE_PLUGIN_MANIFEST_URL]
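
Once the DaemonSet is running, the node should advertise its GPU to the scheduler. As a hedged sketch, the helper below reads the allocatable count from the node object. It assumes the device plugin exposes the resource under the name `nvidia.com/gpu`; the function name `node_gpu_count` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how many GPUs a node advertises as allocatable.
# Assumption: the device plugin registers the resource name nvidia.com/gpu.
node_gpu_count() {
  node_name="$1"
  # The dot inside the resource name must be escaped in the jsonpath expression.
  count="$(kubectl get node "$node_name" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')"
  # Print 0 when the resource is absent.
  printf '%s\n' "${count:-0}"
}

# Example:
# node_gpu_count edgeblazer-desktop
```

A result of 0 usually means the plugin pod is not running on that node or could not find the GPU, so its logs are the next place to look.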

4. Now we can test whether a pod can use the GPU. Apply the manifest below:

apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  nodeName: edgeblazer-desktop
  containers:
  - name: nvidia
    image: xift/jetson_devicequery:r32.5.0
    command: [ "./deviceQuery" ]
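
To run the test pod, the manifest can be saved to a file and applied with `kubectl`. This is a minimal sketch: the file name `devicequery.yaml` and the helper name `run_devicequery_pod` are assumptions, while the pod name `devicequery` comes from the manifest.

```shell
#!/bin/sh
# Hypothetical helper: create the test pod from a manifest file, then show it.
# $1 = path to the saved manifest (devicequery.yaml is an assumed file name)
run_devicequery_pod() {
  manifest="${1:-devicequery.yaml}"
  # Stop early if the manifest cannot be applied.
  kubectl apply -f "$manifest" || return 1
  # -o wide also shows which node the pod landed on.
  kubectl get pod devicequery -o wide
}

# Example:
# run_devicequery_pod devicequery.yaml
```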

If everything works, you should see output like this:

# kubectl logs devicequery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3964 MBytes (4156682240 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
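
The last line of the output is the one that matters for automation. As a small sketch, the check below reads the pod logs on standard input and succeeds only if they contain the `Result = PASS` line; the helper name `devicequery_passed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read deviceQuery output on stdin and succeed only if it
# contains the final "Result = PASS" line (trailing whitespace is tolerated).
devicequery_passed() {
  grep -Eq '^Result = PASS[[:space:]]*$'
}

# Example:
# kubectl logs devicequery | devicequery_passed && echo "GPU is visible in the pod"
```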

Please try it out and share your feedback.






Renjith Ravindranathan

A DevOps engineer by profession, Dad, Traveler & sometimes, like to tweak around stuff inside memory constrained devices. Currently living in the Netherlands.
