Deploy GPU-enabled Kubernetes Pod on NVIDIA Jetson Nano

  • Recompile the Jetson Nano kernel to enable the modules that Kubernetes (K8s) needs (and, in my cluster, weaveworks/weave-kube as well)
  • Expose the host's GPU device nodes to the containers that run the GPU workload pods

Recompile NVIDIA Jetson Nano’s Kernel for K8s

The stock Jetson Nano (L4T) kernel ships without several options that K8s and Weave Net depend on. Enable the following in the kernel configuration and rebuild:
CONFIG_CGROUP_HUGETLB=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_INET_ESP=m
CONFIG_NF_NAT_REDIRECT=m
CONFIG_NETFILTER_XT_SET=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_NET_L3_MASTER_DEV=y
CONFIG_DM_BUFIO=m
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_IPVLAN=m
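The full rebuild follows the usual L4T kernel workflow: start from the running configuration, flip the options listed above, then rebuild and install. The source path below is an assumption about the L4T layout; the `.config` edit itself is demonstrated on a throwaway copy so the sed line can be tried anywhere:

```shell
# Sketch of the .config edit step, assuming the L4T kernel source workflow:
#   cd /usr/src/kernel/kernel-4.9          # path is an assumption; adjust
#   zcat /proc/config.gz > .config         # start from the running config
#   <flip the options listed above>
#   make -j4 Image modules && sudo make modules_install
# Demonstrated here on a throwaway copy with sed:
set -e
cfg=$(mktemp)
printf '# CONFIG_IP_SET is not set\n' > "$cfg"
# Same effect as `scripts/config --module IP_SET` in a real kernel tree
sed -i 's|^# CONFIG_IP_SET is not set$|CONFIG_IP_SET=m|' "$cfg"
grep '^CONFIG_IP_SET=m$' "$cfg"
rm -f "$cfg"
```

Options marked `=y` are built in and take effect on reboot; the `=m` options build as modules that must be loaded explicitly, as shown next.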
After installing and booting the rebuilt kernel, load the new modules:

sudo modprobe ip_set
sudo modprobe xt_set
sudo modprobe xt_physdev

Create K8s pod with GPU support

On the Jetson Nano, the Tegra GPU is exposed to containers through a set of host device nodes, each of which must be mounted into the pod:
/dev/nvhost-ctrl
/dev/nvhost-ctrl-gpu
/dev/nvhost-prof-gpu
/dev/nvmap
/dev/nvhost-gpu
/dev/nvhost-as-gpu
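Every node in this list needs both a volumeMount and a matching hostPath volume in the manifest that follows, and it is easy to miss one. A throwaway loop (a sketch, not part of the original manifest; the volume names simply reuse each device's basename) can generate the volumeMount stanzas:

```shell
# Emit a volumeMount stanza per Tegra device node; each needs a
# correspondingly named hostPath volume in the manifest's volumes section.
for dev in nvhost-ctrl nvhost-ctrl-gpu nvhost-prof-gpu nvmap nvhost-gpu nvhost-as-gpu; do
  printf -- '- mountPath: /dev/%s\n  name: %s\n' "$dev" "$dev"
done
```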
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gputest-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gputest
  template:
    metadata:
      name: gputest
      labels:
        app: gputest
    spec:
      hostname: gputest
      containers:
      - name: gputest
        image: <private_registry_url>/device_query:latest
        volumeMounts:
        - mountPath: /dev/nvhost-ctrl
          name: nvhost-ctrl
        - mountPath: /dev/nvhost-ctrl-gpu
          name: nvhost-ctrl-gpu
        - mountPath: /dev/nvhost-prof-gpu
          name: nvhost-prof-gpu
        - mountPath: /dev/nvmap
          name: nvmap
        - mountPath: /dev/nvhost-gpu
          name: nvhost-gpu
        - mountPath: /dev/nvhost-as-gpu
          name: nvhost-as-gpu
        - mountPath: /usr/lib/aarch64-linux-gnu/tegra
          name: lib
        securityContext:
          privileged: true
      volumes:
      - name: nvhost-ctrl
        hostPath:
          path: /dev/nvhost-ctrl
      - name: nvhost-ctrl-gpu
        hostPath:
          path: /dev/nvhost-ctrl-gpu
      - name: nvhost-prof-gpu
        hostPath:
          path: /dev/nvhost-prof-gpu
      - name: nvmap
        hostPath:
          path: /dev/nvmap
      - name: nvhost-gpu
        hostPath:
          path: /dev/nvhost-gpu
      - name: nvhost-as-gpu
        hostPath:
          path: /dev/nvhost-as-gpu
      - name: lib
        hostPath:
          path: /usr/lib/aarch64-linux-gnu/tegra
      nodeSelector:
        devicemodel: nvidiajetsonnano
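The nodeSelector only matches nodes carrying the `devicemodel=nvidiajetsonnano` label, so the Nano node must be labeled before the Deployment will schedule there. A typical sequence (the node name and manifest filename here are assumptions):

```shell
# Label the Jetson Nano node so the nodeSelector above matches it
kubectl label node jetson-nano devicemodel=nvidiajetsonnano
# Deploy and confirm deviceQuery inside the pod found the GPU
kubectl apply -f gputest-deployment.yaml
kubectl logs deploy/gputest-deployment | grep 'Result = PASS'
```

If everything is wired up, the pod's log shows the deviceQuery output below.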
/cudaSamples/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3964 MBytes (4156870656 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

Conclusion

With the kernel rebuilt to include the options K8s and Weave Net require, and with the Tegra device nodes and libraries mounted into the container, a plain Kubernetes cluster can run GPU workloads on the Jetson Nano: the deviceQuery sample runs inside the pod and reports Result = PASS.