Serving AI models on the edge — Using Nvidia GPU with k3s on AWS — Part 4
Introduction
This is part 4 of the series “Serving AI models on the edge”.
Here we focus on using k3s to serve AI models, since it is a lightweight Kubernetes distribution that is widely used on edge platforms.
Overview
The Nvidia Container Toolkit, together with the Nvidia GPU Operator, enables a Kubernetes/k3s cluster to access the GPU and run CUDA operations. At a high level, the operator installs and manages the software components each cluster node needs to expose its GPUs to pods.
Steps
We use AWS EC2 as an example, but the following steps apply to any Nvidia-enabled Ubuntu VM node on any infrastructure.
Picking an Nvidia-enabled GPU AMI
First, we pick an AMI that already comes with the Nvidia CUDA Toolkit, the GPU driver, and the Nvidia Container Toolkit. This can be done by browsing the AMI catalog and searching for a Deep Learning AMI.
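As an alternative to the console, the AMI catalog can also be searched from the AWS CLI. The name filter and region below are only illustrative and should be adjusted to your setup:
# list Amazon-owned Deep Learning AMIs (illustrative filter and region)
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=Deep Learning AMI GPU*" \
  --region us-east-1 \
  --query 'Images[].[ImageId,Name]' \
  --output table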
Checking Nvidia and CUDA versions
Next, we verify that the Nvidia driver is working properly by running the nvidia-smi utility. A healthy setup produces output like the one below; if the driver could not communicate with the GPU, it would print an error message instead.
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 32C P0 60W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
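Note that nvidia-smi reports the CUDA version supported by the driver. To also check the version of the CUDA toolkit installed on the AMI, we can use nvcc (assuming the toolkit is on the PATH):
nvcc --version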
Checking Nvidia with Docker
We can check whether the Docker runtime can access the GPU by running:
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 28C P8 12W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
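As an additional sanity check, we can confirm that the nvidia runtime is registered with Docker (the exact output format varies across Docker versions):
sudo docker info | grep -i runtimes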
Installing k3s
We covered how to install and configure k3s for AWS ECR in earlier articles in this series; the following is a quick command to install k3s:
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
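Once the install completes, we can confirm that the single-node cluster is up and the node is Ready:
kubectl get nodes -o wide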
Configuring Nvidia GPU Operator
We will install the Nvidia GPU Operator following the instructions from the operator page, which describes it as follows:
Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, Infiniband adapters and other devices through the device plugin framework. However, configuring and managing nodes with these hardware resources requires configuration of multiple software components such as drivers, container runtimes or other libraries which are difficult and prone to errors. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU. These components include the NVIDIA drivers (to enable CUDA), Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node labelling using GFD, DCGM based monitoring and others.
To install the operator:
# install the helm utility
sudo snap install helm --classic
# add the nvidia helm repo and update it
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
# install the operator
helm install --wait nvidiagpu \
-n gpu-operator --create-namespace \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true \
nvidia/gpu-operator
NAME: nvidiagpu
LAST DEPLOYED: Tue Aug 8 00:54:41 2023
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
# TIP: to uninstall
# helm uninstall -n gpu-operator nvidiagpu
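Before running GPU pods, it is worth verifying that the operator components came up and that the node now advertises the GPU as an allocatable resource. A minimal sketch of such checks (pod names and readiness timing will vary; the nvidia RuntimeClass referenced by the pods below is expected to be created by the operator):
# operator, toolkit and device-plugin pods should be Running/Completed
kubectl get pods -n gpu-operator
# the nvidia RuntimeClass should exist
kubectl get runtimeclass nvidia
# the node should now report an allocatable GPU
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'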
Checking nvidia-smi in a Kubernetes pod
We now check whether nvidia-smi can be run inside a Kubernetes pod. We define a pod that runs nvidia-smi below:
# gpu-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-container
      image: nvidia/cuda:11.6.2-base-ubuntu20.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
Deploy this pod:
kubectl apply -f gpu-pod.yaml
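Note that the image pull can take some time; we can watch the pod until its status shows Completed:
kubectl get pod gpu-pod --watch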
Once the pod has run to completion, check its logs:
kubectl logs gpu-pod
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 30C P8 24W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
This shows the GPU being accessed from inside a pod running in Kubernetes.
Checking that pods can run CUDA operations
To test whether a pod can actually run CUDA operations on the GPU, we do the following:
First, we define a pod yaml:
# vec-add-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: vec-add-pod
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
We deploy the pod:
kubectl apply -f vec-add-pod.yaml
Note that it takes some time for the image to be pulled and the pod to run to completion. We then check the logs of the pod:
kubectl logs vec-add-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
This shows that the pod was able to run CUDA operations and used the GPU successfully.
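Optionally, the test pods can now be cleaned up:
kubectl delete -f gpu-pod.yaml -f vec-add-pod.yaml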
Summary
We showed how to deploy the Nvidia GPU Operator to GPU-enable Kubernetes and k3s, and verified from inside a pod that the GPU is accessible with nvidia-smi.
We then deployed a pod that ran CUDA operations, proving that CUDA workloads can run inside Kubernetes and k3s.
Next, we will show how to serve the gpt2 model that we deployed earlier using the GPU rather than the CPU.