MLOps: Deploying Aliyun GPU sharing in Kubernetes

Daniel Tan
Published in AI2 Labs
Aug 27, 2020 · 3 min read

This article is part of the MLOps series.

Since GPU machines are extremely expensive, we want to cram as many applications onto them as possible. However, the stock NVIDIA device plugin only lets you request whole GPUs, which is far from ideal: most AI applications use less than half of the available GPU memory. This is an obvious bottleneck in deploying AI applications for most companies. We can probably expect more innovation in efficient use of GPU resources, but we do have the next best thing right now.

Alibaba has open-sourced a GPU scheduler extender and device plugin that also expose the amount of GPU memory in your cluster. This way, you can specify how much GPU memory you want to use, allowing multiple AI programs to run on the same GPU. While there has been a push for algorithms that can run on the CPU, with the advent of 5G I can see a need for algorithms that process HD video, which would require GPUs to be effective.

Installation

Nvidia

First of all, the Docker engine on each GPU node must use the NVIDIA container runtime.

Install nvidia-docker2 (version 1 has been discontinued):

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install -y nvidia-docker2

Change /etc/docker/daemon.json to the following before restarting Docker.

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
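To restart Docker and double-check that the NVIDIA runtime is now the default, something like this works (the CUDA image tag is just an example; pick one that matches your driver):

sudo systemctl restart docker
docker info | grep -i runtime
sudo docker run --rm nvidia/cuda:10.2-base nvidia-smi

If the last command prints the usual nvidia-smi table, containers can see the GPUs.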

The plugin

Installing the plugin is pretty tricky. First of all, you need to add a file to the Kubernetes installation (this article assumes kubeadm is used).

cd /etc/kubernetes/
sudo curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.json
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
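It doesn't hurt to check that the scheduler extender came up before continuing (with the default manifest it lands in the kube-system namespace):

kubectl get pods -n kube-system | grep gpushare-schd-extender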

Then you’ll need to modify the default scheduler configuration in /etc/kubernetes/manifests/kube-scheduler.yaml.

Add the policy configuration file you’ve just downloaded to the original scheduler arguments:

- --policy-config-file=/etc/kubernetes/scheduler-policy-config.json

Then add a volume and volume mount to the pod specification so the scheduler can read the configuration. These should point to wherever you’ve downloaded the JSON file.

# under the container's volumeMounts
- mountPath: /etc/kubernetes/scheduler-policy-config.json
  name: scheduler-policy-config
  readOnly: true
# under the pod's volumes
- hostPath:
    path: /etc/kubernetes/scheduler-policy-config.json
    type: FileOrCreate
  name: scheduler-policy-config

Since kubeadm deploys the scheduler as a static pod, we need to copy the manifest out and back in to trigger the change, like so:

sudo cp /etc/kubernetes/manifests/kube-scheduler.yaml /tmp 
sudo cp /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/kube-scheduler.yaml
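The kubelet recreates the scheduler pod once the manifest is written back; you can watch for it to come back up:

kubectl get pods -n kube-system | grep kube-scheduler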

Now we can finally deploy the actual GPU device plugin. Make sure the original NVIDIA device plugin is not deployed alongside it.

kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml

We also need to mark every node that should use this plugin with a new label:

kubectl label node <target_node> gpushare=true
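The device plugin DaemonSet only schedules onto labelled nodes, so after labelling you should see a gpushare device-plugin pod appear on each of them (the exact pod names depend on the DaemonSet in the manifest above):

kubectl get pods -n kube-system -o wide | grep gpushare-device-plugin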

Finally, you should also install the inspection extension for kubectl on whichever machine you run kubectl from.

cd /usr/bin/
sudo wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
sudo chmod u+x /usr/bin/kubectl-inspect-gpushare
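With the extension in place, kubectl gains an inspect subcommand that reports how much GPU memory has been allocated on each node:

kubectl inspect gpushare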

And voilà, you can now share GPU memory among your Kubernetes applications!
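As a rough sketch of what a workload looks like, request the aliyun.com/gpu-mem resource instead of nvidia.com/gpu. The pod name and image below are just placeholders, and the unit is GiB with the plugin's default settings:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo                       # hypothetical name
spec:
  containers:
  - name: demo
    image: tensorflow/tensorflow:2.3.0-gpu   # example image
    resources:
      limits:
        aliyun.com/gpu-mem: 3                # GPU memory to reserve, in GiB by default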

Do note, however, that TensorFlow applications allocate all GPU memory by default; you will need to enable memory growth so that memory is only claimed as the application needs it, as shown below.
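A minimal sketch of enabling memory growth in TensorFlow 2 (TF 1.x has an equivalent allow_growth option on the session config):

import tensorflow as tf

# Allocate GPU memory incrementally instead of grabbing all of it upfront
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)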
