Using GPUs with Apache OpenWhisk
This blog is written in collaboration with Avi Weit.
GPUs have become extremely popular for various applications over the last few years, with media-intensive and AI applications leading the pack. Chances are that you are familiar with serverless computing concepts and might even be using them in your projects. However, very few serverless frameworks support GPUs out of the box. In particular, the official Apache OpenWhisk distribution lacks this capability at the moment. In this blog, we show how Apache OpenWhisk can be enhanced with GPU support with only minimal changes to the existing code base. If you try it and like it, drop us a line and we might go forward and submit a pull request to the Apache OpenWhisk community.
But first things first. Let’s start with… Kubernetes. Why Kubernetes, you might ask? Well, to start with, Apache OpenWhisk and Kubernetes are friends. One can configure OpenWhisk with either its native Docker Container Factory or the Kubernetes Container Factory. To skip all the gory details: in the former option, serverless functions (called actions in OpenWhisk parlance) execute as Docker containers orchestrated by OpenWhisk’s own orchestrator via the Docker API; in the latter, the actions execute as Docker containers within pods orchestrated by Kubernetes.
Being orchestrated by Kubernetes brings some benefits. For one, Kubernetes can easily enforce sophisticated placement policies when scheduling actions that require GPU support. In other words, by giving the Kubernetes scheduler some hints about where we want a given action to end up in a cluster, we save ourselves a lot of work on finding, allocating, and freeing GPUs.
So, two thumbs up for Kubernetes! But how exactly does this magic happen?
Kubernetes has included experimental support for managing NVIDIA GPUs spread across nodes since version 1.6. The recommended way to consume GPUs is via device plugins (available from K8s 1.8 onward).
In our design, we will use a very convenient mechanism that Kubernetes provides: resource limits.
In essence, when a new OpenWhisk action is offloaded to Kubernetes, a component called the Invoker specifies a GPU request in the resource limits section.
Each action in Apache OpenWhisk is of some kind. The dictionary of kinds (i.e., runtimes) is a JSON structure that is part of the Apache OpenWhisk configuration. To enable the new kind(s) of actions, those that will consume a GPU, we enrich the runtimes dictionary as follows.
"deepspeech": [
  {
    "kind": "python:3ds@gpu",
    "default": true,
    "image": {
      "prefix": "docker5gmedia",
      "name": "python3dscudaaction",
      "tag": "latest"
    },
    "deprecated": false,
    "attached": {
      "attachmentName": "codefile",
      "attachmentType": "text/plain"
    }
  }
],
In this case, we define a new action kind, python:3ds@gpu, which serves a DeepSpeech model on a GPU. We will explain what this action actually does a bit later. But before going there, let’s take a look at another custom-defined action that consumes a GPU. Let’s assume that we want to define a “generic” CUDA action.
"cuda": [
  {
    "kind": "cuda:8@gpu",
    "default": true,
    "image": {
      "prefix": "docker5gmedia",
      "name": "cuda8action",
      "tag": "latest"
    },
    "deprecated": false,
    "attached": {
      "attachmentName": "codefile",
      "attachmentType": "text/plain"
    }
  }
]
Of course, just defining a new entry in the dictionary is not enough. We need to teach the Invoker to understand these new definitions. For GPU-consuming actions, we adopt the convention that a string of the form
<runtime name>@gpu
denotes an action runtime that requests a GPU, for example cuda:8@gpu or python:3ds@gpu.
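This naming convention is trivial to check. Here is a minimal sketch in Python (the Invoker itself is written in Scala, and the helper names below are our own, not part of OpenWhisk):

```python
GPU_SUFFIX = "@gpu"

def wants_gpu(kind: str) -> bool:
    # An action kind requests a GPU iff its name ends with "@gpu".
    return kind.endswith(GPU_SUFFIX)

def base_runtime(kind: str) -> str:
    # Strip the "@gpu" suffix to recover the underlying runtime name.
    return kind[: -len(GPU_SUFFIX)] if wants_gpu(kind) else kind
```

For example, wants_gpu("cuda:8@gpu") is true, and base_runtime("cuda:8@gpu") yields "cuda:8".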
A yaml pod definition prepared by the Invoker for such an action will look something like this:
apiVersion: v1
kind: Pod
metadata:
  name: cuda8action
spec:
  containers:
  - name: cuda8action
    image: "docker5gmedia/cuda8action:latest"
    resources:
      limits:
        nvidia.com/gpu: 1
To achieve this, we enriched KubernetesClient with the logic that understands the new action kinds and translates them into the pod definitions. And this is the only place in the code base that was changed. Simple.
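To illustrate what this KubernetesClient change amounts to, here is a rough Python sketch (the actual code is Scala and lives inside the Invoker; the function below is our own simplification) that translates an action's name, image, and kind into a pod definition, attaching the nvidia.com/gpu limit only for @gpu kinds:

```python
def pod_manifest(name: str, image: str, kind: str) -> dict:
    # Build a minimal pod manifest for an action container.
    container = {"name": name, "image": image}
    if kind.endswith("@gpu"):
        # GPU-consuming kinds get one GPU via the Kubernetes resource limit.
        container["resources"] = {"limits": {"nvidia.com/gpu": 1}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [container]},
    }
```

Serializing the returned dict to yaml for kind cuda:8@gpu would reproduce the pod definition shown earlier.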
Don’t go away just yet. Enough talk; let’s get our hands dirty. In the remainder of this blog, we will guide you through two examples of using our newly created GPU actions.
Oh, just before you can try out our GPU OpenWhisk actions, you will have to set up a development environment. If you already have a K8s cluster with Nvidia Docker installed on some nodes, you might wish to skip this section and go directly to the next part. For the rest of us, let’s do some preliminary work first.
Prerequisites
We will need a fresh Ubuntu 16.04 Linux machine with 16 GB RAM, at least 100 GB of disk, and a GPU card. Because, with the GPU hardware configuration that we have, a Virtual Machine (VM) cannot be exposed to the GPU properly, we use a physical machine (a Lenovo W530 with a Quadro K1000M GPU card). We hope you have a more powerful GPU on your laptop :) And maybe even more than one. However, even this modest card is enough to demonstrate the concept.
We will guide you step by step, starting from a clean-slate Linux installation. However, if you already have a machine with Ubuntu 16.04, the Nvidia driver, Docker-CE 18.06.0, and Minikube with GPU support enabled, you might wish to skip Steps 1–4 below and go directly to Step 5.
Step 1: Install Nvidia CUDA Drivers
First, make sure to grab the latest updates
sudo apt-get update
sudo apt-get upgrade
Next, follow the instructions from the CUDA Toolkit 9.2 installer (grab the repository .deb from Nvidia’s CUDA downloads page first):
sudo dpkg -i cuda-repo-ubuntu1604_9.2.148-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
Reboot and verify your installation:
nvidia-smi
On our laptop it looks like this:
Step 2: Install Docker
We need a Docker version (Docker-CE 18.06.0) that is verified to work with Nvidia, Kubernetes, and Minikube (we will use Minikube for our tests).
sudo apt-get update
sudo apt-get install \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg-agent \
  software-properties-common
Add the key and verify its fingerprint:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
Register the Docker repository:
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
Install Docker 18.06.0:
sudo apt-get install docker-ce=18.06.0~ce~3-0~ubuntu containerd.io
Verify Docker:
sudo docker run hello-world
Step 3: Install Nvidia-docker runtime
Your next step is to install the Nvidia Docker runtime. It must match the Docker version that you installed in the previous step.
Add the package repositories:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2=2.0.3+docker18.06.0-1 nvidia-container-runtime=2.0.0+docker18.06.0-1
Restart the Docker daemon:
sudo pkill -SIGHUP dockerd
Test nvidia-smi with the latest official CUDA image:
sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Now you need to set the Nvidia runtime as your default runtime. Edit the Docker daemon config file, which can usually be found at
/etc/docker/daemon.json
Make it look like this:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Restart Docker
sudo service docker stop
sudo service docker start
Step 4: Install Minikube
There are plenty of tutorials on how to install Minikube on Ubuntu. We recap the essential steps for completeness and convenience of reference.
Start by installing kubectl:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
Verify the installation:
sudo kubectl version
Now install Minikube:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
&& chmod +x minikube
sudo install minikube /usr/local/bin
You may need to run kubectl/minikube/helm commands under sudo if you have not configured them to run under your own user
Install dependencies:
sudo apt-get install socat
Start Minikube:
sudo minikube start --vm-driver=none --memory=8192
The command may take a few minutes to complete…
After Minikube starts:
sudo ip link set docker0 promisc on
(you should run this command each time you restart Minikube)
Now, let’s verify the Minikube installation:
sudo minikube status
sudo kubectl get pods --all-namespaces
Check the status and wait for all pods to become Running or Completed.
Now, enable GPU support on Kubernetes
sudo kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
Let’s make sure that pods can indeed consume GPU. Create a sample pod definition:
cat <<EOF > pod-gpu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "k8s.gcr.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
EOF
and create a pod using this yaml:
sudo kubectl create -f pod-gpu.yaml
Wait for it to enter the Completed state and check its logs:
sudo kubectl logs cuda-vector-add
Step 5: Install OpenWhisk Fork with GPU Support on Minikube using Helm
Create a yaml definition under ~/mycluster.yaml:
whisk:
  ingress:
    type: NodePort
    apiHostName: <MINIKUBE IP>
    apiHostPort: 31001
  limits:
    actions:
      memory:
        max: "2048m"
nginx:
  httpsNodePort: 31001
Replace ingress.apiHostName with the output of the minikube ip command.
Download helm and extract it to the home directory.
Initialize helm, which creates the tiller pod:
sudo ~/linux-amd64/helm init
When the tiller pod moves to the Running state, grant the necessary privileges to helm:
sudo kubectl create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
Label our node (we have only one):
sudo kubectl label nodes --all openwhisk-role=invoker
Clone the OpenWhisk repository and run helm to install OpenWhisk
cd ~
git clone https://github.com/5g-media/incubator-openwhisk-deploy-kube.git
cd incubator-openwhisk-deploy-kube
git checkout gpu
~/linux-amd64/helm install ./helm/openwhisk --namespace=openwhisk --name=owdev -f ~/mycluster.yaml
Coffee time: installation can take a few minutes…
Wait for the invoker-health pod to get created and run:
sudo kubectl get pods -n openwhisk | grep invokerhealthtestaction
We are almost done. Now we need to pull the GPU runtime images for the OpenWhisk GPU actions:
sudo docker pull docker5gmedia/python3dscudaaction
sudo docker pull docker5gmedia/cuda8action
These images are large, so it might take a few more minutes.
The last step in the OpenWhisk installation is to install and configure the OpenWhisk CLI.
To install:
curl -L https://github.com/apache/incubator-openwhisk-cli/releases/download/latest/OpenWhisk_CLI-latest-linux-amd64.tgz -o /tmp/wsk.tgz
tar xvfz /tmp/wsk.tgz -C /tmp/
mv /tmp/wsk /usr/local/bin
To configure:
wsk property set --apihost <whisk.ingress.apiHostName>:<whisk.ingress.apiHostPort>
wsk property set --auth 23bc46b1-71f6-4ed5-8c54-816aa4f8c502:123zO3xZCLrMN6v2BKK1dXYFpXlPkccOFqm12CdAsMgRU4VrNZ9lyGVCGuMDGIwPc
Alternatively, write ~/.wskprops directly (here from environment variables):
cat <<EOF > ~/.wskprops
APIHOST=$OPENWHISK_APIHOST
AUTH=$OPENWHISK_AUTH
EOF
If you run into problems, for troubleshooting take a look at this more complete description.
We have everything set up. Now let the real fun begin! We will start by following in the footsteps of an introductory blog by Nvidia. To save ourselves the hassle of setting up a CUDA development environment, we will extend a Docker development container image prepared by Nvidia. The new container already has all the dependencies packed in (including the OpenWhisk CLI, pre-installed and pre-configured). We will start this container in interactive mode as follows:
sudo docker run -it -e OPENWHISK_APIHOST=`sudo minikube ip`:31001 --rm docker5gmedia/5gmedia-playbox-minikube-ow-gpu:1.0 /bin/bash
In our development container image for Minikube, we have Apache OpenWhisk CLI pre-installed and pre-configured. To check:
more ~/.wskprops
You should see something like this:
Now, let’s create the code of our CUDA action. We will use the same code that the Nvidia development blog uses to illustrate how a GPU can be consumed.
cat <<EOF > /add.cu
#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Allocate Unified Memory -- accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "{\"message\": \"Max error: " << maxError << "\"}";

  // Free memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}
EOF
Compile the example:
nvcc add.cu -o add_cuda
And run it:
./add_cuda
If you see a JSON message reporting Max error: 0, the example works as expected.
Create the OpenWhisk action:
wsk -i action create cuda_Test myAction.zip --kind cuda:8@gpu
And invoke it:
wsk -i action invoke -b cuda_Test
Since the “-b” flag is used, the invocation is blocking and should return the following result (fragment shown below):
"response": {
  "result": {
    "message": "Max error: 0"
  },
  "status": "success",
  "success": true
},
The whole output looks something like this:
To observe the action running, we can use sudo watch nvidia-smi.
Ok, it works. But maybe running a basic CUDA example is not what you want to do? Let’s consider another use case: speech recognition. In this scenario, we will use DeepSpeech.
The development container that we’ve been using in the previous scenario already has this git repository installed.
cd /incubator-openwhisk-runtime-python/core/python3DSAction/sample
We now need to create the DeepSpeech action as follows:
wsk -i action create myAction-gpu ds_action.py -m 2048 --kind python:3ds@gpu
Let’s check that it was created:
wsk -i action list
We will use this sample sound file as the input (play it first using your favorite player, if you wish).
Now let’s see whether our DeepSpeech action will recognize what this sample is about.
wsk -i action invoke -r myAction-gpu -p url https://raw.githubusercontent.com/5g-media/incubator-openwhisk-runtime-python/gpu/core/python3DSAction/sample/audio/2830-3980-0043.wav
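For context, an OpenWhisk Python action is just a module with a main(args) function that returns a JSON-serializable dict. Below is a heavily simplified, hypothetical skeleton in the spirit of ds_action.py (the real file in the repository differs; transcribe is a stand-in for downloading the audio and running DeepSpeech inference):

```python
def transcribe(url: str) -> str:
    # Stand-in: the real action would fetch the wav file at `url` and
    # feed it to a loaded DeepSpeech model here.
    return "stub transcript"

def main(args: dict) -> dict:
    # OpenWhisk calls main() with the invocation parameters as a dict;
    # the returned dict becomes the JSON result of the action.
    url = args.get("url")
    if url is None:
        return {"error": "missing 'url' parameter"}
    return {"text": transcribe(url)}
```

The -p url flag in the invocation above is what populates args["url"] inside the action.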
It is time to summarize what we’ve learned in this blog.
- Apache OpenWhisk has a flexible, pluggable architecture that makes it easy to add new types of actions; in our case, GPU-consuming actions;
- Combined with the K8s GPU scheduling support, OpenWhisk actions can readily exploit GPUs, a requirement that comes up repeatedly in a variety of use case contexts;
- Let us know what you think.
This work is part of the 5G-MEDIA project.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 761699.