Using GPUs with Apache OpenWhisk

This blog is written in collaboration with Avi Weit.

David Breitgand
Apache OpenWhisk
11 min read · Aug 15, 2019


GPUs have become extremely popular for various applications over the last few years, with media-intensive and AI applications leading the pack. Chances are that you are familiar with serverless computing concepts and might even be using them in your projects. However, very few serverless frameworks support GPUs out of the box. In particular, the official Apache OpenWhisk distribution lacks this capability at the moment. In this blog, we show how Apache OpenWhisk can be enhanced with GPU support with only minimal changes to the existing code base. If you try it and like it, drop us a line, and we might go forward and submit a pull request to the Apache OpenWhisk community.

But first things first. Let’s start with… Kubernetes. Why Kubernetes, you might ask? Well, to start with, Apache OpenWhisk and Kubernetes are friends. One can configure OpenWhisk with either its native Docker Container Factory or the Kubernetes Container Factory. To skip all the gory details: in the former case, serverless functions (called actions in OpenWhisk parlance) execute as Docker containers orchestrated by OpenWhisk’s own orchestrator using the Docker API; in the latter case, the actions execute as Docker containers within pods orchestrated by Kubernetes.

Being orchestrated by Kubernetes brings some benefits. For one, Kubernetes can easily enforce sophisticated placement policies when scheduling actions that require GPU support. In other words, by giving the Kubernetes scheduler some hints about where we want a given action to end up in the cluster, we save ourselves a lot of work on finding GPUs, allocating them, releasing them, and so on.

So, two thumbs up for Kubernetes! But how exactly does this magic happen?

Kubernetes has included experimental support for managing NVIDIA GPUs spread across nodes since version 1.6. From version 1.8 onward, the recommended way to consume GPUs is through device plugins.

In our design, we will use a very convenient mechanism that Kubernetes provides: resource limits.

In essence, when a new OpenWhisk action is offloaded to Kubernetes, a component called the Invoker specifies a GPU request in the resources limits section of the pod definition.

Each action in Apache OpenWhisk has a kind. The dictionary of kinds (i.e., runtimes) is a JSON structure that is part of the Apache OpenWhisk configuration. To enable the new kinds of actions, namely those that consume a GPU, we enrich the runtimes dictionary as follows.

"deepspeech":[
{
"kind": "python:3ds@gpu",
"default": true,
"image": {
"prefix": "docker5gmedia",
"name": "python3dscudaaction",
"tag": "latest"
},
"deprecated": false,
"attached": {
"attachmentName": "codefile",
"attachmentType": "text/plain"
}
}
],

In this case, we define a new action kind, python:3ds@gpu, which serves a DeepSpeech model on a GPU. We will explain what this action actually does a bit later. But before going there, let’s take a look at another custom-defined action that consumes a GPU. Let’s assume that we want to define a “generic” CUDA action.

"cuda":[
{
"kind": "cuda:8@gpu",
"default": true,
"image": {
"prefix": "docker5gmedia",
"name": "cuda8action",
"tag": "latest"
},
"deprecated": false,
"attached": {
"attachmentName": "codefile",
"attachmentType": "text/plain"
}
}

Of course, just defining a new entry in the dictionary is not enough. We need to teach the Invoker to understand these new definitions. For GPU-consuming actions, we adopt the convention that a string of the form:

<run-time name>@gpu

denotes an action runtime that requests a GPU. For example,

cuda:8@gpu 

or

python:3ds@gpu

The YAML pod definition that the Invoker prepares for such an action will look something like this:

apiVersion: v1
kind: Pod
metadata:
  name: cuda8action
spec:
  containers:
  - name: cuda8action
    image: "docker5gmedia/cuda8action:latest"
    resources:
      limits:
        nvidia.com/gpu: 1

To achieve this, we enriched the KubernetesClient with logic that understands the new action kinds and translates them into such pod definitions. This is the only place in the code base that had to be changed. Simple.
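
The actual change lives in the Scala KubernetesClient; purely for illustration, here is a small Python sketch of the idea. The function names and the memory default below are ours, not OpenWhisk internals:

# Illustrative sketch only: how a kind such as "cuda:8@gpu" can be mapped to a
# GPU resource limit in the generated pod definition. Not the real Scala code.
import json

GPU_SUFFIX = "@gpu"

def gpu_limits_for(kind):
    """Return the extra resource limits implied by the action kind."""
    if kind.endswith(GPU_SUFFIX):
        # ask the Kubernetes device plugin for a single GPU
        return {"nvidia.com/gpu": 1}
    return {}

def build_pod_spec(action_name, image, kind, memory="256Mi"):
    """Assemble a minimal pod definition for the action container."""
    limits = {"memory": memory}  # placeholder for the action's memory limit
    limits.update(gpu_limits_for(kind))
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": action_name},
        "spec": {
            "containers": [{
                "name": action_name,
                "image": image,
                "resources": {"limits": limits},
            }]
        },
    }

if __name__ == "__main__":
    spec = build_pod_spec("cuda8action", "docker5gmedia/cuda8action:latest", "cuda:8@gpu")
    print(json.dumps(spec, indent=2))

Running the sketch prints a pod definition equivalent to the YAML above, with the nvidia.com/gpu limit added only when the kind carries the @gpu suffix.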

Don’t go away just yet. Enough talk, let’s get our hands dirty. In the remainder of this blog, we will guide you through two examples of using our newly created GPU actions.

Oh, and just before you can try out our GPU OpenWhisk actions, you will have to set up a development environment. If you already have a Kubernetes cluster with Nvidia Docker installed on some nodes, you might wish to skip this section and go directly to the next part. For the rest of us, let’s do some preliminary work first.

Prerequisites

We will need a fresh Ubuntu 16.04 Linux machine with 16 GB RAM, at least 100 GB of disk, and a GPU card. To avoid the problem of a Virtual Machine (VM) not being exposed to the GPU properly (at least for the GPU hardware configuration that we have), we use a physical machine (a Lenovo W530 with a Quadro K1000M GPU card). We hope you have a more powerful GPU in your laptop :) And maybe even more than one. However, even this modest card is enough to demonstrate the concept.

We will guide you step by step, starting from a clean-slate Linux installation. However, if you already have a machine in place with Ubuntu 16.04, the Nvidia driver, Docker-CE 18.06.0, and Minikube with GPU support enabled, you might wish to skip Steps 1–4 below and go directly to Step 5.

Step 1: Install Nvidia CUDA Drivers

First, make sure to grab the latest updates:

sudo apt-get update
sudo apt-get upgrade

Next, download the CUDA Toolkit 9.2 repository package (cuda-repo-ubuntu1604_9.2.148-1_amd64.deb) from the NVIDIA CUDA downloads page and follow the installer instructions:

sudo dpkg -i cuda-repo-ubuntu1604_9.2.148-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

Reboot and verify your installation:

nvidia-smi

On our laptop, the nvidia-smi output shows the Quadro K1000M card along with the installed driver version.

Step 2: Install Docker

We need a Docker version (Docker-CE 18.06.0) that is verified to work with Nvidia, Kubernetes, and Minikube (we will use Minikube for our tests).

sudo apt-get update
sudo apt-get install \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg-agent \
  software-properties-common

Add Docker’s official GPG key and verify its fingerprint:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88

Register the Docker repository:

sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update

Install Docker 18.06.0:

sudo apt-get install docker-ce=18.06.0~ce~3-0~ubuntu containerd.io

Verify the Docker installation:

sudo docker run hello-world

Step 3: Install Nvidia-docker runtime

Your next step is to install the Nvidia Docker runtime. It must match the Docker version you installed in the previous step.

Add the package repositories:

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

Install the matched version:

sudo apt-get install -y nvidia-docker2=2.0.3+docker18.06.0-1 nvidia-container-runtime=2.0.0+docker18.06.0-1

Reload the Docker daemon configuration:

sudo pkill -SIGHUP dockerd

Test nvidia-smi with the latest official CUDA image:

sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

Now you need to make the Nvidia runtime your default runtime. Edit the Docker daemon config file, which can usually be found at

/etc/docker/daemon.json

Make it look like this:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker:

sudo service docker stop
sudo service docker start

Step 4: Install Minikube

There are plenty of tutorials on how to install Minikube on Ubuntu. We recap the essential steps for completeness and convenience of reference.

Start by installing kubectl:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl

Verify the installation:

sudo kubectl version

Now install Minikube:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
&& chmod +x minikube
sudo install minikube /usr/local/bin

You may need to run kubectl/minikube/helm commands under sudo if you have not configured them to run under your own user.

Install dependencies:

sudo apt-get install socat

Start Minikube:

sudo minikube start --vm-driver=none --memory=8192

The command may take a few minutes to complete…

After Minikube starts:

sudo ip link set docker0 promisc on

(you should run this command each time you restart Minikube)

Now, let’s verify the Minikube installation:

sudo minikube status
sudo kubectl get pod --all-namespaces

Check the status and wait for all pods to reach the Running or Completed state.

Now, enable GPU support on Kubernetes by installing the Nvidia device plugin:

sudo kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml

Let’s make sure that pods can indeed consume GPU. Create a sample pod definition:

cat <<EOF > pod-gpu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "k8s.gcr.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
EOF

and create a pod using this yaml:

sudo kubectl create -f pod-gpu.yaml

Wait for it to reach the Completed state and check its logs:

sudo kubectl logs cuda-vector-add

Step 5: Install OpenWhisk Fork with GPU Support on Minikube using Helm

Create a YAML definition at ~/mycluster.yaml:

whisk:
  ingress:
    type: NodePort
    apiHostName: <MINIKUBE IP>
    apiHostPort: 31001
  limits:
    actions:
      memory:
        max: "2048m"
nginx:
  httpsNodePort: 31001

Replace <MINIKUBE IP> in whisk.ingress.apiHostName with the output of the minikube ip command.

Download Helm and extract it to your home directory.

Initialize Helm, which creates the tiller pod:

sudo ~/linux-amd64/helm init

When the tiller pod moves to the Running state, grant the necessary privileges to helm:

sudo kubectl create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

Label our node (we have only one)

kubectl label nodes --all openwhisk-role=invoker

Clone the OpenWhisk repository and run helm to install OpenWhisk

cd ~
git clone https://github.com/5g-media/incubator-openwhisk-deploy-kube.git
cd incubator-openwhisk-deploy-kube
git checkout gpu
~/linux-amd64/helm install ./helm/openwhisk --namespace=openwhisk --name=owdev -f ~/mycluster.yaml

Coffee time: installation can take a few minutes…

Wait for the invoker-health pod to get created and run:

sudo kubectl get pods -n openwhisk | grep invokerhealthtestaction

We are almost done. Now we need to pull the GPU runtime images for the OpenWhisk GPU actions:

sudo docker pull docker5gmedia/python3dscudaaction
sudo docker pull docker5gmedia/cuda8action

These images are large, so it might take a few more minutes.

The last step in the OpenWhisk installation is to install and configure the OpenWhisk CLI.

To install:

curl -L https://github.com/apache/incubator-openwhisk-cli/releases/download/latest/OpenWhisk_CLI-latest-linux-amd64.tgz -o /tmp/wsk.tgz
tar xvfz /tmp/wsk.tgz -C /tmp/
sudo mv /tmp/wsk /usr/local/bin

To configure:

wsk property set --apihost <whisk.ingress.apiHostName>:<whisk.ingress.apiHostPort>
wsk property set --auth 23bc46b1-71f6-4ed5-8c54-816aa4f8c502:123zO3xZCLrMN6v2BKK1dXYFpXlPkccOFqm12CdAsMgRU4VrNZ9lyGVCGuMDGIwP
cat <<EOF > ~/.wskprops
APIHOST=$OPENWHISK_APIHOST
AUTH=$OPENWHISK_AUTH
EOF

If you run into problems, take a look at this more complete description for troubleshooting.

We have everything set up. Now let the real fun begin! We will start by following in the footsteps of an introductory blog by Nvidia. To save ourselves the hassle of setting up a development environment for CUDA, we will extend a Docker development container image prepared by Nvidia. The new container already has all the dependencies packed (including the OpenWhisk CLI, pre-installed and pre-configured). We will start this container in interactive mode as follows:

sudo docker run -it -e OPENWHISK_APIHOST=`sudo minikube ip`:31001 --rm docker5gmedia/5gmedia-playbox-minikube-ow-gpu:1.0 /bin/bash

In our development container image for Minikube, we have Apache OpenWhisk CLI pre-installed and pre-configured. To check:

more ~/.wskprops

You should see the contents of .wskprops from the development container; the APIHOST will point to the IP of your Docker host machine (so the IP you see will differ from ours).

Now, let’s create the code of our CUDA action. We will use the same code that the Nvidia introductory blog uses to illustrate how a GPU can be consumed.

cat <<EOF > /add.cu
#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Allocate Unified Memory -- accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "{\"message\": \"Max error: " << maxError << "\"}";

  // Free memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}
EOF

Compile the example:

nvcc add.cu -o add_cuda

And run it:

./add_cuda

If it prints a JSON message reporting a max error of 0, the example works as expected.

Create the OpenWhisk action:

wsk -i action create cuda_Test myAction.zip --kind cuda:8@gpu

And invoke it:

wsk -i action invoke -b cuda_Test

Since the “-b” flag is used, the invocation is blocking and should return a result like the fragment below:

"response": {
"result": {
"message": "Max error: 0"
},
"status": "success",
"success": true
},

The full activation record contains additional fields (the activation ID, timings, logs, and annotations). To observe the action actually exercising the GPU, run sudo watch nvidia-smi in another terminal while it executes.

OK, it works. But maybe running a basic CUDA example is not what you want to do? Let’s consider another use case: speech recognition. In this scenario, we will use DeepSpeech.

The development container that we have been using in the previous scenario already has the 5g-media fork of the incubator-openwhisk-runtime-python repository cloned:

cd /incubator-openwhisk-runtime-python/core/python3DSAction/sample

We now need to create the DeepSpeech action as follows:

wsk -i action create myAction-gpu ds_action.py -m 2048 --kind python:3ds@gpu
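
The actual ds_action.py ships with the sample directory above. To give a feel for what such an action does, here is a minimal, hypothetical sketch; the model path, the load sequence, and the exact deepspeech API calls are assumptions (they differ between DeepSpeech releases), so treat it as orientation rather than the real code:

# Hypothetical sketch of a DeepSpeech OpenWhisk action (not the ds_action.py from
# the repository). Assumes the runtime image bundles the deepspeech package, numpy,
# and a pre-trained model at MODEL_PATH.
import urllib.request
import wave
import numpy as np
from deepspeech import Model

MODEL_PATH = "/models/output_graph.pbmm"  # assumed location inside the image

def main(params):
    url = params.get("url")
    if not url:
        return {"error": "missing 'url' parameter"}

    # fetch the audio file referenced by the 'url' parameter
    local_path, _ = urllib.request.urlretrieve(url)

    # read 16-bit PCM samples from the wav file
    with wave.open(local_path, "rb") as wav:
        rate = wav.getframerate()
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    # run inference; constructor and stt() signatures vary across DeepSpeech
    # versions (older releases take extra arguments such as the sample rate)
    model = Model(MODEL_PATH)
    return {"transcript": model.stt(audio), "sample_rate": rate}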

Let’s check that it was created:

wsk -i action list

We will use the sample sound file 2830-3980-0043.wav from the repository as the input (play it first using your favorite player, if you wish).

Now let’s see whether our DeepSpeech action will recognize what this sample is about.

wsk -i action invoke -r myAction-gpu -p url https://raw.githubusercontent.com/5g-media/incubator-openwhisk-runtime-python/gpu/core/python3DSAction/sample/audio/2830-3980-0043.wav
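
By the way, the same blocking invocation can be issued from code against the OpenWhisk REST API instead of the CLI. Here is a minimal Python sketch, assuming the default guest credentials configured earlier and skipping TLS verification just like the CLI’s -i flag:

# Minimal sketch: blocking invocation of the DeepSpeech action over the OpenWhisk
# REST API. Replace APIHOST with your whisk.ingress values; the credentials are the
# default guest key used above. verify=False mirrors the CLI's -i flag.
import requests

APIHOST = "https://<MINIKUBE IP>:31001"
AUTH = ("23bc46b1-71f6-4ed5-8c54-816aa4f8c502",
        "123zO3xZCLrMN6v2BKK1dXYFpXlPkccOFqm12CdAsMgRU4VrNZ9lyGVCGuMDGIwP")

resp = requests.post(
    APIHOST + "/api/v1/namespaces/_/actions/myAction-gpu",
    params={"blocking": "true", "result": "true"},
    json={"url": "https://raw.githubusercontent.com/5g-media/incubator-openwhisk-runtime-python/gpu/core/python3DSAction/sample/audio/2830-3980-0043.wav"},
    auth=AUTH,
    verify=False,  # self-signed certificate in this setup
)
print(resp.json())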

It is time to summarize what we’ve learned in this blog.

  • Apache OpenWhisk has a flexible, pluggable architecture that makes it easy to add new types of actions; in our case, GPU-consuming actions;
  • Combined with the Kubernetes GPU scheduling support, OpenWhisk actions can readily exploit GPUs, a requirement that comes up repeatedly in a variety of use cases;
  • Let us know what you think.

This work is part of the 5G-MEDIA project.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 761699.
