GPUs & Kubernetes for Deep Learning — Part 1/3

Samuel Cozannet
HackerNoon.com
Feb 15, 2017

A few weeks ago I shared a side project about building a DIY GPU cluster for k8s, to play with Kubernetes with a proper ROI vs. AWS g2 instances.

This was spectacularly interesting when AWS was lagging behind with the old NVIDIA GRID K520 cards of the g2 instances (which are not supported anymore on the latest drivers). But with the addition of the P series (p2.xlarge, p2.8xlarge and p2.16xlarge), the new cards are K80s with 12GB RAM per GPU, outrageously more powerful than the previous ones.

Baidu just released a post on the Kubernetes blog about the PaddlePaddle setup, but they only focused on CPUs. I thought it would be interesting to look at a setup of Kubernetes on AWS with some GPU nodes added, then exercise a Deep Learning framework on it. The docs say it is possible…

This post is the first of a series of 3: setting up the GPU cluster (this blog), adding storage to a Kubernetes cluster (right afterwards), and finally running a Deep Learning training on the cluster (working on it, coming up post MWC…).

The Plan

In this blog, we will:

  1. Deploy k8s on AWS in a development mode (no HA, colocating etcd, the control plane and PKI)
  2. Deploy 2x nodes with GPUs (p2.xlarge and p2.8xlarge instances)
  3. Deploy 3x nodes with CPU only (m4.xlarge)
  4. Validate GPU availability

Requirements

For what follows, it is important that:

  • You understand Kubernetes 101
  • You have admin credentials for AWS
  • If you followed the other posts, you know we’ll be using the Canonical Distribution of Kubernetes, hence some knowledge about Ubuntu, Juju and the rest of Canonical’s ecosystem will help.

Foreplay

  • Make sure you have Juju installed.

On Ubuntu, Juju is available as a package; for other OSes, look up the official docs.

Then to connect to the AWS cloud with your credentials, read this page

  • Finally copy this repo to have access to all the sources

OK! Let’s start GPU-izing the world!

Deploying the cluster

Bootstrap

As usual, start with the bootstrap sequence. Just be careful that p2 instances are only available in us-west-2, us-east-1 and eu-west-1, as well as the us-gov regions. I have experienced issues running p2 instances on the EU side, hence I recommend using a US region.
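As a minimal sketch, the bootstrap could look like the following. The controller name k8s-gpu is a hypothetical choice, not something taken from this post, and the block only prints the command unless you set JUJU=juju, since bootstrapping creates billable instances.

```shell
# Dry run by default: prints the command. Run with JUJU=juju to execute it.
JUJU="${JUJU:-echo juju}"

# A US region, as recommended above; "k8s-gpu" is a hypothetical controller name.
REGION="${REGION:-us-east-1}"
$JUJU bootstrap "aws/${REGION}" k8s-gpu
```

Keeping the region in a variable makes it easy to retry elsewhere if p2 capacity is not available.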

Deploying instances

Once the controller is ready we can start deploying services. In my previous posts, I used bundles which are shortcuts to deploy complex apps.

If you are already familiar with Juju, you can run juju deploy src/k8s-gpu.yaml and jump to the end of this section. For the others interested in getting into the details, this time we will deploy manually and go through the logic of the deployment.

Kubernetes is made of 5 individual applications: Master, Worker, Flannel (network), etcd (cluster state storage DB) and easyRSA (PKI to encrypt communication and provide x509 certs).
In Juju, each app is modeled by a charm, which is a recipe of how to deploy it.

At deployment time, you can give constraints to Juju, either very specific (instance type) or lax (# of cores). With the latter, Juju will pick the cheapest instance matching your constraints on the target cloud.

First thing is to deploy the applications:

Here you can see an interesting property of Juju that we never covered before: naming the services you deploy. We deployed the same kubernetes-worker charm several times under different names, with GPUs in some cases and without in the other. This gives us a way to group instances of a certain type, at the cost of duplicating some commands.

Also note the revision numbers in the charms we deploy. Revisions are not directly tied to versions of the software they deploy. If you omit them, Juju will pick the latest revision, like Docker would do with images.
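To make the constraints and naming concrete, here is a hedged sketch of what such a deployment could look like. The application names (kubernetes-worker-cpu, kubernetes-worker-gpu, kubernetes-worker-gpu8), the master constraints and the use of --to 0 on a fresh model are assumptions for illustration. Revisions are left out, so Juju would pick the latest ones. The block prints the commands unless JUJU=juju is set.

```shell
# Dry run by default: prints the commands. Run with JUJU=juju to execute them.
JUJU="${JUJU:-echo juju}"

# Development mode: master, etcd and PKI share machine 0 of a fresh model.
# Lax constraints: Juju picks the cheapest instance that satisfies them.
$JUJU deploy "cs:~containers/kubernetes-master" --constraints "cores=2 mem=8G"
$JUJU deploy "cs:~containers/etcd" --to 0
$JUJU deploy "cs:~containers/easyrsa" --to 0

# Flannel is a subordinate charm, so it does not get a machine of its own.
$JUJU deploy "cs:~containers/flannel"

# Same worker charm, three hypothetical application names, one per instance type.
# Specific constraints: the instance type is pinned.
$JUJU deploy "cs:~containers/kubernetes-worker" kubernetes-worker-cpu -n 3 --constraints "instance-type=m4.xlarge"
$JUJU deploy "cs:~containers/kubernetes-worker" kubernetes-worker-gpu --constraints "instance-type=p2.xlarge"
$JUJU deploy "cs:~containers/kubernetes-worker" kubernetes-worker-gpu8 --constraints "instance-type=p2.8xlarge"
```

Note that the dry run drops the quotes around the constraints when printing; the real invocation receives them as a single argument.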

Adding the relations & Exposing software

Now that the applications are deployed, we need to tell Juju how they relate to each other. For example, the Kubernetes master needs certificates to secure its API. Therefore, there is a relation between kubernetes-master:certificates and easyrsa:client.

This relation means that once the 2 applications are connected, some scripts will run to query the EasyRSA API to create the required certificates, then copy them to the right location on the k8s master.

These relations then create statuses in the cluster, to which charms can react.

Essentially, at a very high level, think of Juju as a pub-sub implementation of application deployment. Every action inside or outside of the cluster posts a message to a common bus, and charms can react to these and perform additional actions, modifying the overall state… and so on and so on until equilibrium is reached.

Let’s add the relations:
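As a sketch, the relation described above and the expose step could be written as follows. Only the relation named in the text is shown; the remaining ones (etcd, Flannel, workers) are not reproduced here and are listed in the bundle file. The worker application name is a hypothetical one, and the block prints the commands unless JUJU=juju is set.

```shell
# Dry run by default: prints the commands. Run with JUJU=juju to execute them.
JUJU="${JUJU:-echo juju}"

# The master gets its certificates from EasyRSA.
$JUJU add-relation kubernetes-master:certificates easyrsa:client

# Open the cloud firewall for the ports each charm declares.
$JUJU expose kubernetes-master
$JUJU expose kubernetes-worker-cpu   # hypothetical application name
```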

Note the expose commands at the end.
These are instructions for Juju to open up the firewall in the cloud for specific ports of the instances. Some are predefined in charms (the Kubernetes Master API is 6443, Workers open up 80 and 443 for ingresses), but you can also force them if you need to (for example, when you manually add stuff to the instances post deployment).

Adding CUDA

CUDA does not have an official charm yet (coming up very soon!!), but there is my demoware implementation which you can find on GitHub. It has been updated for this post to CUDA 8.0.61 and drivers 375.26.

Make sure you have the charm tools available, clone and build the CUDA charm:

This will create a new folder called builds in JUJU_REPOSITORY, and another called cuda in there.
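For reference, a build environment along these lines would produce that layout. The ~/charms location is an assumption, and LAYER_PATH and INTERFACE_PATH are the variables the charm tools of that period used to find local layers and interfaces; the clone and the charm build step themselves are left as comments.

```shell
# JUJU_REPOSITORY is where "charm build" writes its output; LAYER_PATH and
# INTERFACE_PATH are where it looks for local layers and interfaces.
export JUJU_REPOSITORY="${JUJU_REPOSITORY:-${HOME}/charms}"
export LAYER_PATH="${JUJU_REPOSITORY}/layers"
export INTERFACE_PATH="${JUJU_REPOSITORY}/interfaces"
mkdir -p "${LAYER_PATH}" "${INTERFACE_PATH}"

# After cloning the CUDA layer into ${LAYER_PATH} and running "charm build"
# inside it, the built charm is expected at this path:
echo "${JUJU_REPOSITORY}/builds/cuda"
```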

Now you can deploy the charm

This will take a fair amount of time as CUDA takes very long to install (CDK takes about 10min and CUDA alone probably 15min).

Nevertheless, at the end the status should show:

Let us see what nvidia-smi gives us:

On the more powerful 8xlarge,

Aaaand yes!! We have our 8 GPUs as expected so 8x 12GB = 96GB Video RAM!

At this stage, we only have them enabled on the hosts. Now let us add GPU support in Kubernetes.

Adding GPU support in Kubernetes

By default, CDK will not activate GPUs when starting the API server and the Kubelets. We need to do that manually (for now).

Master Update

On the master node, update /etc/default/kube-apiserver so that the API server starts with --allow-privileged=true, then restart the API Server. This can also be done programmatically.
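A sketch of that edit, run here against a scratch copy so it is safe to try anywhere. The KUBE_API_ARGS variable name, the sample file content and the service name in the last comment are assumptions; check the actual file on your master before adapting it.

```shell
# Scratch copy standing in for /etc/default/kube-apiserver (content is assumed).
conf="$(mktemp)"
printf '%s\n' 'KUBE_API_ARGS="--v=4"' > "$conf"

# Prepend the flag only if it is not already there, so re-running is harmless.
if ! grep -qF -- '--allow-privileged=true' "$conf"; then
  sed 's/^KUBE_API_ARGS="/KUBE_API_ARGS="--allow-privileged=true /' "$conf" > "$conf.new"
  mv "$conf.new" "$conf"
fi

cat "$conf"
# On a real master, follow up with something like: sudo systemctl restart kube-apiserver
```

The guard around sed matters because the same script is likely to be run again after a charm upgrade rewrites or keeps the file.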

So now the Kube API will accept requests to run privileged containers, which are required for GPU workloads.

Worker nodes

On every worker, edit /etc/default/kubelet to add the GPU flag to the Kubelet arguments, then restart the service.

This can be scripted in the same way.
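A hedged sketch of the worker-side edit, again on a scratch file. It assumes a Kubernetes release of the 1.5 era, where the Kubelet took an --experimental-nvidia-gpus flag, and a KUBELET_ARGS variable in the defaults file; both the variable name and the sample content are assumptions, and add_kubelet_flag is a helper made up for this example.

```shell
# Hypothetical helper: prepend a flag to KUBELET_ARGS unless it is already set.
# $1: path to the kubelet defaults file, $2: flag to add
add_kubelet_flag() {
  grep -qF -- "$2" "$1" && return 0
  sed "s|^KUBELET_ARGS=\"|KUBELET_ARGS=\"$2 |" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Scratch copy standing in for /etc/default/kubelet (content is assumed).
conf="$(mktemp)"
printf '%s\n' 'KUBELET_ARGS="--cluster-domain=cluster.local"' > "$conf"

add_kubelet_flag "$conf" "--experimental-nvidia-gpus=1"
add_kubelet_flag "$conf" "--experimental-nvidia-gpus=1"   # second call is a no-op
cat "$conf"
# On a real worker, follow up with something like: sudo systemctl restart kubelet
```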

Testing our setup

Now we want to know if the cluster actually has GPU enabled. To validate, run a job with an nvidia-smi pod:
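A minimal sketch of such a job follows, written out as a manifest file. The file name, the alpha.kubernetes.io/nvidia-gpu resource name (from the Kubernetes releases of that period) and the nvidia/cuda image are assumptions. It also leaves out the host mounts for the driver binaries and libraries, which the container needs unless the image ships them; their paths depend on how the driver was installed.

```shell
# Write the Job manifest; submit it with: kubectl create -f nvidia-smi.yaml
cat > nvidia-smi.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: nvidia-smi
spec:
  template:
    metadata:
      labels:
        name: nvidia-smi
    spec:
      restartPolicy: Never
      containers:
      - name: nvidia-smi
        image: nvidia/cuda
        command: ["nvidia-smi"]
        securityContext:
          privileged: true        # needs --allow-privileged=true on the API server
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1
        volumeMounts:
        - name: nvidia0
          mountPath: /dev/nvidia0
        - name: nvidiactl
          mountPath: /dev/nvidiactl
        - name: nvidia-uvm
          mountPath: /dev/nvidia-uvm
      volumes:
      - name: nvidia0             # only the first GPU device is shared
        hostPath:
          path: /dev/nvidia0
      - name: nvidiactl
        hostPath:
          path: /dev/nvidiactl
      - name: nvidia-uvm
        hostPath:
          path: /dev/nvidia-uvm
EOF
```

Sharing more cards means adding one volume and one mount per extra /dev/nvidiaN device.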

Then wait a little bit and run the log command:

What is interesting here is that the pod sees all the cards, even though we only shared the /dev/nvidia0 char device. At runtime, we would have problems using the others.
If you want to run multi-GPU containers, you need to share all char devices like we do in the second yaml file (nvidia-smi-8.yaml).

Conclusion

We reached the first milestone of our 3 part journey: the cluster is up & running, GPUs are activated, and Kubernetes will now welcome GPU workloads.

If you are a data scientist or running Kubernetes workloads that could benefit from GPUs, this already gives you an elegant and very fast way of managing your setups. But usually in this context, you also need to have storage available between the instances, whether it is to share the dataset or to exchange results.

Kubernetes offers many options to connect storage. In the second part of the blog, we will see how to automate adding EFS storage to our instances, then put it to good use with some datasets!

In the meantime, feel free to contact me if you have a specific use case in the cloud and want to discuss operational details. I would be happy to help you set up your own GPU cluster and get you started for the science!

Tearing Down

Whenever you feel like it, you can tear down this cluster. These instances can be pricey, hence powering them down when you do not use them is not a bad idea.
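A sketch of the teardown, assuming the controller was named k8s-gpu (a hypothetical name; juju controllers lists the real ones). As a precaution the block only prints the command unless JUJU=juju is set.

```shell
# Dry run by default: prints the command. Run with JUJU=juju to execute it.
JUJU="${JUJU:-echo juju}"

# Destroys the controller together with every model and instance it manages.
$JUJU destroy-controller k8s-gpu --destroy-all-models
```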

This will ask for confirmation then destroy everything… But now, you are just a few commands and a coffee away from rebuilding it, so that is not a problem.
