GPU + Azure + Deep Learning with minimum pain

Vitaly Davydov
Poteha Labs
Nov 21, 2017

This tutorial shows how to run a GPU machine with Docker for Deep Learning on Azure.

If you have ever faced a Deep Learning problem, you have probably heard about the GPU (graphics processing unit). It’s mostly used for computational graphics (rendering), but not only for that. When NVidia first released cuDNN, a high-level Deep Learning library for its cards, training speed dramatically increased and the Deep Learning industry saw considerable growth!

What’s the difference between a GPU and CPU?

CPU — has a few cores, but each one is very powerful. Ideal for serial processing.

GPU — has thousands of cores, but each core is weak. Ideal for parallelizing work that consists of many simple operations, for example the dot product of two vectors. That’s why GPUs are extensively used in Deep Learning to train neural networks.

NVidia Tesla K80 — the dream of every deep learner

Let’s say you are given an image processing task, e.g. classification. What will you do? Most likely your answer would be “I’ll fit some convolutional neural architecture and it will work”. Most likely you are right, but even if you know your network architecture and are confident in it, training it on a CPU will take forever. So you need a GPU, a powerful one like the Tesla K80 above.

Now, you can buy one for ~$5k, which doesn’t seem like an attractive option, especially if you don’t need it very often and don’t have a laboratory full of GPUs. If you don’t want or need to buy a GPU, your alternative is a cloud provider, where you can rent GPUs for some amount of time.

Azure provides GPU instances at a fairly good price. Renting a machine with one K80 costs about £600 (around $800) per month, so roughly half a year of rent adds up to the price of the card itself. Therefore, renting a virtual machine makes sense if you don’t plan to use it continuously for more than half a year.

So you chose an NC6 virtual machine with one K80 and Ubuntu, but what’s next? If you think you can now easily run any Deep Learning framework like TensorFlow, think again. The biggest challenge is to get the GPU to work for you.

To run tasks on the GPU you need to install CUDA and cuDNN on your machine. CUDA is NVidia’s low-level API for its cards, and cuDNN is a library of Deep Learning primitives built on top of it.

Installing a Deep Learning environment on Ubuntu is a HUGE pain

That’s why I started to look for alternatives. The best thing I found was Docker. Docker itself is not covered in this tutorial (but is well covered here); you can imagine Docker as a container service. A container is a kind of virtual machine (BUT IT’S NOT ONE) that completely isolates your process and runs the same way on every machine. For example, you can run experiments locally with Docker and then deploy to any machine with one command. Containers share your system kernel, while VMs use a separate kernel; that’s why containers are more lightweight.
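To see that portability in action, the same command pulls the same image and produces the same result on any machine with Docker installed (a minimal sketch; ubuntu:16.04 is just an example image):

$ docker run --rm ubuntu:16.04 echo "I run the same everywhere"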

NVidia made its own wrapper around Docker, nvidia-docker, which gives containers access to the GPU and its drivers. This is very cool and available on GitHub for free!

Let’s see how to install nvidia-docker step by step on a fresh NC6 Ubuntu machine.

Creating a new VM with GPU

Choose a preferable VM type based on specs and price from the Microsoft pricing page. It’s recommended to start with the cheapest option, the NC6 instance. Here is the step-by-step process for creating a VM.

The process is super simple:

  1. Hit “Add” to start the creation process.
  2. Choose Ubuntu Server and the Ubuntu 16.04 version (you can choose any other version, but this tutorial is optimised for 16.04).
  3. Set “Disk type” to HDD (very important) and fill in the other required parameters. It’s actually recommended to use a public key from your computer rather than a password.
  4. Choose the NC6 size.
  5. Configure additional network parameters (actually not needed for now) and hit “Create”.

After a couple of minutes your new VM will be created!
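If you prefer the command line, the same VM can be created with the Azure CLI. A minimal sketch, assuming you are logged in via az login (the resource group and VM names here are placeholders, and you should pick a region that offers NC-series instances):

# Create a resource group and an NC6 Ubuntu VM with an HDD (Standard_LRS) disk
$ az group create --name gpu-rg --location eastus
$ az vm create \
    --resource-group gpu-rg \
    --name gpu-vm \
    --image UbuntuLTS \
    --size Standard_NC6 \
    --storage-sku Standard_LRS \
    --generate-ssh-keys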

If you want to use Jupyter Notebook you need to open a port to inbound connections. To do this, go to Virtual Machines and choose the needed VM. Then hit “Networking” -> “Add inbound port rule” and enter the port, or just put a * for all ports.

Adding inbound rule for Azure VM
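The same rule can also be added from the command line. A sketch with the Azure CLI, reusing the placeholder names from above (8888 is Jupyter’s port, 6006 is TensorBoard’s; each rule needs its own priority):

$ az vm open-port --resource-group gpu-rg --name gpu-vm --port 8888 --priority 900
$ az vm open-port --resource-group gpu-rg --name gpu-vm --port 6006 --priority 901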

Install Docker

Copy-paste from the Docker website.

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce
$ sudo docker run hello-world
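Optionally, if you don’t want to type sudo before every docker command, you can add your user to the docker group (a common convenience step; log out and back in for it to take effect):

$ sudo usermod -aG docker $USER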

Install NVidia drivers

Here we install the CUDA drivers for Ubuntu 16.04 and the NVidia Tesla K80. If you have a different GPU / OS, please go to the official website and find your driver.

This installation takes some time.

# From NVIDIA website
$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda
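A reboot is usually needed before the driver is loaded. After reconnecting over ssh, you can check that the driver sees the K80 with nvidia-smi, which ships with the driver:

$ sudo reboot
# ...reconnect via ssh, then:
$ nvidia-smi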

Installing nvidia-docker

# Install nvidia-docker and nvidia-docker-plugin
$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

# Test nvidia-smi
$ nvidia-docker run --rm nvidia/cuda nvidia-smi

Good! Now you are ready to run Jupyter Notebook with the following command:

$ sudo nvidia-docker run --rm --name tf_image -d \
    -p 8888:8888 -p 6006:6006 -e PASSWORD=MyAwesomePassword \
    -v /home/iwitaly/Documents:/notebooks \
    gcr.io/tensorflow/tensorflow:latest-gpu jupyter notebook --allow-root

Let’s see what’s going on here.

nvidia-docker run

runs a new container from the image gcr.io/tensorflow/tensorflow:latest-gpu with the command jupyter notebook --allow-root.

--rm — remove the container after it exits
--name — give the container a name
-d — run in detached (background) mode
-p — bind a container port to a host port
-e — set environment variables that are accessible inside your container
-v — mount a local volume into the container
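Since the container runs in detached mode, a quick way to verify it is up and to see Jupyter’s startup output (tf_image is the container name we chose above):

$ sudo docker ps            # the tf_image container should be listed as Up
$ sudo docker logs tf_image

The notebook is then reachable at http://<your-vm-ip>:8888 with the password you set via the PASSWORD variable.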

You can find more details about the run command in the official Docker documentation.
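Finally, to double-check that TensorFlow inside the container actually sees the GPU, a one-liner like this should print a GPU device name (a sketch using the same image; tf.test.gpu_device_name() is a TensorFlow 1.x utility):

$ sudo nvidia-docker run --rm gcr.io/tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"
# Expected output: something like /device:GPU:0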
