EC2 + TensorFlow GPU + Docker = ❤

Enias Cailliau · Published in Superlinear · Oct 22, 2018

Here at Radix.ai, we use GPUs to accelerate our deep learning training. One of our goals is to remain platform neutral towards our clients, and Docker helps us achieve this by abstracting the underlying hardware away from the running container. Nvidia and Docker have now made it possible to pass GPU capabilities such as CUDA through to a Docker container. This tutorial provides a step-by-step guide on how to get GPU-accelerated Docker up and running.

Prepare your EC2 instance

In this tutorial, we prepare an Amazon EC2 P2 GPU instance to support nvidia-docker.

  • Image: Deep Learning Base AMI (Ubuntu)
  • Region: eu-central-1 (EU Frankfurt)
  • Instance type: p2.xlarge
  • Storage: 50 GB (more if you will work with large datasets)

First, boot up an instance with the specifications listed above. Once it is running, SSH into the machine using your certificate:
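A command along these lines should work; the key file name and hostname below are placeholders for your own key pair and the instance's public DNS:

    # SSH into the instance; Ubuntu AMIs use the "ubuntu" user by default.
    # Replace the key path and hostname with your own values.
    ssh -i ~/.ssh/my-key.pem ubuntu@ec2-XX-XX-XX-XX.eu-central-1.compute.amazonaws.com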

Once on the machine, we first need to install Docker:
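One common approach on Ubuntu is to install Docker CE from Docker's official apt repository; treat the following as a sketch and check Docker's documentation for your release:

    # Install prerequisites and add Docker's official apt repository
    sudo apt-get update
    sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository \
        "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

    # Install Docker CE itself
    sudo apt-get update
    sudo apt-get install -y docker-ce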

Currently, Docker has no native support for GPUs. Luckily, Nvidia provides the nvidia-docker runtime, which can replace the default Docker runtime. Nvidia-docker2 can be installed using the following commands:
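The steps below follow Nvidia's documented apt-based installation; consult the nvidia-docker repository for the exact instructions for your distribution:

    # Add Nvidia's package repositories
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
        sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update

    # Install nvidia-docker2 and reload the Docker daemon configuration
    sudo apt-get install -y nvidia-docker2
    sudo pkill -SIGHUP dockerd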

We can now test whether the runtime works by running a GPU-accelerated container with the nvidia runtime:
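The usual smoke test is to run nvidia-smi inside the official nvidia/cuda image:

    # Run nvidia-smi inside a CUDA container using the nvidia runtime
    docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi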

If everything is installed correctly, you should see a printout that includes the name of the GPU passed to the container (in our case, a Tesla K80).

The docker command we used to start our nvidia/cuda container selects the nvidia runtime through the --runtime argument. However, we don't want to supply this argument every time we run a container. To avoid a bloated docker command, we modify the Docker daemon to use the nvidia runtime automatically:
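One way to do this is to set the default runtime in /etc/docker/daemon.json (nvidia-docker2 registers the nvidia runtime there on installation); a sketch:

    # /etc/docker/daemon.json
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }

Then restart Docker so the change takes effect:

    sudo systemctl restart docker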

That’s it! Your instance is now ready to accept Docker images that include GPU support. As an example, let’s deploy a Jupyter notebook to start our deep learning development.

Running TensorFlow GPU in a Jupyter notebook

First, we need to ensure that the security group of our instance accepts incoming traffic on ports 6006 (TensorBoard) and 8888 (Jupyter). Then we can start a container using the following command:
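A minimal sketch, assuming the official tensorflow/tensorflow GPU image (which starts a Jupyter notebook server by default). Because we set nvidia as the default runtime above, no --runtime argument is needed:

    # Start a GPU-enabled TensorFlow container with Jupyter,
    # mapping the Jupyter (8888) and TensorBoard (6006) ports
    docker run -it --rm -p 8888:8888 -p 6006:6006 tensorflow/tensorflow:latest-gpu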

Now you can visit your Jupyter notebook using the public IP of your EC2 instance: open http://<public-ip>:8888 and enter the token printed in the container logs.

Final thoughts

In this tutorial, we used Docker to deploy GPU-accelerated deep learning environments on AWS. Using these containers means that you can collect all your dependencies in one place, resulting in a portable solution.
