Jupyter + Tensorflow + Nvidia GPU + Docker + Google Compute Engine

TL;DR: Save time and headaches by following this recipe for working with Tensorflow, Jupyter, Docker, and Nvidia GPUs on Google Cloud.

Motivation: Businesses like fast, data-driven insights, and they employ data scientists to produce them. Practicing data science is an exploratory, iterative process that requires lots of computing resources and lots of time. To better support exploratory iteration, data scientists often use notebooks like Jupyter, and to accelerate Tensorflow jobs they’re increasingly turning to GPUs. However, GPUs are costly, and the resources need to be managed carefully because businesses also like efficient operations.

There’s currently a trend in cloud computing to use Kubernetes and Docker to improve resource utilization. Wouldn’t it be great if data science tools like Jupyter and GPUs could be managed with Docker and Kubernetes? That would save both time AND money. It’s possible, but I ran into several version and dependency problems before I arrived at this working configuration. Please reuse it!

Create a GCE instance

First, create firewall rules that allow access to Jupyter (port 8888) and Tensorboard (port 6006).
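As a minimal sketch of this step with the gcloud CLI: the function below creates one rule per port and scopes each rule to a network tag. The rule names (allow-jupyter, allow-tensorboard) and the SOURCE_RANGE variable are assumptions for illustration, not part of the original recipe, and the default range is open to the whole internet, so narrow it to your own address if you can.

```shell
# Hypothetical setting: which client addresses may connect.
# 0.0.0.0/0 means "anyone"; replace it with your own IP/32 where possible.
SOURCE_RANGE="${SOURCE_RANGE:-0.0.0.0/0}"

create_notebook_firewall_rules() {
  # Jupyter listens on 8888; the rule applies to instances tagged "jupyter".
  gcloud compute firewall-rules create allow-jupyter \
    --allow tcp:8888 \
    --target-tags jupyter \
    --source-ranges "$SOURCE_RANGE"

  # Tensorboard listens on 6006; the rule applies to instances tagged "tensorboard".
  gcloud compute firewall-rules create allow-tensorboard \
    --allow tcp:6006 \
    --target-tags tensorboard \
    --source-ranges "$SOURCE_RANGE"
}
```

Run create_notebook_firewall_rules once per project; the rules only take effect on instances that carry the matching tags.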

Then create a GCE instance. For the instance:

  • Use Ubuntu 16.04 LTS as the OS
  • Allocate a 50GB boot disk
  • Specify that you want at least one K80 GPU
  • Tag with “jupyter” and “tensorboard” to apply the firewall rules you created
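The settings listed above can be sketched as a single gcloud call. The instance name (gpu-notebook) and zone (us-east1-d) are assumptions; pick a zone where your project has K80 quota. GPU instances cannot live-migrate, which is why the maintenance policy is set to TERMINATE.

```shell
# Hypothetical names; adjust to your project.
INSTANCE="${INSTANCE:-gpu-notebook}"
ZONE="${ZONE:-us-east1-d}"   # assumption: a zone that offers K80 GPUs

create_gpu_instance() {
  # Ubuntu 16.04 LTS, 50GB boot disk, one K80, and both firewall tags.
  gcloud compute instances create "$INSTANCE" \
    --zone "$ZONE" \
    --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud \
    --boot-disk-size 50GB \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --maintenance-policy TERMINATE \
    --tags jupyter,tensorboard
}
```

Call create_gpu_instance after the firewall rules exist so the tags take effect immediately.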

Install CUDA and Verify It Can Access the GPU

Nvidia’s CUDA library provides access to the GPU.

SSH to the compute node you created, then install CUDA with this script [source]. You can use wget to pull the gist and pipe it into bash:

wget -O - -q 'https://gist.githubusercontent.com/allenday/f426e0f146d86bfc3dada06eda55e123/raw/41b6d3bc8ab2dfe1e1d09135851c8f11b8dc8db3/install-cuda.sh' | sudo bash

If the CUDA install succeeds, running nvidia-smi displays a table describing the available Tesla K80 GPU:

nvidia-smi
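If you would rather script this check than read the table by eye, nvidia-smi can print selected fields as CSV. The sketch below is one way to do that; the helper names gpu_summary and has_k80 are made up for illustration.

```shell
gpu_summary() {
  # One CSV line per GPU: model name, driver version, total memory.
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

has_k80() {
  # Succeeds (exit status 0) only if at least one GPU reports a K80 model name.
  gpu_summary | grep -q "K80"
}
```

For example, `has_k80 && echo "GPU ready"` prints the message only when the driver can see a K80.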

Install Docker(-Engine) and Nvidia-Docker

For Docker, you need the docker-ce package from Docker, not the docker.io package that ships with Ubuntu. Use an install script derived from [source], or just use mine:

wget -O - -q 'https://gist.githubusercontent.com/allenday/c875eaf21a2b416f6478c0a48e428f6a/raw/f7feca1acc1a992afa84f347394fd7e4bfac2599/install-docker-ce.sh' | sudo bash

Then install nvidia-docker from a deb file [source]:

wget https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i nvidia-docker*.deb
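Because the docker-ce versus docker.io distinction is easy to get wrong, a quick package check can save some debugging. This is only a sketch; the function name docker_package_check is an assumption, and it relies on nothing beyond dpkg -s reporting whether a package is installed.

```shell
docker_package_check() {
  # dpkg -s exits non-zero when the named package is not installed.
  if dpkg -s docker-ce >/dev/null 2>&1; then
    echo "docker-ce is installed"
  elif dpkg -s docker.io >/dev/null 2>&1; then
    echo "docker.io is installed; replace it with docker-ce"
    return 1
  else
    echo "no Docker package found"
    return 1
  fi
}
```

Run docker_package_check before continuing; a non-zero exit status means the Docker install needs another look.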

Verify the GPU is Visible from a Docker Container

Start nvidia-docker-plugin. It must be run as root.

sudo nvidia-docker-plugin &

Now make sure the docker container can see the GPU:

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

You should get the same kind of table you got when running nvidia-smi directly on the host, outside a Docker container.

[Optional] Create a Snapshot Volume

If you followed along and ran the steps above, you may have noticed that they took some time, and time on a GPU instance is costly. You can avoid repeating these steps, and the time and money they waste, by snapshotting this working disk and booting from the snapshot the next time you need a GPU-enabled instance.
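A rough sketch of the snapshot and restore steps with gcloud follows. It assumes the boot disk has the same name as the instance (the default when an instance is created without naming its disk), and the INSTANCE, ZONE, and SNAPSHOT values are hypothetical.

```shell
# Hypothetical names; adjust to your project.
INSTANCE="${INSTANCE:-gpu-notebook}"
ZONE="${ZONE:-us-east1-d}"
SNAPSHOT="${SNAPSHOT:-gpu-notebook-base}"

snapshot_boot_disk() {
  # Stop the instance first so the disk is in a consistent state.
  gcloud compute instances stop "$INSTANCE" --zone "$ZONE"
  # Assumption: the boot disk is named after the instance.
  gcloud compute disks snapshot "$INSTANCE" --zone "$ZONE" \
    --snapshot-names "$SNAPSHOT"
}

restore_from_snapshot() {
  # $1: name to use for both the new disk and the new instance.
  gcloud compute disks create "$1" --zone "$ZONE" \
    --source-snapshot "$SNAPSHOT"
  gcloud compute instances create "$1" --zone "$ZONE" \
    --disk "name=$1,boot=yes" \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --maintenance-policy TERMINATE \
    --tags jupyter,tensorboard
}
```

For example, run snapshot_boot_disk once the setup works, then `restore_from_snapshot gpu-notebook-2` whenever you need a fresh GPU instance.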

Launch Jupyter and Tensorboard

sudo nvidia-docker run --rm --name tf1 -p 8888:8888 -p 6006:6006 gcr.io/tensorflow/tensorflow:latest-gpu jupyter notebook --allow-root

If the above command shows a line like:

http://localhost:8888/?token=c8caba947dfd4c97414447c074325faf399cf8a157d0ce2f

…you’re in business. Find the external IP address of your GCE instance and connect to it on port 8888, e.g. http://EXTERNAL_IP:8888/. Enter the token from your console (yours will differ from the one above), and you have a GPU-enabled Jupyter notebook running Tensorflow.
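A few follow-up helpers, sketched under some assumptions: the instance name and zone are hypothetical, the container is the one named tf1 above, and /notebooks/logs is a made-up log directory inside it. The first helper looks up the external IP and prints the notebook URL, the second asks Tensorflow inside the container which devices it can see, and the third starts Tensorboard on the already-published port 6006.

```shell
# Hypothetical names; adjust to your project.
INSTANCE="${INSTANCE:-gpu-notebook}"
ZONE="${ZONE:-us-east1-d}"
LOGDIR="${LOGDIR:-/notebooks/logs}"   # assumption: where your notebooks write summaries

notebook_url() {
  # Read the instance's external (NAT) IP and build the Jupyter URL from it.
  local ip
  ip="$(gcloud compute instances describe "$INSTANCE" --zone "$ZONE" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)')"
  echo "http://${ip}:8888/"
}

list_tf_devices() {
  # Print the device names Tensorflow detects inside the running container.
  sudo docker exec tf1 python -c \
    'from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])'
}

start_tensorboard() {
  # Run Tensorboard in the background of the same container.
  sudo docker exec -d tf1 tensorboard --logdir "$LOGDIR"
}
```

Run notebook_url from a machine with gcloud configured, and list_tf_devices and start_tensorboard on the GCE instance itself. If list_tf_devices shows only a CPU entry, the container is not seeing the GPU.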
