Containerized Jupyter notebooks on GPU on Google Cloud

In a previous post, I listed the steps to run Jupyter notebooks on GPU instances on GCP Compute Engine. It turns out there is a much easier and more flexible way: using Docker containers.

I am assuming your Google Cloud Platform account allows you to create GPU-based instances. If not, please follow step 1 from the previous post. Also, make sure you have the latest gcloud SDK and its beta component.

$ gcloud components update && gcloud components install beta

This post on NVIDIA’s blog explains how this setup works. The GPU instance only needs the NVIDIA driver installed on the host, plus a thin wrapper around Docker called nvidia-docker. All other software, such as the CUDA toolkit, cuDNN, Python, Jupyter and any deep learning libraries, can simply be containerized into reusable Docker images. NVIDIA and the authors of most deep learning frameworks (like TensorFlow, Keras, PyTorch) provide ready-to-use Docker images that you can use directly or as a base image.

Step 1: Create GPU instance

You can, of course, use the Cloud Console to create a GPU-based instance, but I am going to use the gcloud command line tool.

$ gcloud beta compute instances create gpu-docker-host --machine-type n1-standard-2 --zone us-east1-d --accelerator type=nvidia-tesla-k80,count=1 --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud --boot-disk-size 50GB --maintenance-policy TERMINATE --restart-on-failure 

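This creates an instance named gpu-docker-host in the us-east1-d zone with one Tesla K80 GPU, Ubuntu 16.04 and a 50GB boot disk. GPU instances cannot live migrate during host maintenance, which is why the command sets --maintenance-policy TERMINATE.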

Once your GPU instance is ready, you can connect to it with your ssh client or with the gcloud compute ssh gpu-docker-host --zone us-east1-d command.

Step 2: Install NVIDIA driver, docker and nvidia-docker

Once on the server, download this script to install the dependencies:

$ curl -O -s https://gist.githubusercontent.com/durgeshm/b149e7baec4d4508eb4b2914d63018c7/raw/798aadbb54b451abcaba9bfeb833327fa4b3d53b/deps_nvidia_docker.sh

The script automates the following tasks (it is always a good idea to take a look instead of just running a stranger’s script):

  1. Confirm that the instance has a GPU from NVIDIA, otherwise exit.
  2. Check and install NVIDIA driver if necessary (for Tesla K80).
  3. Check and install docker if necessary.
  4. Check and install nvidia-docker if necessary.

$ sudo sh deps_nvidia_docker.sh

Note: You can install these dependencies in the same step that creates the instance: download the script locally and pass it as a startup script by appending --metadata-from-file startup-script=deps_nvidia_docker.sh to the gcloud command. I have kept the steps separate here.
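To see what the "check and install if necessary" tasks have to decide, here is a minimal sketch of the checking half only. It is not taken from the script above; check_dep is a hypothetical helper name, and the sketch assumes each dependency shows up as a command on PATH (nvidia-smi for the driver).

```shell
# Report whether a command is on PATH; prints "found" or "missing".
# check_dep is a hypothetical helper, not part of deps_nvidia_docker.sh.
check_dep() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# The three host-level dependencies named in the task list.
for dep in nvidia-smi docker nvidia-docker; do
    printf '%-14s %s\n' "$dep" "$(check_dep "$dep")"
done
```

Running this on the instance before and after the install script gives a quick picture of what the script changed.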

Step 3: Ready to run any CUDA-enabled Docker container!

Once the script finishes installing nvidia-docker, we are ready to run a simple test container from NVIDIA.

$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

If you see GPU and driver information in the console, your setup is ready. (The first time you run this, it will take a few seconds to pull the nvidia/cuda image from Docker Hub.)

Now, let’s try a TensorFlow/Keras/Jupyter docker container (Dockerfile).

$ mkdir notebooks # to persist notebooks on the host
$ sudo nvidia-docker run -it --rm -d -v $(pwd)/notebooks:/notebooks -p 8888:8888 --name keras durgeshm/jupyter-keras-gpu

Check the container log to confirm that Jupyter is running:

$ sudo docker logs keras
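If the image runs Jupyter with token authentication (an assumption; I have not confirmed how durgeshm/jupyter-keras-gpu is configured), the log contains a URL with a token that you will need in the browser. A small sketch for pulling that URL out of the log text follows. extract_jupyter_url is a hypothetical helper name, and the sample token is made up.

```shell
# Read Jupyter log text on stdin and print the first URL that carries a token.
extract_jupyter_url() {
    sed -n 's|.*\(http://[^ ]*token=[^ ]*\).*|\1|p' | head -n 1
}

# Sample line in the shape Jupyter Notebook prints at startup (token is made up).
sample_log='    http://localhost:8888/?token=abc123'
printf '%s\n' "$sample_log" | extract_jupyter_url
```

On the instance you would feed it the real log with sudo docker logs keras 2>&1 | extract_jupyter_url. The 2>&1 matters because Jupyter writes its log to stderr. When you open the URL through the tunnel from Step 4, replace port 8888 with 8899.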

To make it easier to start the container in the future, I have also added a script, run-keras.sh:

$ echo 'sudo nvidia-docker run -it --rm -d -v $(pwd)/notebooks:/notebooks -p 8888:8888 --name keras durgeshm/jupyter-keras-gpu' > run-keras.sh && chmod u+x run-keras.sh

Step 4: SSH tunnel forwarding

Set up an ssh tunnel from your local machine to access Jupyter.

If you have already started the keras container on the server, run the following on your local machine. It forwards local port 8899 to port 8888 on the instance.

$ ssh -i .ssh/ubuntu_gcp -L 8899:localhost:8888 -f -N ubuntu@<gpu-docker-host>

I have defined a handy alias for myself to start the keras container remotely and open the tunnel immediately from my local machine.

$ alias tf-gpu="ssh gpu-docker-host './run-keras.sh' && ssh -fNL 8899:localhost:8888 gpu-docker-host"
$ tf-gpu
66357b10b6b4ec70e53273dc98878f1525f62fa6e6b1ee7d69995486f28bad1e
# ^ that is the container id that was just started.
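The alias calls ssh gpu-docker-host without a user or key, so it relies on a matching Host entry in ~/.ssh/config on the local machine. Here is a sketch of how such an entry could be generated. ssh_config_entry is a hypothetical helper, the user and key path are taken from the ssh command in this step, and the IP address is a documentation-only placeholder.

```shell
# Print an ~/.ssh/config stanza. Arguments: alias, address, user, key file.
ssh_config_entry() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User %s\n' "$3"
    printf '    IdentityFile %s\n' "$4"
}

# 203.0.113.10 is a placeholder; use your instance's external IP instead.
ssh_config_entry gpu-docker-host 203.0.113.10 ubuntu ~/.ssh/ubuntu_gcp
# To install the entry, append the output to your config:
#   ssh_config_entry ... >> ~/.ssh/config
```

Unless you reserve a static address, the external IP can change when the instance is stopped and started, so the HostName line may need updating.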

Step 5: Start using Jupyter locally in your browser

Navigate to http://localhost:8899/ and create a new notebook. Verify the setup by importing keras or tensorflow. The container logs can confirm whether the CUDA libraries are being loaded:

$ ssh gpu-docker-host "sudo docker logs keras"

Once you are done, please remember to stop your instance to save costs. Thanks for reading.
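Stopping can be done from your local machine with gcloud. The sketch below reuses the instance name and zone from Step 1. The run, stop_instance and start_instance helpers and the DRY_RUN switch are my own hypothetical additions, and the default only prints the command so you can review it first.

```shell
INSTANCE=gpu-docker-host
ZONE=us-east1-d

# Print the command when DRY_RUN is 1 (the default here); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_instance()  { run gcloud compute instances stop  "$INSTANCE" --zone "$ZONE"; }
start_instance() { run gcloud compute instances start "$INSTANCE" --zone "$ZONE"; }

# Prints the stop command; set DRY_RUN=0 to actually stop the instance.
stop_instance
```

A stopped instance no longer bills for the VM and GPU, but the 50GB boot disk is still charged. The notebooks under ~/notebooks live on that disk, so they are still there when you start the instance again.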

Notes:

  1. I have created Docker images labeled durgeshm/jupyter-keras-gpu (Dockerfile) and durgeshm/jupyter-pytorch-gpu (Dockerfile) for Keras/TensorFlow and PyTorch respectively.
  2. In addition, you can always use the official Dockerfiles and images at https://github.com/fchollet/keras/blob/master/docker/Dockerfile, https://github.com/pytorch/pytorch/blob/master/Dockerfile or https://hub.docker.com/r/tensorflow/tensorflow/ as a base for your own customizations.
  3. Note that the /notebooks volume in the container is mounted from ~/notebooks on the host. This way, you can always remove old containers, and your notebooks will persist on the host.
  4. Only Step 1 is specific to Google Cloud Platform. The other steps should work on other cloud platforms that offer GPU instances (specifically Tesla K80).