TL;DR: Save time and headaches by following this recipe for working with Tensorflow, Jupyter, Docker, and Nvidia GPUs on Google Cloud.
Motivation: Businesses like fast, data-driven insights, and they employ data scientists to produce them. Practicing data science is an exploratory, iterative process that requires lots of computing resources and lots of time. To better support exploratory iteration, data scientists often use notebooks like Jupyter, and to accelerate computation of Tensorflow jobs they’re increasingly turning to GPUs. However, GPUs are costly, and the resources need to be managed carefully because businesses also like efficient operations.
There’s currently a trend in cloud computing to use Kubernetes and Docker to improve resource utilization. Wouldn’t it be great if data science tools like Jupyter and GPUs could be managed with Docker and Kubernetes? It would save time AND money. It’s possible, but I ran into several version/dependency problems before I arrived at this working configuration. Please reuse it!
Create a GCE instance
First, create firewall rules that allow access to Jupyter (port 8888) and Tensorboard (port 6006).
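The two rules can also be created from the command line. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated and that the instance will sit on the default network. The rule names and the wide-open source range are my assumptions, not part of the original recipe; a narrower range is safer. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: open the Jupyter and Tensorboard ports for instances carrying matching tags.
# Rule names and SOURCE_RANGES are assumptions; adjust them to your project.
SOURCE_RANGES="${SOURCE_RANGES:-0.0.0.0/0}"   # wide open; restrict to your own IP if you can

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_firewall_rules() {
  run gcloud compute firewall-rules create jupyter \
    --allow tcp:8888 --target-tags jupyter --source-ranges "$SOURCE_RANGES"
  run gcloud compute firewall-rules create tensorboard \
    --allow tcp:6006 --target-tags tensorboard --source-ranges "$SOURCE_RANGES"
}

create_firewall_rules
```

Because the rules use target tags, they only apply to instances tagged “jupyter” or “tensorboard”.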
Then create a GCE instance. For the instance:
- Use OS Ubuntu 16.04 LTS
- Allocate a 50GB boot disk
- Specify that you want at least one K80 GPU
- Tag with “jupyter” and “tensorboard” to apply the firewall rules you created
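The instance settings listed above can be expressed as a single gcloud command. This is a sketch under assumptions: the instance name and zone are hypothetical placeholders, and you need to pick a zone where K80 GPUs are offered and where your project has GPU quota. GPU instances cannot be live-migrated, which is why the maintenance policy is set to TERMINATE. It only prints the command unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: create the GPU instance described in the list above.
# INSTANCE and ZONE are hypothetical placeholders, not values from the original post.
INSTANCE="${INSTANCE:-gpu-notebook}"
ZONE="${ZONE:-us-east1-d}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_gpu_instance() {
  run gcloud compute instances create "$INSTANCE" \
    --zone "$ZONE" \
    --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
    --boot-disk-size 50GB \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --maintenance-policy TERMINATE \
    --tags jupyter,tensorboard
}

create_gpu_instance
```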
Install and Verify CUDA can Access the GPU
Use the CUDA library from Nvidia to gain access to the GPU.
The next step is to SSH into the compute node you created, then use this script [source] to install CUDA.
You can use wget to pull the source gist and pipe it into bash:
wget -O - -q 'https://gist.githubusercontent.com/allenday/f426e0f146d86bfc3dada06eda55e123/raw/41b6d3bc8ab2dfe1e1d09135851c8f11b8dc8db3/install-cuda.sh' | sudo bash
If the CUDA install is successful, running nvidia-smi will display a table describing an available Tesla K80 GPU.
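If you want a scriptable version of this check, something like the following may help. It is a small sketch, not part of the original install script: the function name check_gpu is my own, and it relies only on nvidia-smi being on the PATH once the driver is installed.

```shell
#!/usr/bin/env bash
# Sketch: report whether the driver install left a usable nvidia-smi on the PATH.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: the CUDA/driver install did not complete"
    return 1
  fi
  # Prints one CSV line per GPU with its name and total memory.
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
}

check_gpu || echo "GPU check failed"
```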
Install Docker(-Engine) and Nvidia-Docker
For docker, you need the docker-ce version from Docker, not the docker.io package that ships with Ubuntu. Use this script derived from [source],
or just use mine:
wget -O - -q 'https://gist.githubusercontent.com/allenday/c875eaf21a2b416f6478c0a48e428f6a/raw/f7feca1acc1a992afa84f347394fd7e4bfac2599/install-docker-ce.sh' | sudo bash
Install nvidia-docker from a deb file [source]:
wget https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i nvidia-docker*.deb
Verify the GPU is Visible from a Docker Container
Start nvidia-docker-plugin. It must be run as root.
sudo nvidia-docker-plugin &
Now make sure the docker container can see the GPU:
sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
You’ll get the same type of table you got when running nvidia-smi at the prompt, outside a Docker container.
[Optional] Create a Snapshot Volume
If you followed along and ran the steps above, you may have noticed that they took some time, and time on a GPU instance is costly. You can avoid repeating these steps, and the wasted time and money, by snapshotting this working image and booting from it whenever you need a GPU-enabled instance again.
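One way to take the snapshot is with gcloud. This is a sketch under assumptions: the instance name, zone, and snapshot name are hypothetical placeholders, and it relies on the boot disk carrying the same name as the instance, which is the gcloud default when no disk name is given. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: snapshot the boot disk of the working instance and restore it later.
# INSTANCE, ZONE and SNAPSHOT are hypothetical placeholders.
INSTANCE="${INSTANCE:-gpu-notebook}"
ZONE="${ZONE:-us-east1-d}"
SNAPSHOT="${SNAPSHOT:-gpu-docker-ready}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Snapshot the boot disk, assumed to be named after the instance.
snapshot_boot_disk() {
  run gcloud compute disks snapshot "$INSTANCE" --zone "$ZONE" --snapshot-names "$SNAPSHOT"
}

# Create a new disk from the snapshot; $1 is the name of the new disk.
restore_boot_disk() {
  run gcloud compute disks create "$1" --zone "$ZONE" --source-snapshot "$SNAPSHOT"
}

snapshot_boot_disk
```

Later, a call such as restore_boot_disk gpu-notebook-2 creates a disk from the snapshot, which you can attach as the boot disk of a new GPU instance with --disk name=gpu-notebook-2,boot=yes.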
Launch Jupyter and Tensorboard
sudo nvidia-docker run --rm --name tf1 -p 8888:8888 -p 6006:6006 gcr.io/tensorflow/tensorflow:latest-gpu jupyter notebook --allow-root
If the above command shows a line like:
…you’re in business. Find the external IP address of your GCE instance and connect to it on port 8888, e.g. http://EXTERNAL_IP:8888/. Type in the token from your console (yours will look similar), and you have a GPU-enabled Jupyter notebook running Tensorflow.
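If the token has scrolled out of your console, or you don’t have the external IP at hand, both can be looked up from the command line. This is a sketch under assumptions: the instance name and zone are hypothetical placeholders, the container name tf1 comes from the launch command above, and the image is assumed to ship the classic Jupyter notebook server. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: look up the instance's external IP and the running notebook's token.
# INSTANCE and ZONE are hypothetical placeholders; CONTAINER matches --name tf1 above.
INSTANCE="${INSTANCE:-gpu-notebook}"
ZONE="${ZONE:-us-east1-d}"
CONTAINER="${CONTAINER:-tf1}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run anywhere gcloud is configured for the project.
external_ip() {
  run gcloud compute instances describe "$INSTANCE" --zone "$ZONE" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Run on the GCE instance itself; lists running notebook servers with their tokens.
notebook_token() {
  run sudo docker exec "$CONTAINER" jupyter notebook list
}

external_ip
notebook_token
```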