Warning before reading this: I am very happy this blog has been useful to lots of people. Since it's now over a year old, some of the commands are based on older versions of the software. For an easier setup you may now want to use the Google Cloud optimized compute images:
I recommend reading this blog by Viacheslav Kovalevskyi instead of continuing with this one.
This blog was written for:
- Ubuntu 16.04
- CUDA 8
- TensorFlow 1.4.0
Google has some nice documentation here, but there were a few additional steps I needed to take. So, let's start from the beginning:
There are two ways to set the instance up: 1) you use the command-line interface that Google Cloud offers, or 2) you use their incredibly friendly web UI to help you along. Since I am a big fan of the Google Cloud web interface, this is what I'll do. Setting up a server is very simple.
When in the Google Cloud Platform, make sure you have created a project and then navigate to the Compute Engine. There you will be asked if you want to create a new instance, and once you get the popup dialog shown here on the left you can configure the number of cores, the memory (RAM), and a little option saying "GPU". Click on this and additional options show up that allow you to indicate if and how many GPUs you want to use. I then selected Ubuntu 16.04 as a boot disk, left all the other options the same, and clicked Create to start the instance.
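If you prefer option 1, the command line, the same instance can be created with gcloud. The sketch below assumes a hypothetical instance name, zone, and GPU type (my-gpu-instance, us-east1-d, nvidia-tesla-k80); adjust them to your project. Note that GPU instances require the maintenance policy to be set to TERMINATE:

```shell
# Sketch: create a GPU instance from the CLI (names/zone are placeholders).
# At the time of writing, GPU support required the beta gcloud commands.
gcloud beta compute instances create my-gpu-instance \
    --zone us-east1-d \
    --machine-type n1-standard-4 \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --image-family ubuntu-1604-lts \
    --image-project ubuntu-os-cloud \
    --maintenance-policy TERMINATE
```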
Once the instance is ready you can connect to it by either using the web-shell Google Cloud offers or by copying the gcloud command to connect from your own command line.
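The gcloud connect command you copy from the console looks roughly like the sketch below; the instance name and zone are placeholders for whatever you chose when creating the instance:

```shell
# Open an SSH session to the instance (placeholders: instance name and zone).
gcloud compute ssh my-gpu-instance --zone us-east1-d
```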
Now that we are in we need to install some drivers for the GPU. The GPUs on Google Cloud are all NVIDIA cards and those need to have CUDA installed. To install CUDA 8.0 I used the following commands for Ubuntu 16.04 (taken from the Google Cloud documentation):
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  sudo dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  sudo apt-get update
  # Install CUDA 8.0 specifically; TensorFlow 1.4 does not support CUDA 9.
  sudo apt-get install cuda-8-0 -y
fi
To verify it's all working properly, run the command below, which will show you that the GPU is recognized and set up properly.
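The check here is NVIDIA's nvidia-smi utility, which ships with the driver and lists the detected GPUs, the driver version, and current utilization:

```shell
# List detected NVIDIA GPUs, driver version, and utilization.
nvidia-smi
```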
Yay! It exists and is being recognized. We’ll also need to set some environment variables for CUDA:
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64' >> ~/.bashrc
source ~/.bashrc
NVIDIA provides the cuDNN library to optimize neural network calculations on their cards. They describe it as:
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning.
Summary: they have done a lot of work to make your life easier… You will need to register for the NVIDIA Developer Program and then you can download the latest version of the software. In this case I downloaded version 5.1 for CUDA 8.0 (I just noticed a newer version 6.0 is available as well). Once downloaded move it over to the instance using SCP or via Google Cloud Storage.
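Moving the downloaded archive over with SCP can be done through gcloud as well; the local path, instance name, and zone below are placeholders:

```shell
# Copy the cuDNN archive to the instance's home directory
# (placeholders: local path, instance name, zone).
gcloud compute scp ~/Downloads/cudnn-8.0-linux-x64-v5.1.tgz my-gpu-instance:~ --zone us-east1-d
```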
Once it's on the instance, install it using:
cd $HOME
tar xzvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/lib64/* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
rm -rf ~/cuda
So the GPU instance is running and the drivers are in place; all that is left is to get TensorFlow installed to work with the GPU. You can see Google is trying to make all this super simple, because you literally need two lines to get this last step done:
sudo apt-get install python-dev python-pip libcupti-dev
sudo pip install --upgrade tensorflow-gpu==1.4.0
Installing tensorflow-gpu ensures that it defaults to the GPU for the operations where it's needed. You can still manually move certain things to the CPU whenever you want to. Let's test if it all works…
Testing the setup
Now to test if it was all successful you can use the Python code below. It assigns two variables and one operation to the CPU, and another two variables and an operation to the GPU. When starting the session we tell it via the ConfigProto to log the placement of the variables/operations, and you should see it print on the command line where they are placed.
import tensorflow as tf

# Pin these constants and the matmul to the CPU.
with tf.device('/cpu:0'):
    a_c = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a-cpu')
    b_c = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b-cpu')
    c_c = tf.matmul(a_c, b_c, name='c-cpu')

# Pin these constants and the matmul to the GPU.
with tf.device('/gpu:0'):
    a_g = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a-gpu')
    b_g = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b-gpu')
    c_g = tf.matmul(a_g, b_g, name='c-gpu')

# log_device_placement prints where each op ended up.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c_c))
    print(sess.run(c_g))
*** Update (March 7, 2018)
Based on feedback from ChrisAMancuso I have replaced the line
apt-get install cuda -y
with
sudo apt-get install cuda-8-0
to ensure that it installs CUDA 8.0 (CUDA 9.0 is/was not supported by TensorFlow 1.4).