Setting up TensorFlow GPU on Google Cloud Instance with Ubuntu 16.04

Eric Antoine Scuccimarra
Google Cloud - Community
3 min read · Apr 1, 2018

I recently set up a Google Cloud instance to train some TensorFlow models on. While Amazon EC2 has AMIs that already have everything configured for you, on Google Cloud you need to set everything up yourself. I spent several days doing this, and the instructions I found online, both from Google and from other sources, were written for older versions of TensorFlow and did not work with the newest version, which at the time of this writing is 1.7.

These instructions work for v1.7 and have been tested several times. I hope they will save someone from having to spend days searching online to decode all of the various error messages.

This is taken from Google's instructions, but with the version of CUDA that actually works with TensorFlow v1.7:

# add the CUDA 9.0 repository and install CUDA 9.0
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9-0
# enable persistence mode so the driver stays loaded
sudo nvidia-smi -pm 1

Then you should verify that everything is installed and working with:

nvidia-smi

Google's instructions do not mention installing cuDNN, but it appears to be required. To download it you need to register with NVIDIA's Developer Program, download the package, and then upload it to your instance. I uploaded it using SCP, which took a while. The file I used was libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb, which is the version required for CUDA 9.0 with TensorFlow 1.7. Once it is uploaded you can install it with:

sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
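
For the upload step itself, if you have the Cloud SDK installed locally, a gcloud scp command along these lines should work (the instance name and zone below are placeholders for your own):

# run from your local machine; replace my-instance and us-east1-c with your own instance name and zone
gcloud compute scp libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb my-instance:~ --zone us-east1-c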

Once this is all installed you need to set some PATH variables. Google's instructions add the variables to the path only temporarily, so they have to be run every time you boot the instance. The following adds them permanently:

echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
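
To check that the variables took effect, a quick sanity check (assuming the toolkit installed to its default location) is:

echo $CUDA_HOME
nvcc --version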

Finally you can install TensorFlow:

sudo apt-get install python3-dev python3-pip libcupti-dev
sudo pip3 install tensorflow-gpu

Note that I am using python3 and pip3; if you want to use Python 2 just remove the “3” from the commands above. Once this is done you should be able to import tensorflow without any errors.
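
A quick way to confirm both the import and that TensorFlow actually sees the GPU is to print the version and list the local devices; if everything is set up correctly the device list should include a /device:GPU:0 entry:

python3 -c "import tensorflow as tf; print(tf.__version__)"
python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"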

Finally, Google suggests a couple of settings to optimize the GPU performance:

# this applies to all GPUs
sudo nvidia-smi -pm 1
# these only apply to Nvidia Tesla K80s
sudo nvidia-smi -ac 2505,875
sudo nvidia-smi --auto-boost-default=DISABLED
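
To confirm that the clock settings were applied, you can query the GPU's clock information:

nvidia-smi -q -d CLOCK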

If you are still having problems running TensorFlow code, the following are steps I found and perform, although I am not sure they are necessary. The problems I was having may have been solved simply by exporting the paths correctly, but I don't want to set up another instance just to check whether these steps are required:

  1. cd /usr/local/cuda
  2. sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
  3. sudo ln -s /usr/include/ include
  4. sudo ln -s /usr/bin/ bin
  5. sudo ln -s /usr/lib/x86_64-linux-gnu/ nvvm
  6. sudo mkdir -p extras/CUPTI
  7. cd extras/CUPTI
  8. sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
  9. sudo ln -s /usr/include/ include
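
As a quick sanity check that the symlinks worked, you can confirm that the CUPTI library is now reachable through the path exported earlier; this should list the libcupti files if libcupti-dev installed them in the usual location:

ls /usr/local/cuda/extras/CUPTI/lib64/ | grep -i cupti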

The procedure here was taken from this post, which contained the most helpful instructions I found, but which did not work for the current version of TensorFlow.
