Setting up TensorFlow GPU on Google Cloud Instance with Ubuntu 16.04

I recently set up a Google Cloud instance to train some TensorFlow models. While Amazon EC2 has AMIs that come with everything configured, on Google Cloud you need to set up everything yourself. I spent several days doing this, and the instructions I found online, both from Google and from other places, were for older versions of TensorFlow and did not work with the newest version, which at the time of this writing is 1.7.

These instructions work for v1.7 and have been tested several times. I hope they will save someone from spending days searching online to decode all of the various error messages.

This is taken from the Google instructions, but with the version of CUDA (9.0) that works with TensorFlow v1.7:
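A minimal sketch of that install step, written as a script to be run with root privileges. The script name `install_cuda.sh` is hypothetical, and the repository URL, package file name, and signing-key file are assumptions based on NVIDIA's Ubuntu 16.04 repository layout for CUDA 9.0.176; check them against NVIDIA's download page before relying on them.

```shell
# Write the install script to disk; nothing is installed until you run it.
cat > install_cuda.sh <<'EOF'
#!/bin/bash
# Assumed location of NVIDIA's CUDA repository for Ubuntu 16.04.
REPO_URL=http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64
REPO_DEB=cuda-repo-ubuntu1604_9.0.176-1_amd64.deb

# Skip everything if the cuda-9-0 package is already installed.
if ! dpkg-query -W cuda-9-0; then
  curl -O "$REPO_URL/$REPO_DEB"
  dpkg -i "./$REPO_DEB"
  apt-key adv --fetch-keys "$REPO_URL/7fa2af80.pub"
  apt-get update
  # Ask for cuda-9-0 explicitly; the unversioned "cuda" package tracks the newest release.
  apt-get install -y cuda-9-0
fi
EOF
chmod +x install_cuda.sh
```

Run it once with `sudo ./install_cuda.sh`. Requesting the `cuda-9-0` package by name is the important part, since a newer CUDA release is what breaks TensorFlow 1.7.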

Then you should verify that everything is installed and working with:
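`nvidia-smi` ships with the NVIDIA driver and lists the GPUs it can see, so it is a reasonable check at this point. The guard and the `gpu_check` variable are illustrative additions so that a missing driver produces a readable message instead of "command not found".

```shell
# Report whether the NVIDIA driver tools are present and can talk to a GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
  if nvidia-smi; then
    gpu_check="ok"
  else
    gpu_check="nvidia-smi failed"
  fi
else
  gpu_check="nvidia-smi not found"
fi
echo "GPU check: $gpu_check"
```

If the driver is working, `nvidia-smi` prints a table with the driver version and one row per GPU.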

Google’s instructions do not mention installing cuDNN, but it appears to be required. To get it you need to register with Nvidia’s Developer Program, download the package, and then upload it to your instance. I uploaded it using SCP, which took a while. The file I used was libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb, which is the version required for cuda-9.0 with TensorFlow 1.7. Once it is uploaded you can install it with:
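A short sketch of the install, assuming the `.deb` file is in the current directory on the instance. The `gcloud compute scp` line in the comment is one way to upload it; the instance name and zone there are placeholders.

```shell
CUDNN_DEB="libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb"

# Upload from your local machine first, for example (placeholder instance and zone):
#   gcloud compute scp "$CUDNN_DEB" my-instance:~/ --zone us-east1-b

# Install the package only if it has actually been uploaded.
if [ -f "$CUDNN_DEB" ]; then
  sudo dpkg -i "$CUDNN_DEB"
else
  echo "$CUDNN_DEB not found in $(pwd); upload it first" >&2
fi
```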

Once this is all installed you need to set some PATH variables. Google’s instructions add the variables to the path only temporarily, so they need to be run every time you boot the instance. This will add them permanently:
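One way to make the variables permanent is to append the exports to `~/.bashrc`. This sketch assumes CUDA was installed under `/usr/local/cuda-9.0` (the usual location for the `cuda-9-0` package); the `PROFILE` variable is a hypothetical override for writing to a different startup file.

```shell
PROFILE="${PROFILE:-$HOME/.bashrc}"

# Append only once, so re-running this does not duplicate the lines.
if ! grep -q 'CUDA_HOME' "$PROFILE" 2>/dev/null; then
  # The quoted 'EOF' keeps the variables unexpanded, so they are evaluated at login.
  cat >> "$PROFILE" <<'EOF'
export CUDA_HOME=/usr/local/cuda-9.0
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
EOF
fi
```

Open a new shell, or run `source ~/.bashrc`, for the change to take effect in the current session.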

Finally you can install TensorFlow:
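A sketch of the install, wrapped in a function (the name `install_tensorflow_gpu` is hypothetical) so that pasting it does not start installing right away. Pinning `tensorflow-gpu==1.7.0` is an assumption on my part to match the version this post targets; the package names are the standard Ubuntu and PyPI ones.

```shell
install_tensorflow_gpu() {
  sudo apt-get update
  # Python 3 headers and pip from the Ubuntu repositories.
  sudo apt-get install -y python3-dev python3-pip
  # Pin the release so that pip does not pull a newer build needing a different CUDA.
  sudo pip3 install tensorflow-gpu==1.7.0
}
```

Run `install_tensorflow_gpu` once the CUDA and cuDNN steps have finished.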

Note that I am using python3 and pip3; if you want to use Python 2, just remove the “3” from the commands above. Once this is done you should be able to import tensorflow without any errors.
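A quick way to confirm the import works is to run it from the shell. `tf.test.is_gpu_available()` is part of the TensorFlow 1.x API; the `tf_check` variable is just an illustrative way to report the result.

```shell
# Import TensorFlow, print its version, and print whether it can see a GPU.
if python3 -c 'import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())'; then
  tf_check="import ok"
else
  tf_check="import failed"
fi
echo "TensorFlow check: $tf_check"
```

An import error that mentions `libcudnn` or `libcublas` usually points back to the cuDNN install or the library path rather than to TensorFlow itself.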

Finally, Google suggests a couple of settings to optimize the GPU performance:
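A sketch of what such settings can look like, assuming a Tesla K80. The `nvidia-smi` flags are real, but the clock pair `2505,875` is the value commonly quoted for the K80 and is an assumption here; other GPUs need different values, or none at all. The function name `tune_gpu` is hypothetical.

```shell
tune_gpu() {
  # Persistence mode keeps the driver loaded between jobs.
  sudo nvidia-smi -pm 1
  # Disable autoboost so that clock speeds stay fixed.
  sudo nvidia-smi --auto-boost-default=DISABLED
  # Set application clocks as "memory,graphics" in MHz (assumed K80 values).
  # List the valid pairs for your GPU with: nvidia-smi -q -d SUPPORTED_CLOCKS
  sudo nvidia-smi -ac 2505,875
}
```

Call `tune_gpu` after each boot, since these settings do not survive a restart.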

If you are still having problems running TensorFlow code, there are some additional steps I perform, although I am not sure they are necessary. The problems I was having may have been solved simply by exporting the path properly, but I don’t want to set up another instance just to check whether these steps are required:

The procedures here were adapted from this post, which had the most helpful instructions I found but did not work for the current version of TensorFlow.

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.
