Install TensorFlow GPU on Ubuntu 18.04 LTS

GPU accelerated deep learning with NVIDIA CUDA/cuDNN

Fan Yeng-Loon
Analytics Vidhya
10 min read · Jan 29, 2020


TL;DR

How I installed compatible NVIDIA CUDA Toolkit and cuDNN packages on Ubuntu 18.04 LTS so that I could speed up deep learning with TensorFlow/Keras using the GPU on my laptop.

Disclaimer

Before we begin, some caveats are in order.

  • This article is not written for Anaconda users, because installing tensorflow-gpu under Anaconda takes care of the CUDA/cuDNN dependencies for you. You can check it out here and here.
  • This is primarily for Python users who code with an IDE and virtual environments outside of Anaconda.
  • Also, this is based on my experience with Ubuntu 18.04 LTS only, although the overall installation process should be easy to adapt for Ubuntu 16.04 LTS too.
  • Sadly, there is no guarantee that your setup will definitely work, even if you have successfully installed and tested with compatible CUDA/cuDNN packages. Additional tweaks and experiments might be required.


Introduction

Before I joined the Metis Data Science boot camp journey, one of the suggested prerequisites was to prepare a Ubuntu laptop. Naturally, I went for it without hesitation. Installing Ubuntu on my laptop to dual-boot with Windows was a fabulous learning experience worthy of its own article.

On top of that, I wanted my DL projects to make good use of my laptop GPU (GTX 1050). As it turned out, I spent many hours googling and reinstalling library packages. Hence, I have decided to blog this for my own reference, and hopefully it will help you too if you are interested.

Prerequisites

  • You must have an NVIDIA graphics card installed on your machine. You can find out from the command line:
$ lspci | grep -i nvidia


Installation Process

Before we begin, it is always good to know how to fall back to the previous state should something go wrong. Fortunately, it is quite simple to uninstall, as detailed here:

To remove CUDA Toolkit:
$ sudo apt-get --purge remove "*cublas*" "cuda*"

To remove libcudnn drivers:
$ sudo apt-get --purge remove "libcudnn7*"

To remove NVIDIA drivers (optional):
$ sudo apt-get --purge remove "*nvidia*"


Step 1 — Identify Compatibility

We need to ensure that your system has no compatibility issues with these:

  • Video driver
  • CUDA Toolkit
  • cuDNN
  • Tensorflow GPU

First, find out whether your NVIDIA video card is compatible here.

Next, take note of your graphics card driver version using one of these commands:

$ modinfo nvidia | grep version
$ nvidia-settings

Visit here to identify which version of CUDA is compatible with your card.

Table 1: Cuda Toolkit and Compatible Video Driver

Finally, visit here to take note of which cuDNN and TensorFlow you can install based on your most compatible CUDA version.

Table 2: Tensorflow GPU and respective compatible libraries

From my personal experience, although we should install only the compatible CUDA and respective cuDNN versions, we have a little room to vary the cuDNN versions depending on the default installation outcome. More on this later.


Step 2 — Install Video Driver

Now that you know you have a compatible NVIDIA video card, let’s install the latest driver.

Firstly, let’s compare what you have with what is available:

$ modinfo nvidia | grep version
$ ubuntu-drivers devices

Let’s install the latest recommended driver from the driver list, if you have not done so. For my case, I updated my GTX 1050 to version 440 using this:

$ sudo apt install nvidia-driver-440

Always ensure you have selected the NVIDIA driver before rebooting.

$ prime-select query
$ prime-select nvidia


Step 3 — Install CUDA Toolkit

With the latest video driver in place, let's install the CUDA Toolkit that is most compatible with your video driver.

Let’s see whether you have it installed, or worse, the wrong version. To check, use either of these commands:

$ nvcc --version
$ apt list --installed | grep -i nvcc

If you have an incompatible version installed and/or it has not been working out for you, it’s advisable to perform a clean uninstall:

$ sudo apt-get --purge remove "*cublas*"
$ sudo apt-get --purge remove "cuda*"

If you already have the correct version installed, you can skip to the next step.

To download a compatible CUDA Toolkit, visit here and select the version you can use. We do not need to install the latest CUDA from here, because the TensorFlow GPU libraries are usually not yet compatible with it (refer to Table 2 above for a recap).

Once you have clicked on the respective version, you will be sent to a new page. From here, select all the relevant options for Ubuntu 18.04. Below is the screenshot of my selection for downloading CUDA Toolkit 10.0, which is the compatible version for me to install TensorFlow GPU 1.13 and 1.14.

Click on the ‘Download’ button to download the ‘Base Installer’ deb file and follow the instructions within the box, except the last instruction.

IMPORTANT: Do not upgrade your installed CUDA version after that.


Step 4 — Install cuDNN Library

Now that we have CUDA installed, let's install the compatible cuDNN. As always, check whether you already have one installed; if it does not work for you, you will need to remove it.
$ apt list --installed | grep -i libcudnn

The versioning has two parts to it. For my case, I needed cuDNN version 7.6 and CUDA Toolkit version 10.0. Hence, I downloaded and installed exactly that.

If you need to remove the existing libcudnn:
$ sudo apt-get --purge remove "libcudnn7*"

Visit here and click on “Download cuDNN” to proceed.

You will then be prompted to either sign up or log in. You will have to sign up if this is your first time here.

Once logged in, tick to agree to the terms and you should see this page.

If your cuDNN version is not on the list, then select “Archived cuDNN Releases” for more options.

Remember to match both cuDNN and CUDA versions that are compatible for the TensorFlow you can/want to install. For my case, I chose cuDNN 7.6.3 with CUDA 10.0.

Download all three Deb files for Ubuntu 18.04:

  • cuDNN Runtime Library
  • cuDNN Developer Library
  • cuDNN Code Samples and User Guide

To install, run these commands (but change the filenames to the ones you have downloaded):

$ sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
$ sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb

Let’s also hold the version from being accidentally/automatically updated:

$ sudo apt-mark hold libcudnn7 libcudnn7-dev
$ apt-mark showhold


Step 5 — Update Environment Variables

Almost there! We’ll have to update a couple of environment variables. First, check if CUDA is already in your system path:

$ echo $PATH
$ echo $LD_LIBRARY_PATH

To update, use your favorite text editor to open the .bashrc file:
$ nano ~/.bashrc

Jump to the end of the file and append the following three lines, but make sure you change the CUDA version to match yours.


# NVIDIA CUDA Toolkit
export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH

Reboot to put these into effect.

NB: It is possible to apply the modifications without a reboot by running one of the commands below, but somehow it did not work for me when I subsequently ran the verification tests. If your tests fail the first time too, try rebooting once and running the tests again before you attempt anything else.

$ source ~/.bashrc
$ . ~/.bashrc


Step 6 — Install Tensorflow GPU and Keras

Let’s install TensorFlow GPU and Keras if you have not done so yet. The TF version you need must be compatible with your CUDA Toolkit and cuDNN libraries (refer to Table 2).

For my case, my video card was compatible with CUDA 10.1 and below. This basically means that I can install pretty much any version of TensorFlow GPU. However, as my DL projects still run on older TF (TensorFlow) versions, I will only be installing TF 1.14.

To install a specific TF version, run this:

$ pip install "tensorflow-gpu==1.14.*"
$ pip install keras

To check:

$ pip list | grep -i tensor
$ pip list | grep -i keras
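
As an extra sanity check, you can also confirm from within Python that the GPU-enabled build is the one being imported; a small sketch along these lines should do (version numbers will differ for your setup):

import tensorflow as tf
import keras

print(tf.__version__)                # e.g. 1.14.0
print(keras.__version__)
print(tf.test.is_built_with_cuda())  # should print True for the GPU build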


Step 7 — Run Verification Tests

We are getting so close now! Let’s run the following tests in this order:

  • Step 7a. Verifying cuDNN is working
  • Step 7b. Verifying TensorFlow GPU is working

Step 7a — Verify cuDNN

Run the commands below, as instructed here.

$ cp -r /usr/src/cudnn_samples_v7/ $HOME
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
$ make clean && make
$ ./mnistCUDNN

If cuDNN is working correctly, you should see this message:
Test passed!

Step 7b — Verify Tensorflow GPU

Launch 'python' from the command line and run the following:

$ python
>>> import tensorflow as tf
>>> print(tf.test.gpu_device_name())
>>> quit()

If TF-GPU is correctly installed, it should report a GPU device:
/device:GPU:0
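
If you want more detail than just the device name, you can also turn on device placement logging and run a tiny op (a minimal TF 1.x sketch); the log should show the op landing on GPU:0:

import tensorflow as tf

# Log which device each op is placed on (TF 1.x API)
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    print(sess.run(tf.reduce_sum(a)))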


Step 8 — Trial Run on a CNN Model

Visit here, download the notebook and give it a spin.
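
If the linked notebook is not handy, a minimal Keras CNN on MNIST along the lines below is enough to exercise the GPU (a rough sketch of my own, not the notebook itself); keep nvidia-smi running in another terminal to watch the GPU being used:

import keras
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential

# Load and normalise MNIST
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# A small CNN, just big enough to load the GPU
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=2, validation_data=(x_test, y_test))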

If you run into CUDA OOM (out-of-memory) warnings, see the "How to Manage CUDA Out-of-memory Warnings" section below on how to resolve them.


Miscellaneous Tips

Now that you have successfully installed a working TensorFlow GPU, let me share with you a couple of tips based on my own learning experience.


What to do if Tensorflow-GPU still does not load?

  1. Ensure that the most compatible versions have been installed for the following: NVIDIA video driver, CUDA Toolkit, cuDNN and TensorFlow-GPU.
  2. Ensure that the PATH and LD_LIBRARY_PATH have been correctly updated.
  3. Reboot the machine and run the verification tests again.
  4. Based on various forum discussions, if you are not able to use Tensorflow-GPU with the current set up, you can try downgrading to different versions of Tensorflow-GPU, cuDNN and CUDA Toolkit (in that order). Run the verification tests after each permutation change.
  5. Google the error messages or post in forums like StackOverflow for assistance.

IMPORTANT: Always remember to uninstall the previous CUDA/cuDNN versions before installing a different one. Remember to update the environment variables too.


How to Manage CUDA Out-of-memory Warnings

Please be aware that one of the common issues you might run into is CUDA OOM (out-of-memory) warnings. Based on this and this, you can insert a snippet along the lines below into your script/notebook to help manage the issue.
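
The exact code from those two links is not reproduced here; a common TF 1.x pattern (a sketch, adjust to taste) is to let GPU memory allocation grow on demand instead of grabbing it all up front:

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand rather than all at once
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Optional: cap TensorFlow at a fraction of the total GPU memory
# config.gpu_options.per_process_gpu_memory_fraction = 0.7
K.set_session(tf.Session(config=config))

Run this once near the top of your script or notebook, before any model is built.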


How to Monitor GPU Usage

Run either of these in your command line:

$ nvidia-smi
$ watch -n 5 nvidia-smi

You would want to monitor the temperature, current memory usage and memory usage per process (and respective PIDs).

NB: Notice that the CUDA version reported by nvidia-smi is 10.2, which is different from my installed 10.0. According to this and this, there are two different APIs, the runtime API and the driver API. Our installed CUDA Toolkit refers to the runtime API, not the video driver API.


How to Kill (Release) Runaway Processes

If the GPU memory is not released even after you have exited your IDE or notebook, you can kill the offending process manually using its PID. For example, if I wanted to kill the last process 2471 shown above:

$ sudo kill -9 2471

If you are unsure which process to kill, you can start from the bottom up, especially with any process that is suspiciously hogging a lot of memory.


How I got Keras CuDNNLSTM to work with the latest spaCy

By installing only the following versions, after figuring out how Google Colab did it:

  • CUDA Toolkit 10.0
  • cuDNN 7.6
  • TensorFlow 1.15.x

Note 1: While retaining the compatible version of CUDA Toolkit 10.0, I was able to upgrade cuDNN from 7.4 to 7.6, even though this was not listed as compatible on the website.

Note 2: After having installed TF 1.15, I was having problems using Keras convolutional layers for my image classification projects. Eventually, it was resolved by installing TensorFlow 1.14.x in a new environment. Virtual environments for the win!
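
For reference, once the versions above are in place, CuDNNLSTM is essentially a drop-in replacement for the regular LSTM layer (minus options such as recurrent dropout and custom activations). A minimal sketch of my own, with made-up vocabulary and sequence sizes, looks like this:

from keras.layers import CuDNNLSTM, Dense, Embedding
from keras.models import Sequential

# CuDNNLSTM runs only on the GPU; it swaps in for LSTM but does not
# support recurrent dropout or non-default activations
model = Sequential([
    Embedding(input_dim=10000, output_dim=128, input_length=100),  # hypothetical sizes
    CuDNNLSTM(64),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()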


Wrap Up

Finally, I hope you are able to successfully install a working Tensorflow-GPU environment on your Ubuntu 18.04 LTS that can fully accelerate all of your deep learning projects with your onboard GPU.

For those who cannot do so for whatever reason, do not despair. You can always use cloud solutions, e.g., Google Colab, which offers both GPU and TPU for free! Alternatively, you can also try out Docker instead.

Thank you for reading my article and I hope this has been useful for you!

