Installing TensorFlow GPU & Enabling CUDA in Ubuntu 18.04— Complete Guide

Published in Nerd For Tech

May 24, 2021

Image source: https://software.intel.com/content/www/us/en/develop/tools/frameworks.html

These days, most research-level machine learning algorithms are written to run on CUDA-enabled GPUs, because GPUs train and run networks much faster than CPUs, especially for computer vision problems. Before writing code or running benchmarks with TensorFlow, we need to set up the environment so that TensorFlow can use the GPU. This can be a tedious process for most people; I had to try it 2–3 times on both my Windows and Ubuntu partitions. The NVIDIA documentation does not help much here, because it is written generically and is hard to follow.

Important:

Alongside each command, I have listed the errors that may appear in your terminal and how to fix them. If you run into an error that is not covered here, please leave a comment and I will try to help as best I can. Stack Overflow can also be very useful in these situations.
I used Anaconda and installed the packages into a separate environment for TensorFlow GPU. This is handy: if you ever want to remove the TensorFlow GPU installation, or you run into errors that seem unsolvable, you can delete the environment and start again with a fresh one, without removing the whole Anaconda installation.
Finally, I have included example code you can run to check whether the installation succeeded. Running it in a Jupyter notebook is recommended.

Step 01 : System Check

(i) Make sure your system is CUDA capable

Visit the following site and make sure your GPU is CUDA capable: https://developer.nvidia.com/cuda-gpus

However, this list seems to be incomplete: the GPU I am using right now is CUDA enabled, yet for some reason it does not appear in the list above!
Therefore, please check thoroughly before moving ahead with this process.
Googling “Is ___ CUDA enabled?” with your GPU model in the blank might give you the answer.
Alternatively, visit NVIDIA’s official product page for your GPU and see whether it is listed as CUDA enabled.
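Before searching, it helps to know the exact model name. The sketch below (the helper names `run_lspci` and `find_nvidia_gpus` are hypothetical) scans the output of `lspci` for NVIDIA display adapters. It only tells you that an NVIDIA GPU is present and what it is called; whether that model is CUDA capable still has to be looked up as described above.

```python
import subprocess

# lspci class names used for display adapters. Discrete NVIDIA GPUs in laptops
# often appear as "3D controller" rather than "VGA compatible controller".
DISPLAY_CLASSES = ("VGA compatible controller", "3D controller")


def run_lspci() -> str:
    """Return the raw output of `lspci` (requires the pciutils package)."""
    return subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout


def find_nvidia_gpus(lspci_output: str) -> list:
    """Return the lspci lines that describe an NVIDIA display adapter."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "NVIDIA" in line and any(cls in line for cls in DISPLAY_CLASSES)
    ]
```

Usage: `print(find_nvidia_gpus(run_lspci()))`. Filtering on the device class skips other NVIDIA functions on the same card, such as its HDMI audio controller.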

(ii) Identify which CUDA/cuDNN versions are compatible with your GPU

Different GPUs support different versions of CUDA, and if your GPU is relatively old, there is a higher chance that it will not work with the latest CUDA releases. So check which versions work with your GPU before blindly installing the latest CUDA Toolkit. You can do this by searching the internet or the official NVIDIA developer forum. Also note which CUDA and cuDNN versions the TensorFlow release you want was built against, because all three versions have to match when you install them in Step 05.
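TensorFlow publishes a table of “tested build configurations” that pairs each release with a CUDA and cuDNN version. The sketch below hard-codes a small subset of the Linux GPU rows as a lookup; treat the values as a snapshot to double-check against the current official table, and `TESTED_CONFIGS` / `required_versions` as hypothetical names.

```python
# Small, hand-copied subset of TensorFlow's tested Linux GPU build configurations.
# Not exhaustive: verify against the official table before installing anything.
TESTED_CONFIGS = {
    "1.12.0": {"cuda": "9", "cudnn": "7"},
    "1.15.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1.0": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3.0": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4.0": {"cuda": "11.0", "cudnn": "8.0"},
}


def required_versions(tf_version: str) -> dict:
    """Return the CUDA/cuDNN versions a TensorFlow release was tested with."""
    try:
        return TESTED_CONFIGS[tf_version]
    except KeyError:
        # Unknown release: fail loudly instead of guessing a version.
        raise KeyError(
            f"TensorFlow {tf_version} is not in this table; check the official docs"
        ) from None
```

For example, `required_versions("1.12.0")` points to CUDA 9 and cuDNN 7, which matches the versions used in Step 05 of this guide.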

Step 02: Remove the Nouveau Driver (open-source driver for NVIDIA GPUs, xserver-xorg-video-nouveau)

(i) Blacklist the driver

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"

(ii) Confirm the content of the new modprobe config file

cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf

After running the command above, you should see output like this:

blacklist nouveau
options nouveau modeset=0
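If you prefer to check the file from a script, the sketch below compares its content with the two expected lines. It ignores blank lines, comments and extra spaces; `missing_blacklist_lines` is a hypothetical helper name.

```python
from pathlib import Path

CONF_PATH = Path("/etc/modprobe.d/blacklist-nvidia-nouveau.conf")
EXPECTED_LINES = {"blacklist nouveau", "options nouveau modeset=0"}


def missing_blacklist_lines(conf_text: str) -> set:
    """Return the expected lines that are absent from the modprobe config text."""
    present = {
        " ".join(line.split())  # collapse runs of whitespace
        for line in conf_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return EXPECTED_LINES - present
```

Usage: `missing_blacklist_lines(CONF_PATH.read_text())` returns an empty set when both lines are in place.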

(iii) Update kernel initramfs

sudo update-initramfs -u

— — — — — — — —Possible Error at this point: — — — — — — — —

update-initramfs: Generating /boot/initrd.img-4.18.0-15-generic
I: The initramfs will attempt to resume from /dev/sda5
I: (UUID=09e25397-4a2c-4fb0-a605-a7013eecb59c)
I: Set the RESUME variable to override this.

This message mostly appears on dual-boot systems. The lines start with “I:”, so they are informational rather than a hard failure, but setting the RESUME variable is still recommended: add your swap partition’s UUID to the /etc/initramfs-tools/conf.d/resume file as follows.

1. Identify your swap UUID

blkid | awk -F\" '/swap/ {print $2}'

2. Set the relevant ID in the RESUME file

printf "RESUME=UUID=$(blkid | awk -F\" '/swap/ {print $2}')\n" | sudo tee /etc/initramfs-tools/conf.d/resume

3. Run the following to regenerate the initramfs for every installed kernel

sudo update-initramfs -u -k all
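The awk one-liner above assumes the UUID is the first quoted value on the swap line. A slightly more robust sketch parses every KEY="value" pair that blkid prints and selects entries whose TYPE is swap; `swap_uuids` is a hypothetical helper, and you would feed it the output of `sudo blkid`.

```python
import re

# Matches KEY="value" pairs as printed by blkid, e.g. UUID="..." TYPE="swap".
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def swap_uuids(blkid_output: str) -> list:
    """Return the UUID of every device whose TYPE is swap."""
    uuids = []
    for line in blkid_output.splitlines():
        fields = dict(PAIR_RE.findall(line))
        if fields.get("TYPE") == "swap" and "UUID" in fields:
            uuids.append(fields["UUID"])
    return uuids
```

Each returned UUID corresponds to one `RESUME=UUID=<uuid>` line for the resume file; if more than one swap partition is listed, pick the one you actually resume from.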

— — — — — — — —End of the Error Resolving — — — — — — — —

(iv) Reboot

sudo reboot
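After the reboot, the nouveau kernel module should no longer be loaded. The sketch below (hypothetical helpers `loaded_modules` and `nouveau_is_loaded`) reads the format of /proc/modules, where the first field on each line is the module name; it is the scripted equivalent of checking that `lsmod | grep nouveau` prints nothing.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Return the names of the kernel modules listed in /proc/modules."""
    # Only the first field is the module name; later fields list dependents.
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_is_loaded(proc_modules_text: str) -> bool:
    """True if the nouveau module itself appears in /proc/modules."""
    return "nouveau" in loaded_modules(proc_modules_text)
```

Usage: `nouveau_is_loaded(open("/proc/modules").read())` should return False once the blacklist has taken effect.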

Step 03 : Remove Other NVIDIA Drivers

(i) Add the Graphics Driver PPA (Personal Package Archive)

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update

(ii) Purge the Driver

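sudo apt-get purge 'nvidia*'

(The quotes stop the shell from expanding nvidia* against file names in the current directory before apt-get sees it.)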
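To confirm the purge worked, you can list what dpkg still knows about. The sketch below assumes the output of `dpkg-query -W --showformat='${Package} ${Status}\n'` (one package per line, followed by its three-word status) and reports NVIDIA packages that are still installed or left in the config-files state; `remaining_nvidia_packages` is a hypothetical name.

```python
def remaining_nvidia_packages(dpkg_output: str) -> list:
    """Return NVIDIA package names that are still installed or only removed
    (config files kept) according to dpkg-query output."""
    leftovers = []
    for line in dpkg_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        name, status = parts
        # The last word of the status is the package state, e.g. "installed".
        if "nvidia" in name and status.split()[-1] in ("installed", "config-files"):
            leftovers.append(name)
    return leftovers
```

An empty list means the purge removed everything, including configuration files.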

(iii) Reboot

sudo reboot

Step 04 : Install Proprietary NVIDIA Driver

(i) Auto install Drivers

sudo apt update
sudo ubuntu-drivers autoinstall
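`sudo ubuntu-drivers autoinstall` installs the driver that `ubuntu-drivers devices` marks as recommended. If you want to see that choice first, the sketch below picks it out of the `devices` output. It assumes the usual layout of lines such as `driver   : nvidia-driver-460 - third-party non-free recommended`; `recommended_driver` is a hypothetical helper.

```python
def recommended_driver(ubuntu_drivers_output: str):
    """Return the package marked 'recommended' by `ubuntu-drivers devices`, or None."""
    for line in ubuntu_drivers_output.splitlines():
        key, sep, value = line.partition(":")
        # Only "driver : <package> - <tags...>" lines are candidates.
        if sep and key.strip() == "driver" and "recommended" in value.split():
            return value.split()[0]
    return None
```

Knowing the package name is useful if autoinstall fails, because you can then run `sudo apt install <package>` directly and read its error message.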

— — — — — — — —Possible Error at this point: — — — — — — — —

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-driver-396 : Depends: nvidia-dkms-396 (= 396.54-0ubuntu0~gpu18.04.1) but it is not going to be installed
                     Depends: nvidia-utils-396 (= 396.54-0ubuntu0~gpu18.04.1) but it is not going to be installed
                     Recommends: nvidia-settings but it is not going to be installed
                     Recommends: nvidia-prime (>= 0.8) but it is not going to be installed
                     Recommends: libnvidia-compute-396:i386 (= 396.54-0ubuntu0~gpu18.04.1)
                     Recommends: libnvidia-decode-396:i386 (= 396.54-0ubuntu0~gpu18.04.1)
                     Recommends: libnvidia-encode-396:i386 (= 396.54-0ubuntu0~gpu18.04.1)
                     Recommends: libnvidia-ifr1-396:i386 (= 396.54-0ubuntu0~gpu18.04.1)
                     Recommends: libnvidia-fbc1-396:i386 (= 396.54-0ubuntu0~gpu18.04.1)
                     Recommends: libnvidia-gl-396:i386 (= 396.54-0ubuntu0~gpu18.04.1)
E: Unable to correct problems, you have held broken packages.

There is no general fix for this kind of NVIDIA driver installation issue; it has to be solved case by case. Search for the exact error message on Google or ask for help on Stack Overflow.

— — — — — — — —End of the Error Resolving — — — — — — — —
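When searching, it helps to know which hard dependencies apt refused to install, since those are usually the real cause. The sketch below (hypothetical helper `blocking_packages`) pulls the `Depends:` entries marked “not going to be installed” out of an apt error like the one above and ignores the softer `Recommends:` lines.

```python
import re

# Matches the package name that follows "Depends:" in apt's unmet-dependency report.
DEPENDS_RE = re.compile(r"Depends:\s+(\S+)")


def blocking_packages(apt_error: str) -> list:
    """Return the hard dependencies apt reported as 'not going to be installed'."""
    blocked = []
    for line in apt_error.splitlines():
        match = DEPENDS_RE.search(line)
        if match and "not going to be installed" in line:
            blocked.append(match.group(1))
    return blocked
```

For the error shown above, this yields nvidia-dkms-396 and nvidia-utils-396. Trying to install one of those packages on its own often prints the underlying conflict, which is a more useful search term than the top-level message.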

(ii) Reboot

sudo reboot

(iii) Enable the driver & Reboot

sudo prime-select nvidia
sudo reboot

(iv) Checking whether everything is working as expected

nvidia-smi

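If the driver is loaded, nvidia-smi prints a table that lists your GPU model, the driver version and the highest CUDA version that driver supports.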
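nvidia-smi can also report the same information in machine-readable form with `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader`. The sketch below turns that CSV into one dictionary per GPU; `parse_gpu_query` and the dictionary keys are hypothetical names chosen for illustration.

```python
import csv
import io


def parse_gpu_query(csv_text: str) -> list:
    """Parse `nvidia-smi --query-gpu=name,driver_version,memory.total
    --format=csv,noheader` output into one dict per GPU."""
    fields = ("name", "driver_version", "memory_total")
    # skipinitialspace drops the space nvidia-smi prints after each comma.
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in rows if row]
```

This is handy in scripts that should stop early when no GPU is visible, for example `assert parse_gpu_query(output)`.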

— — — — — — — — Possible Error at this point: — — — — — — — —

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Remember that this error can be caused by internal dependency problems in your installation, so the method below might not solve it. If it does not, search for the error online or on Stack Overflow.

Try:

sudo apt-get update --fix-missing
sudo dpkg --configure -a
sudo apt install -f

If a broken package still remains, the last resort is to edit the dpkg status file (/var/lib/dpkg/status) manually. Make a backup copy before editing it.

— — — — — — — — -End of the Error Resolving — — — — — — — —

Step 05 : Install TensorFlow GPU

(i) Install Anaconda

Visit https://www.anaconda.com/products/individual and install the version of your preference.

(ii) Create a New Environment for TensorFlow GPU

Please note that I used Python 3.7 for this environment for compatibility reasons. Replace it with a Python version that is supported by the TensorFlow version you plan to install.

conda create -n tf-gpu python=3.7
conda activate tf-gpu

(On conda versions older than 4.4, use source activate tf-gpu instead.)

— — — — — — —— Possible Error at this point: — — — — — — — —

conda: command not found

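This happens because the Anaconda installation directory has not been added to the PATH in your .bashrc or .zshrc.

Try:

export PATH="$HOME/anaconda3/bin:$PATH"

This only affects the current terminal session. To make it permanent, add the same line to the end of your ~/.bashrc (or ~/.zshrc) and open a new terminal.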

— — — — — — —— -End of the Error Resolving — — — — — — — —
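Once conda is on your PATH and the environment is activated, a quick check from Python shows which environment and interpreter you are actually using. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; `environment_report` and its keys are hypothetical names.

```python
import os
import shutil
import sys


def environment_report() -> dict:
    """Collect a few facts that show whether the tf-gpu environment is active."""
    return {
        # Set by conda on activation; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Full path of the first `conda` executable found on PATH, or None.
        "conda_on_path": shutil.which("conda"),
        "python": "{}.{}".format(*sys.version_info[:2]),
        "interpreter": sys.executable,
    }
```

Inside the activated environment, `environment_report()["conda_env"]` should be `tf-gpu`, and `python` should match the version you passed to `conda create`.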

(iii) Install CUDA, cuDNN

At this point, use the compatible TensorFlow, CUDA and cuDNN versions you looked up in Step 01, and change the version numbers below accordingly. Here I have used TensorFlow 1, but note that a newer major version, TensorFlow 2, is available.

conda install \
tensorflow-gpu==1.12 \
cudatoolkit==9.0 \
cudnn=7.1.2 \
h5py
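To confirm which versions conda actually installed, `conda list --json` prints the packages of the active environment as a JSON array whose entries include `name` and `version` fields. The sketch below (hypothetical helper `installed_versions`) extracts the versions you care about and reports None for anything missing.

```python
import json


def installed_versions(conda_list_json: str, names: list) -> dict:
    """Map each requested package name to its installed version (None if absent)."""
    packages = {pkg["name"]: pkg["version"] for pkg in json.loads(conda_list_json)}
    return {name: packages.get(name) for name in names}
```

Usage: pass it the output of `conda list --json` together with `["tensorflow-gpu", "cudatoolkit", "cudnn"]`, then compare the result with the versions you chose in Step 01.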

Step 06 : Validate the Installation

For TensorFlow 1:

import tensorflow as tf
print(tf.test.gpu_device_name())  # e.g. /device:GPU:0; empty string if no GPU is found
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

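When the session is created, TensorFlow logs its device mapping. You should see a line starting with /job:localhost/replica:0/task:0/device:GPU:0 that shows your GPU’s name, PCI bus id and compute capability.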

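For TensorFlow 2:

import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

assert tf.test.is_built_with_cuda()
assert tf.test.is_gpu_available()  # deprecated in newer TF 2.x releases, but still available

(tf.config.list_physical_devices is available from TensorFlow 2.1; on 2.0, use tf.config.experimental.list_physical_devices instead.)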

Enjoy!

If you want to set up PyTorch with GPU support as well, check out my guide on installing PyTorch GPU in Ubuntu.

Written by Isuru Pamuditha