Building Tensorflow 2.0 with GPU support and TensorRT on Ubuntu 18.04 LTS [Part 1]

Published in

Analytics Vidhya

10 min read · Apr 7, 2020

Amongst some prevailing and inevitable choices in life, I once found myself standing on a road which forked into two: Tensorflow or Pytorch? And like almost any other life choice, this too had its own share of uncertainties, assurance and contentment. I took the road my gut called for…and it has made all the difference.

I won’t deny that installing Tensorflow can be painful and extremely troublesome, especially if we don’t know what we are aiming for, which we almost never do. Yet for certain reasons (flexibility and control being the two most prominent ones), I always prefer building Tensorflow from source. And that is exactly what this article is all about. Since this is a long process, I decided to break it into two parts with the content apportioned as follows:

Part 1: Installation of NVIDIA Driver, CUDA, and cuDNN.

Part 2: Installation of TensorRT and Tensorflow.

For building Tensorflow 1.14 with GPU support and TensorRT on Ubuntu 16.04, kindly refer to this link.

Requirements:

CUDA (= 10.0)
NVIDIA GPU Driver (= 410.x)
cuDNN SDK (= 7.3.1)
CUPTI (ships with CUDA Toolkit)
TensorRT (= 5.0)
Bazel (= 0.26.1)
Python (2.7, 3.5 — 3.7) [Tensorflow 2.1 is the last release which supports Python 2]
pip (version ≥ 19.0)

Step-1: Update and upgrade your system.

$ sudo apt-get update
$ sudo apt-get upgrade

Step-2: Verify you have a CUDA-enabled GPU.

$ lspci | grep -i nvidia

The lspci command lists all the PCI devices on your system. Piping it through grep -i nvidia keeps only the lines that mention NVIDIA, which gives you the name of your NVIDIA GPU. Note down the name and check whether your GPU is CUDA-enabled at http://developer.nvidia.com/cuda-gpus.

If present, note down its compute capability. It’ll be required later.
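If you want to script this check, the filter can be narrowed to display controllers only, since grep -i nvidia alone also matches the GPU's audio function. Here is a minimal sketch; the function name list_nvidia_gpus is my own, and the two device class names are the ones lspci usually prints, so treat them as an assumption:

```shell
# list_nvidia_gpus (hypothetical helper): reads `lspci` output on stdin and
# prints only NVIDIA display controllers, skipping e.g. the HDMI audio device.
list_nvidia_gpus() {
    grep -i 'nvidia' | grep -Ei 'VGA compatible controller|3D controller'
}

# Typical use on a real machine:
#   lspci | list_nvidia_gpus
```

If nothing is printed, either no NVIDIA GPU is present or lspci labels it with a class name the sketch does not cover.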

Step-3: Remove any existing or previously installed CUDA along with NVIDIA Graphics Driver.

$ sudo apt-get --purge remove "*cublas*" "cuda*" "nvidia-cuda*"
$ sudo rm -rf /usr/local/cuda*
$ sudo apt-get --purge remove "*nvidia*"
$ sudo apt-get autoremove
$ sudo apt-get autoclean

The first two commands remove all the CUDA packages and delete the CUDA folders, respectively. The third command uninstalls the NVIDIA driver installed on your system, which will be replaced by a compatible one later. The last two commands remove packages that were installed as dependencies but are no longer needed, and clear obsolete package files from the cache.
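To double-check that nothing was left behind, the package list can be filtered for anything that still looks CUDA- or NVIDIA-related. This is only a sketch: leftover_gpu_packages is a name I made up, and the name pattern is a guess at what counts as related.

```shell
# leftover_gpu_packages (hypothetical helper): reads `dpkg -l` output on stdin
# and prints the names of still-installed ("ii") packages that look
# CUDA- or NVIDIA-related.
leftover_gpu_packages() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia|cuda|cublas/ { print $2 }'
}

# Typical use on a real machine (no output means nothing is left):
#   dpkg -l | leftover_gpu_packages
```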

Step-4: Install the required NVIDIA driver. **

Before installing CUDA, you need to have a compatible version of NVIDIA driver installed in your machine. There are two ways of installing the driver:

Install it via Graphical User Interface:

Press the Super (Windows) key on your keyboard and search for Software and Updates. When it opens, go to the Additional Drivers tab.

Let it fetch the information regarding available NVIDIA drivers compatible with your system. After it’s done, choose the desired driver from the list and click on Apply Changes.

Now reboot.

Install it via Command Line:

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

The first command adds the graphics-drivers PPA, a repository of NVIDIA driver packages for Ubuntu. The second one refreshes the package index so that the packages from the newly added PPA become available.

You now have to choose between installing the latest drivers or a specific version. The choice depends on your system, Ubuntu kernel version, and the aim of the project (CUDA 10.0 does not support driver versions below 410.x).

Installing the latest driver is not recommended, because this PPA is still under testing. Sometimes you might encounter dependency problems. I personally faced certain complications like absence of display, failing to boot into Ubuntu after reboot, etc.

To install the latest driver:

$ sudo ubuntu-drivers devices
$ sudo ubuntu-drivers autoinstall

The first command lists the available drivers for the graphics hardware on your system, i.e., Intel and NVIDIA (or AMD? Never tried for AMD though).

The second command installs the latest drivers (or the recommended one) for all the graphics devices.

To install a specific version:

$ sudo apt install nvidia-driver-${version_number}

Replace ${version_number} with the desired version, for example 410. On Ubuntu 18.04 the driver packages are named nvidia-driver-<version>.

Reboot.

** This step can be skipped if you choose to install CUDA using runfile installation method. See Choose the Installation Method under Step-5 for more details.

Verify the installation of NVIDIA Graphics Driver:

Method 1:

$ nvidia-smi | grep "Driver Version" | awk '{print $6}' | cut -c1-

This will print the version of the driver installed, if the process was successful.

Method 2:

Go to Settings > Details and check the Graphics entry. If it shows an NVIDIA device, then the driver was successfully installed.

Method 3:

$ prime-select query

prime-select is a utility from the nvidia-prime package that switches between the Intel and NVIDIA graphics. The above command displays the name of the one in use. If the output is nvidia, then the driver was successfully installed.
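Method 1 can also be turned into a small scripted check against the requirement listed at the top. The sketch below assumes, as this article does, that any driver with a major version of 410 or higher is acceptable for CUDA 10.0. The helper name driver_ok_for_cuda10 is hypothetical; the nvidia-smi query options in the usage comment are the tool's own.

```shell
# driver_ok_for_cuda10 (hypothetical helper): succeeds when the driver
# version passed as $1 (e.g. 410.48) has a major number of at least 410,
# the minimum this article assumes for CUDA 10.0.
driver_ok_for_cuda10() {
    major=${1%%.*}
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # not a plain number: cannot decide
    esac
    [ "$major" -ge 410 ]
}

# Typical use on a real machine:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_ok_for_cuda10 "$ver" && echo "driver $ver is new enough"
```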

Step-5: Install the required CUDA version.

Pre-Installation Steps:

Verify that your system has gcc installed:

$ gcc --version

This command returns the version of gcc installed on your system. Although the Tensorflow documentation asks for gcc 7.3.1, it gave me a compilation error while building the pip package using bazel. I did what any programmer would do: I googled. I found that this issue persists with higher versions of gcc. I’m not sure exactly which versions are incompatible, but I found issues with versions 7 and 8. Go with version 6.5 instead; it worked like a charm for me.

[Update]: GCC version 7.5.0 worked too. Previously, I had some flags wrong in the bazel build command. Go for version 6 if 7 does not work for you.
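If you would rather have a script tell you whether the compiler in use is one of the versions that worked for me, something like the following could do it. Both helper names are made up for this sketch, and the accepted list (6 and 7) reflects only my own experience above, not an official compatibility table.

```shell
# gcc_major (hypothetical helper): prints the major component of a GCC
# version string such as 7.5.0 or 6.
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# gcc_ok_for_build (hypothetical helper): succeeds for the major versions the
# article reports as working (6 and 7); this is an assumption, not an
# official compatibility list.
gcc_ok_for_build() {
    case "$(gcc_major "$1")" in
        6|7) return 0 ;;
        *)   return 1 ;;
    esac
}

# Typical use on a real machine:
#   gcc_ok_for_build "$(gcc -dumpversion)" || echo "consider switching to gcc-6"
```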

Installing gcc version 6.5 (optional):

$ sudo apt-get install gcc-6 g++-6
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 600 --slave /usr/bin/g++ g++ /usr/bin/g++-6
$ sudo update-alternatives --config gcc

The first command installs gcc and g++ version 6. After installing, you need to tell the system to use the newly installed version rather than the one currently in use. This is done by registering it as an alternative for gcc (or by creating symbolic links, whichever you prefer), hence the second command. You then select the desired version from the alternatives using the third command, which shows an interactive list of the registered choices.

Verify the system has the correct kernel headers and development packages:

$ uname -r
$ sudo apt-get install linux-headers-$(uname -r)

The first command displays the version of the kernel running on your system. The second command installs the corresponding kernel headers and development packages; the shell replaces $(uname -r) with the output of the first command.

Installing the CUDA driver requires the kernel headers and development packages to be of the same version as the kernel running on your system. The runfile does not perform package validation, whereas the .rpm and .deb installations attempt to install the kernel headers and development packages if they are not found. However, they will install the latest version, which might not match your kernel. It is strongly suggested to install the packages manually.
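A quick way to see whether matching headers are already in place is to look for the build directory of the running kernel. This is a sketch under the assumption that your system follows the usual Ubuntu layout, where the matching headers package provides /lib/modules/<release>/build. The helper name headers_present is hypothetical.

```shell
# headers_present (hypothetical helper): succeeds when the build directory
# for kernel release $1 exists under the modules root $2 (default
# /lib/modules). On Ubuntu this entry is normally provided by the matching
# linux-headers package.
headers_present() {
    release=$1
    root=${2:-/lib/modules}
    [ -e "$root/$release/build" ]
}

# Typical use on a real machine:
#   headers_present "$(uname -r)" || sudo apt-get install "linux-headers-$(uname -r)"
```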

Choose the installation method:

Method 1: RPM or DEB Installation

.rpm and .deb packages are distribution-specific and interface with the distribution’s native package management system. This lets the package manager update/upgrade the packages whenever new versions are available.

Method 2: Runfile Installation

runfile package is a distribution-independent package and it allows you to work across a wider set of Linux distributions, but does not update the distribution’s native package management system. It also installs the NVIDIA Graphics Driver along with CUDA Toolkit.

I personally prefer the runfile package because updates can lead to incompatible dependencies, and a stable system is an effective system!

Download the NVIDIA CUDA Toolkit 10.0:

Download the desired package (.rpm/.deb or runfile) of CUDA 10.0 from this link. If going for the .deb package, choose the deb(local) option; otherwise choose the runfile package.

Installation of CUDA 10.0:

.rpm or .deb installation:

$ cd ~ #Or the directory containing the downloaded .deb package
$ sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-nvjpeg-update-1_1.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda

The first command changes the directory to the one containing the downloaded package. The second and third commands install the repository meta-data and the patch, respectively. The fourth command installs the public GPG key of CUDA. The fifth and the last commands update the APT repository cache and install CUDA, respectively.

runfile installation:

Since runfile installation also installs NVIDIA Graphics Driver, we first need to disable nouveau driver used by default in Ubuntu (assuming no NVIDIA driver is installed or operating).

$ sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
#Add these lines to it:
blacklist nouveau
options nouveau modeset=0
$ sudo update-initramfs -u

The first command creates a configuration file and opens it in an editor. Add the two lines shown above (blacklist nouveau and options nouveau modeset=0) to the opened file, save it, and close it. Then rebuild the initramfs (the initial RAM filesystem the kernel loads at boot) using the last command.
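If you are working over SSH or simply don't want to open gedit, the same file can be produced without an editor. This is a minimal sketch; write_nouveau_blacklist is a name I invented, and the /tmp staging path in the usage comment is just an example.

```shell
# write_nouveau_blacklist (hypothetical helper): writes the two blacklist
# lines to the file given as $1. Writing directly under /etc/modprobe.d
# needs root, so the usage below stages the file in /tmp first.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Typical use on a real machine:
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   # after the reboot, this should print nothing:
#   lsmod | grep nouveau
```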

Reboot to completely disable the nouveau driver **. When the system restarts, go into the text mode (runlevel 3) using Ctrl + Alt + F3.

Before continuing with the installation make sure that you don’t have any application and/or process utilizing NVIDIA driver (if installed ever before).***

Now, run the following command to start the installation process and follow the instructions. After the installation is complete, take similar steps to install the patch.

$ sudo sh cuda_10.0.130_410.48_linux.run
#CUDA Version: 10.0.130
#NVIDIA Driver Version: 410.48

** If after restart the text mode does not load or has trouble displaying (usually happens with gaming laptops), try doing the following:

Restart the system and when the grub comes up, press ‘e’ to edit it before booting. When in edit mode, enter the following kernel boot parameters right after the word “splash”:

 nouveau.modeset=0 tpm_tis.interrupts=0 acpi_osi=Linux i915.preliminary_hw_support=1 idle=nomwait

And press F10 to boot. It should work now.

*** If the installation presents an error saying it could not unload nvidia-drm, type the following commands right after entering the text mode:

$ sudo systemctl isolate multi-user.target
$ sudo modprobe -r nvidia-drm

The first command disables the graphical target, which is what keeps the display manager running. The second command then unloads the nvidia-drm.

After the installation is complete, make sure you load it again using:

$ systemctl start graphical.target

Post-Installation Steps:

Add the paths of the newly installed package to the PATH variable so that the shell knows where to look for the CUDA binaries when needed.

If installed using .deb package:

$ export PATH=/usr/local/cuda-10.0/bin:/usr/local/cuda-10.0/NsightCompute-2019.1${PATH:+:${PATH}}

If installed using runfile package:

$ export PATH=/usr/local/cuda-10.0/bin:/usr/local/cuda-10.0/NsightCompute-2019.1${PATH:+:${PATH}}

For 64-bit operating system:

$ export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

For 32-bit operating system:

$ export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
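These exports only last for the current terminal session. To keep them across sessions, the lines can be appended to a shell startup file such as ~/.bashrc. The sketch below uses a made-up helper, append_once, that avoids adding the same line twice if you run it again.

```shell
# append_once (hypothetical helper): appends line $1 to file $2 unless an
# identical line is already there, so re-running it does not duplicate entries.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Typical use (single quotes keep the variables unexpanded in the file):
#   append_once 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' ~/.bashrc
```

Open a new terminal (or run source ~/.bashrc) for the change to take effect.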

Verify the installation of CUDA Toolkit 10.0:

To verify that CUDA 10.0 has been successfully installed, try running a sample program. This can be done by installing the CUDA Samples, which come along with the package.

If you installed CUDA using runfile package then you don’t have to install the CUDA Samples, but if you used .deb package, then install the samples using the following command:

$ cuda-install-samples-10.0.sh <directory>
#Replace <directory> with the location where you want to install the samples.

After installing, run a sample program using the following set of commands:

$ cd NVIDIA_CUDA-10.0_Samples/
$ make
$ cd 1_Utilities/deviceQuery
$ ./deviceQuery

If the output lists your GPU and ends with Result = PASS, then CUDA has been successfully installed.
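deviceQuery prints a long report, but the line that matters is the final Result = PASS. A small sketch for checking just that, with a helper name (device_query_passed) of my own choosing:

```shell
# device_query_passed (hypothetical helper): reads deviceQuery output on
# stdin and succeeds only if it contains the final "Result = PASS" line.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Typical use from the deviceQuery directory:
#   ./deviceQuery | device_query_passed && echo "CUDA is working"
```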

Side Note: The above process does not compile the graphical samples, only those which use the command-line interface. If you want to build all the samples, install the dependencies [OpenGL (e.g., Mesa), GLU, GLUT, and X11 (including Xi, Xmu, and GLX)] using:

$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev  libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Verify the installation by running a graphical sample of CUDA 10.0:

$ cd NVIDIA_CUDA-10.0_Samples/
$ make
$ cd 5_Simulations/nbody
$ ./nbody

P.S.: Installing Mesa may overwrite the /usr/lib/libGL.so that was previously installed by the NVIDIA driver, so a reinstallation of the NVIDIA driver might be required after installing these libraries.

Step-6: Install the required cuDNN Library.

Download the cuDNN Runtime Library, Developer Library, and Code Samples and User Guide for CUDA 10.0 and Ubuntu 18.04.

Now, run the following commands to install them:

$ cd ~ #Or the directory containing the downloaded .deb package
$ sudo dpkg -i libcudnn7_7.3.1.20-1+cuda10.0_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.3.1.20-1+cuda10.0_amd64.deb
$ sudo dpkg -i libcudnn7-doc_7.3.1.20-1+cuda10.0_amd64.deb

Verify the installation of cuDNN Libraries:

$ cp -r /usr/src/cudnn_samples_v7/ ~
$ cd ~/cudnn_samples_v7/mnistCUDNN/
$ make clean && make
$ ./mnistCUDNN

If cuDNN is properly installed and running on your system, you will see a message similar to the following:

Test passed!
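Besides running the sample, you can confirm which cuDNN version actually got installed by reading the version macros from its header. The sketch below assumes the header ended up at /usr/include/cudnn.h, which may differ on your system. The helper name cudnn_version_from_header is hypothetical.

```shell
# cudnn_version_from_header (hypothetical helper): prints MAJOR.MINOR.PATCH
# as defined in the cuDNN header file given as $1.
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Typical use, assuming the header was installed at /usr/include/cudnn.h:
#   cudnn_version_from_header /usr/include/cudnn.h
```

For the packages installed above, the printed version should match the 7.3.1 listed in the requirements.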

This marks the end of part 1. Kindly follow the next part, the link to which can be found here, to continue the process.

Thanks!