Deep Learning GPU Setup from Scratch

Prasanna Brabourame
Jun 13

Deep Learning GPU Powered Machine Setup with Ubuntu

In this article, I try to provide simple instructions. The challenge is that every computer has a different hardware and software configuration, and people want different things, so there are no universal instructions. When problems come up, Google them extensively; it is unavoidable.

Here is a brief introduction to the system I use at work. Its specifications are as follows:

RAM — 16 GB
Processor — Intel® Core™ i5–7400 CPU @ 3.00GHz × 4
Graphics — GeForce GTX 1060 6GB/PCIe/SSE2
OS type — 64-bit
Disk space — 256 GB
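If you want to confirm your own machine's specifications before starting, a few standard commands report them (the output will of course differ from mine):

$ lscpu | grep "Model name"     # CPU model
$ free -h                       # installed RAM
$ lspci | grep -i nvidia        # detected Nvidia GPU
$ df -h                         # disk space
$ uname -m                      # x86_64 means 64-bit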

My daily work involves a lot of training and testing of various AI models, so I need to push my GPU to maximum utilisation. The system configuration process involves the following steps:

1. Ubuntu 20.04 installation
2. Install nvidia driver and verify
3. Install cuda toolkit and verify
4. Install cuDNN and verify
5. Install python and dependent libraries through anaconda platform
6. Verify GPU utilisation

1. Ubuntu installation

Let’s get an overview of the installation process here first (about 10 minutes).

During the installation, the original OS and its data may be destroyed. Even if the setting changes are reversible, the original OS can become non-bootable without a reinstallation. Therefore,

Always create a recovery drive and an image backup for your original computer.

Get a 32GB USB drive for the recovery disc and an 8GB USB drive for the Ubuntu bootable drive. The image backup can be stored on an external drive.
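As a rough sketch for writing the Ubuntu bootable drive from the downloaded ISO (the ISO filename and the /dev/sdX device name are placeholders; confirm the device with lsblk first, because dd overwrites it completely):

$ lsblk                         # identify the USB drive, e.g. /dev/sdX
$ sudo dd if=ubuntu-20.04-desktop-amd64.iso of=/dev/sdX bs=4M status=progress conv=fsync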

2. Install Nvidia driver and verify

After the installation is complete, we can fix the display driver. First, find out the version of the driver that you need; one quick way to check is shown below. Boot the installed Ubuntu system with the kernel parameter nomodeset, and replace 410 in the commands with the driver version you need.
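If you are not sure which driver version your card needs, the ubuntu-drivers tool can suggest a recommended one (an optional check; the PPA below may offer newer releases):

$ sudo apt install ubuntu-drivers-common
$ ubuntu-drivers devices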

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt-get install build-essential
$ sudo apt-get install nvidia-driver-410
$ sudo apt-get install mesa-common-dev
$ sudo apt-get install freeglut3-dev

Prevent automatic updates that might break the drivers. Do this by removing the graphics-drivers PPA from your software sources:

$ sudo add-apt-repository -r ppa:graphics-drivers/ppa

A system restart is needed whenever the driver is changed. For future driver updates after Ubuntu is installed, you can also use the GUI interface demonstrated here.

Now, you don’t need to set the nomodeset parameter anymore. Just practice “Plug-and-Pray” again :-). I installed Ubuntu in 15 minutes, but sometimes the whole process can take much longer due to unexpected issues.

Verify the Nvidia installation using the following steps:

$ lsmod | grep nvidia

The response should list the loaded Nvidia kernel modules (for example nvidia, nvidia_modeset, nvidia_drm and nvidia_uvm).

Make sure the Nouveau drivers are disabled. Enter the following command; nothing should be displayed.

$ lsmod | grep nouveau
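If the command does print Nouveau modules, a common fix (this mirrors the blacklist step from Nvidia's Linux installation guide) is to blacklist Nouveau, rebuild the initramfs and reboot:

$ printf "blacklist nouveau\noptions nouveau modeset=0\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
$ sudo update-initramfs -u
$ sudo reboot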

Verify the driver installation using the nvidia-smi tool:

$ watch -n 1 nvidia-smi

The watch command above refreshes the output every second. nvidia-smi should display a table with the driver version, GPU name, fan speed, temperature, memory usage and the processes currently using the GPU.

3. Install cuda toolkit and verify

The Nvidia CUDA toolkit provides a development environment for creating GPU-accelerated applications. Deep Learning (DL) platforms use it to speed up operations, so it needs to be installed for GPU support.

Go here and select the options matching your system, like the ones below.

(Screenshot: CUDA Toolkit download selector)

This will generate the installation commands for you to run. Below they are in text form.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

The driver version determines which CUDA toolkit version matches your configuration: https://docs.nvidia.com/deploy/cuda-compatibility/
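For a quick check of the driver version you actually have installed (the header of plain nvidia-smi output also shows the highest CUDA version that driver supports), you can run:

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader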

Once you have decided on an Ubuntu release, the installation guide helps determine which kernel and C compiler versions are suitable: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

If you hit dependency issues in the last command when installing CUDA, you can use aptitude instead; it handles dependency resolution better here.

sudo apt-get install aptitude
sudo aptitude install cuda

In our example, CUDA 11.3 will be installed, and you can add the following to your environment:

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
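To make these settings persist across sessions (assuming bash is your login shell), append them to ~/.bashrc and reload it:

$ echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
$ echo 'export CUDADIR=/usr/local/cuda' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' >> ~/.bashrc
$ source ~/.bashrc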

To verify the CUDA installation, go to the CUDA samples location selected during installation and run:

$ cd NVIDIA_CUDA-11.3_Samples_Backup
$ make
$ cd 1_Utilities/deviceQuery
$ ./deviceQuery

If the CUDA installation is successful, deviceQuery should print the GPU's properties and end with Result = PASS.
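Nvidia's installation guide also suggests running the bandwidthTest sample as a second sanity check; since the make above built all samples, it can be run from the sibling directory and should also end with Result = PASS:

$ cd ../bandwidthTest
$ ./bandwidthTest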

4. Install cuDNN and verify

cuDNN is Nvidia's GPU-accelerated library of DL primitives. Download the three packages needed for Ubuntu from here, then install them using the names of the downloaded packages.

cuDNN is the library that ultimately integrates with your ML framework of choice. This support matrix helps determine the correct version: https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html

sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb
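To confirm that the packages registered correctly (the exact versions will match whatever you downloaded), you can list them with dpkg:

$ dpkg -l | grep libcudnn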

Your new environment settings should be set to:

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64

Follow the instructions below to test the installation. If you can build and run the MNIST application below successfully, your system is ready for CUDA and cuDNN. Copy the code samples to your home ($HOME) directory.

$ cp -r /usr/src/cudnn_samples_v7/ $HOME
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
$ make clean && make
$ ./mnistCUDNN

If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:

Test Passed!

5. Install python and dependent libraries through anaconda platform

Download and install Anaconda latest version for Linux from here.

Go to the download directory in a terminal and run the installer (no sudo is needed; Anaconda installs under your home directory):

$ bash Anaconda3-2021.05-Linux-x86_64.sh
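Once the installation completes, restart the terminal so that the changes to your shell profile take effect, then confirm that conda is available:

$ conda --version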

For better management of python libraries, use virtual environments; I will be demonstrating everything inside such a virtual environment. Choose a name for it. Since I work on deep learning projects extensively, I chose the name deeplearning.

$ conda create --name deeplearning
$ conda activate deeplearning

Once the virtual environment prompt is active, install the opencv, tensorflow and keras libraries:

(deeplearning)$ conda install -c conda-forge opencv
(deeplearning)$ conda install -c anaconda tensorflow-gpu
(deeplearning)$ conda install -c anaconda keras

When prompted to confirm installing the dependencies, enter (Y)es.

6. Verify GPU utilisation

Open python from the virtual environment by entering the following:

(deeplearning)$ python

Enter the following commands into the python console:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

listGPU = get_available_gpus()

It should list the available GPU. The result will look similar to the following:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 5111 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:05:00.0, compute capability: 6.1)
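As an alternative check (assuming a TensorFlow 2.x installation, where tf.config.list_physical_devices is available), the same verification can be run as a one-liner from the shell:

(deeplearning)$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"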

Moving Forward

I was very reluctant to write this article. The many variants of software and hardware configurations make universal instructions extremely hard. Different steps lead a machine into different states, and without tracing what you did it is unlikely anyone can solve your problems. Google your error messages extensively. If you cannot find anything that gives you a hint within an hour, your system state or configuration is probably very different, and you may want to restart from a better, known state first. That said, because of this complexity, I prefer not to address individual troubleshooting in this article. I know your pain, but I found it too hard to support it this way.

When you are not familiar with the setup process, keep things simple and do not ask for perfection. Refining the process iteratively will get you to the target much faster. If you find a solution that can help other people, please leave a note in the responses.

Another challenge is keeping this information up to date. For example, the TensorFlow API changes very fast and is often not backward compatible. If you find outdated information, please quote the old description and state what the new one should be; that will help me a lot in tracking the changes.

If you know a better way to do things, let me know as well. Keep things simple; I want to apply the 80/20 rule and cover what is important, but not every angle.
