A “deterministic” procedure to configure an NVIDIA GPU for data science on Ubuntu 18.10

Ivan Vasquez
Jul 7

This article provides a structured approach to determine the right combination of software for a trouble-free configuration of GPU hardware on Ubuntu x86-64.

Refer to the appendix at the bottom for the list of hardware components used to build this setup.


Phase 1: Planning

Begin by determining the right versions of operating system, kernel, compiler, NVIDIA driver, CUDA toolkit and cuDNN library that are known to work with your GPU.

GPU

Determine your GPU’s “compute capability” from NVIDIA’s CUDA GPUs list. This install guide targets a consumer-grade GeForce RTX 2080, which has a compute capability of 7.5.

Driver

Next, find the right driver for Linux 64-bit using NVIDIA’s driver search and download pages.

From that search, driver version 430.26 was current at the time of writing. The driver version determines which CUDA toolkit version matches your configuration.

CUDA Toolkit

Knowing that CUDA 10.1 is the target, a desirable goal is to install the most recent version of Ubuntu known to work with it. The CUDA toolkit download page helps in determining that.

With Ubuntu 18.10 identified as the right choice, the CUDA documentation helps determine which kernel and C compiler versions are suitable.

cuDNN

cuDNN is the library that ultimately integrates with your ML framework of choice. NVIDIA’s support matrix helps determine the correct version.

From the above, cuDNN version 7.6.1 matches all other components.

Install plan summary:

  • Ubuntu 18.10
  • Kernel 4.18.0, GCC 8.2.0
  • NVIDIA Driver 430.26
  • CUDA 10.1
  • cuDNN 7.6.1
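With the plan fixed, it is handy to have a quick way to confirm that the machine matches it after each installation phase. The following Python sketch is not part of the original procedure; it simply shells out to uname, gcc, nvidia-smi and nvcc (the last two only resolve once Phase 2 is complete) and prints the versions to compare against the checklist above.

import subprocess

# Hedged helper (not from the original article): print the versions the install
# plan cares about. Assumes the listed commands are on PATH; nvidia-smi and nvcc
# become available only after Phase 2.
def run(cmd):
    try:
        return subprocess.check_output(cmd, shell=True,
                                       universal_newlines=True).strip()
    except (subprocess.CalledProcessError, OSError):
        return "not available yet"

print("Kernel :", run("uname -r"))
print("GCC    :", run("gcc -dumpfullversion"))
print("Driver :", run("nvidia-smi --query-gpu=driver_version --format=csv,noheader"))
print("CUDA   :", run("nvcc --version | grep release"))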

Phase 2: Installation

Operating System

Download and perform a normal install of Ubuntu 18.10 for a 64-bit PC. It is very important to ensure that the kernel and GCC versions match the plan; otherwise, the NVIDIA driver install will fail.

Verify kernel version straight out of the install:

uname -a
Linux cali 4.18.0-25-generic #26-Ubuntu SMP Mon Jun 24 09:32:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Install gcc and make:

sudo apt install gcc
sudo apt install build-essential

Note: A default install of gcc on Ubuntu 18.10 yields version 8.3.0, which isn’t an exact match for the 8.2.0 recommended in the CUDA documentation; however, it did not cause any problems.

gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.

During configuration, routine Ubuntu package updates can be applied without negative side effects. However, do not upgrade to a newer version of the kernel, gcc or Ubuntu itself; doing so breaks the dependencies above and forces a rollback or a full reinstall.
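One way to reduce the risk of an accidental upgrade is to put the kernel and compiler packages on hold. This is a sketch, not part of the original procedure; the package names are assumptions for a stock Ubuntu 18.10 install and should be adjusted to what dpkg reports on your machine.

import subprocess

# Hedged sketch: pin the packages whose versions the driver build depends on,
# so a routine "apt upgrade" cannot silently replace them.
# The package names below are assumptions for a stock Ubuntu 18.10 install.
packages = ["linux-image-generic", "linux-headers-generic", "gcc-8"]
subprocess.run(["sudo", "apt-mark", "hold"] + packages, check=True)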

Install NVIDIA Driver

Run the installer executable and pass --no-opengl-files since we plan on using it solely for processing, not for display purposes:

sudo ./NVIDIA-Linux-x86_64-430.26.run --no-opengl-files
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 430.26........

It’s possible to ignore or accept the warnings the installer raises along the way.

Answer “Yes” to the question about running nvidia-xconfig, which will create the X configuration file /etc/X11/xorg.conf.

Verify the driver is correctly installed by checking GPU status and looking at the newly loaded NVIDIA kernel modules:

nvidia-smi

(Screenshot: output of nvidia-smi)

lsmod | grep nvidia
nvidia_drm             45056  0
nvidia_modeset       1110016  1 nvidia_drm
nvidia              18792448  1 nvidia_modeset
ipmi_msghandler       102400  2 ipmi_devintf,nvidia
drm_kms_helper        172032  2 nvidia_drm,i915
drm                   454656  9 drm_kms_helper,nvidia_drm,i915

Restart the computer for changes to take effect.

Notes:

  • Contrary to what previous driver versions and posts suggest, it is not necessary to disable the nouveau driver or stop X during this part of the configuration.
  • In order to uninstall the driver, simply run nvidia-uninstall and follow the prompts.

X Configuration

If left as configured by default, X (Linux’s graphical display system) will make use of your NVIDIA GPU.

In the nvidia-smi output above, there are X processes already attached to the GPU. The goal is to dedicate the NVIDIA GPU to computation rather than to display.

Make sure that your computer screen is physically connected to the motherboard’s video output. Then configure X to use that video controller for display. To do so, take note of the video controllers available to the system:

lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Device 3e98
01:00.0 VGA compatible controller: NVIDIA Corporation GV104 [GeForce GTX 1180] (rev a1)

From the above output, the motherboard’s Intel video controller has a PCI address of 00:02.0. Use it to modify /etc/X11/xorg.conf as follows:

Change the default “Device” entry from NVIDIA’s:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

…to Intel’s, including the PCI address in the format PCI:0:2:0 and setting the driver to nouveau, the default for X.

Section "Device"
    Identifier     "Device0"
    BusID          "PCI:0:2:0"
    Driver         "nouveau"
    VendorName     "Intel"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
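For reference, the BusID above is just the lspci address rewritten from hexadecimal bus:device.function notation into the decimal PCI:bus:device:function form that xorg.conf expects. The small helper below is only an illustration of that conversion (the function name is made up for this sketch):

# Hedged helper: convert an lspci address such as "00:02.0" (hex bus:device.function)
# into the decimal "PCI:bus:device:function" form used by the BusID entry above.
def lspci_to_busid(addr):
    bus, rest = addr.split(":")
    device, function = rest.split(".")
    return "PCI:%d:%d:%d" % (int(bus, 16), int(device, 16), int(function, 16))

print(lspci_to_busid("00:02.0"))   # -> PCI:0:2:0
print(lspci_to_busid("01:00.0"))   # -> PCI:1:0:0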

Note: Some articles suggest using the intel driver for the integrated controller; however, the default for this hardware on Ubuntu 18.10 was nouveau, and using intel led to display issues in some applications.

(Screenshot: Google Chrome flickering, an example of the display issues mentioned above)


After a restart, it should be possible to use your display connected to the integrated video controller, and nvidia-smi should list no processes:

(Screenshot: nvidia-smi listing no processes attached to the GPU, which is what we want)

Install CUDA toolkit

Run the installer executable:

sudo ./cuda_10.1.168_418.67_linux.run

Deselect the option to install the included driver (418.67), as it is older than the one installed in the previous step (430.26).

Look for and address any errors in /var/log/cuda-installer.log.

CUDA executables and libraries will be installed under /usr/local/cuda.
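A quick, hedged way to confirm the toolkit version on disk is shown below; it assumes the default /usr/local/cuda install location and relies on the version.txt file that CUDA 10.x releases place at the toolkit root.

from pathlib import Path

# Hedged check (not part of the original steps): confirm the toolkit version on disk.
version_file = Path("/usr/local/cuda/version.txt")
if version_file.exists():
    print(version_file.read_text().strip())   # e.g. "CUDA Version 10.1.168"
else:
    print("CUDA toolkit not found at /usr/local/cuda")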

Install cuDNN library

In order to download cuDNN, NVIDIA requires users to register as developers. Download the version that matches your CUDA toolkit, in this case CUDA 10.1.

Out of the available download formats, the “Linux” version comes as a tar-gzip file that’s easy to install.

Once downloaded, extract and copy header and library files according to the installation guide:

tar -xvzf cudnn-10.1-linux-x64-v7.6.1.34.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig

After the CUDA install, a configuration file, /etc/ld.so.conf.d/cuda-10-1.conf, tells the dynamic linker where to find CUDA libraries; running ldconfig refreshes the linker cache so the newly copied cuDNN files are picked up. Think of it as a cleaner alternative to setting LD_LIBRARY_PATH=/usr/local/cuda/lib64 in .profile.
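To confirm that the dynamic linker can now resolve cuDNN without any LD_LIBRARY_PATH tweaks, a small sketch like the following can be used; it assumes the library was installed as libcudnn.so.7 on a path the linker knows about.

import ctypes

# Hedged check: load cuDNN through the dynamic linker and query its version.
# cudnnGetVersion() returns e.g. 7601 for cuDNN 7.6.1.
libcudnn = ctypes.CDLL("libcudnn.so.7")
print("cuDNN version:", libcudnn.cudnnGetVersion())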

Install MXNet

Follow MXNet’s install instructions to determine the right pip package for a GPU-enabled build:

From the table above, install the version matching CUDA 10.1:

pip install mxnet-cu101
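Before running the full MNIST validation below, a quick sanity check (a sketch, not from the original article) confirms that MXNet can see the GPU and run a small computation on it:

import mxnet as mx

# Hedged sanity check: MXNet should report at least one GPU, and a small
# NDArray computation should run on it without errors.
print("GPUs visible to MXNet:", mx.context.num_gpus())
x = mx.nd.ones((2, 3), ctx=mx.gpu(0))
print((x * 2).asnumpy())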

Validate configuration

To test the setup, I used MXNet’s sample MNIST program:

import time
start_time = time.time()
import mxnet as mx
mnist = mx.test_utils.get_mnist()

# Fix the seed
mx.random.seed(42)

print(f'There are {mx.context.num_gpus()} GPUs')

# Set the compute context, GPU is available otherwise CPU
ctx = mx.gpu() if mx.context.num_gpus() else mx.cpu()

batch_size = 100
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)

data = mx.sym.var('data')
# Flatten the data from 4-D shape into 2-D (batch_size, num_channel*width*height)
data = mx.sym.flatten(data=data)

# The first fully-connected layer and the corresponding activation function
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type="relu")

# The second fully-connected layer and the corresponding activation function
fc2 = mx.sym.FullyConnected(data=act1, num_hidden = 64)
act2 = mx.sym.Activation(data=fc2, act_type="relu")

# MNIST has 10 classes
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
# Softmax with cross entropy loss
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')

import logging
logging.getLogger().setLevel(logging.DEBUG)
mlp_model = mx.mod.Module(symbol=mlp, context=ctx)
mlp_model.fit(train_iter,  # train data
              eval_data=val_iter,
              optimizer='sgd',
              optimizer_params={'learning_rate': 0.1},
              eval_metric='acc',
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=10)  # train for at most 10 dataset passes

print(f'Execution time: {(time.time() - start_time)} seconds')

While the application runs, nvidia-smi should list it attached to the GPU:

This completes the validation process. Congratulations, your GPU-based workstation is ready!


Appendix: Hardware used in this setup

The following hardware worked flawlessly and cost USD$2K (plus tax) at the time of writing:

  • Video card: ASUS GeForce RTX 2080 O8G ROG STRIX OC Edition
  • Motherboard: MSI MPG Z390M Gaming Edge AC LGA1151
  • Processor: Intel Core i9-9900K
  • Processor cooler: Cooler Master Hyper 212 Evo CPU Cooler
  • Memory: 2x Corsair LPX 32GB DRAM 3000MHz C15 Memory Kit
  • Storage: Sabrent 1TB Rocket NVMe PCIe M.2 2280 Internal SSD
  • Case: InWin 301 Black Tempered Glass Premium Micro-ATX Mini-ITX Tower
  • Power supply: Seasonic FOCUS 750 Gold SSR-750FM 750W
