Install CUDA 9.2 and cuDNN 7.1 for PyTorch (GPU) on Ubuntu 16.04

Zhanwen Chen
Published in Repro Repo · Jul 30, 2018 · 5 min read

NVIDIA recently released CUDA 9.2 and cuDNN 7.1, which are supported by PyTorch but not yet by TensorFlow. To take advantage of them, here are my working installation instructions, based on my previous post.

1. Install NVIDIA Driver Version 396 via apt-get

CUDA 9.2 requires NVIDIA driver version 396, which has not yet been released to the main Ubuntu repository. The best way to install it is through the graphics-drivers PPA:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-396 nvidia-modprobe

During installation you may be prompted about Secure Boot. If so, select Disable, then reboot your machine, but enter your BIOS/UEFI settings during startup to disable Secure Boot there. The key to enter BIOS varies by machine; it is often F2, F12, or Del, pressed repeatedly as soon as the system restarts.

To verify installation, restart your machine with reboot and type nvidia-smi. If successful, you should get something that looks like:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.45 Driver Version: 396.45 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 Off | 00000000:01:00.0 Off | N/A |
| N/A 61C P0 27W / N/A | 178MiB / 6078MiB | 2% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 995 G /usr/lib/xorg/Xorg 135MiB |
| 0 1828 G compiz 39MiB |
+-----------------------------------------------------------------------------+
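The driver check above can also be scripted. Below is a minimal sketch; `is_396_series` is a hypothetical helper of my own, not part of nvidia-smi:

```shell
# Hypothetical helper: succeeds only when a driver version string
# belongs to the 396 series that CUDA 9.2 needs.
is_396_series() {
  case "$1" in
    396.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# On a live machine you would feed it the real version, e.g.:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
ver="396.45"  # sample value matching the output above
if is_396_series "$ver"; then
  echo "driver $ver OK for CUDA 9.2"
else
  echo "driver $ver will not work with CUDA 9.2" >&2
fi
```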

2. Install CUDA 9.2 via Runfile

The CUDA runfile installer can be downloaded from NVIDIA’s website. Choose the runfile option.

What you download is a package containing the following three components:

  1. an NVIDIA driver installer (usually a stale version);
  2. the actual CUDA installer;
  3. the CUDA samples installer.

I suggest extracting the above three components and executing 2 and 3 separately (remember, we already installed the driver ourselves). To extract them, execute the runfile installer with the --extract option:

chmod +x cuda_9.2.148_396.37_linux-run
./cuda_9.2.148_396.37_linux-run --extract=$HOME

You should have unpacked three components: NVIDIA-Linux-x86_64-396.37.run (1. NVIDIA driver that we ignore), cuda-linux.9.2.148-24330188.run (2. CUDA 9.2 installer), and cuda-samples.9.2.148-24330188-linux.run (3. CUDA 9.2 Samples).
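Before continuing, you can sanity-check that all three components actually landed where you expect. This is just a sketch; `check_components` is my own helper name, and the filenames are the ones from this post's CUDA 9.2.148 download:

```shell
# Sanity check: confirm the three extracted installers are present in a
# directory (filenames from this post's CUDA 9.2.148 download).
check_components() {
  dir=$1
  missing=0
  for f in NVIDIA-Linux-x86_64-396.37.run \
           cuda-linux.9.2.148-24330188.run \
           cuda-samples.9.2.148-24330188-linux.run; do
    if [ -f "$dir/$f" ]; then
      echo "found   $f"
    else
      echo "missing $f" >&2
      missing=1
    fi
  done
  return $missing
}

# After running the runfile with --extract=$HOME:
#   check_components "$HOME"
```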

Execute the second one to install the CUDA Toolkit 9.2:

$ sudo ./cuda-linux.9.2.148-24330188.run

You now have to accept the license by scrolling down to the bottom (hold the “d” key on your keyboard) and entering “accept”. Then accept the defaults.

To verify our CUDA installation, install the code samples:

sudo ./cuda-samples.9.2.148-24330188-linux.run

After the installation finishes, you must configure the runtime library.

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

It is also recommended for Ubuntu users to append /usr/local/cuda/bin to the PATH entry in the system file /etc/environment, so that nvcc is included in $PATH. This takes effect after a reboot. To do that, you just have to

$ sudo vim /etc/environment

and then add :/usr/local/cuda/bin (including the ":") at the end of the PATH="/blah:/blah/blah" string (inside the quotes).
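If you prefer not to edit the file by hand, the same change can be scripted with sed. The sketch below demonstrates it on a throwaway sample file rather than the real /etc/environment; on a real system you would copy /etc/environment, run the same sed command on the copy, inspect it, and move it back with sudo:

```shell
# Demonstrate the edit on a throwaway copy; on a real system you would
# copy /etc/environment itself and move it back with sudo once verified.
sample=$(mktemp)
printf 'PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"\n' > "$sample"

# Append :/usr/local/cuda/bin inside the quotes of the PATH="..." line.
sed -i 's|^PATH="\(.*\)"|PATH="\1:/usr/local/cuda/bin"|' "$sample"

grep '^PATH=' "$sample"
```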

After a reboot, let's test our installation by making and invoking our tests:

$ cd /usr/local/cuda-9.2/samples
$ sudo make

It’s a long process with many harmless warnings about deprecated architectures (sm_20 and other ancient GPUs). After it completes, run deviceQuery (and, optionally, p2pBandwidthLatencyTest):

$ cd /usr/local/cuda-9.2/samples/bin/x86_64/linux/release
$ ./deviceQuery

The result of ./deviceQuery should look something like

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6078 MBytes (6373572608 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1671 MHz (1.67 GHz)
Memory Clock rate: 4004 MHz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS

Cleanup: if ./deviceQuery works, remember to rm the four files (one downloaded and three extracted).
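For reference, the cleanup amounts to the following (filenames from this post's download; adjust them if your versions differ):

```shell
# Remove the downloaded runfile and the three extracted components
# (filenames from this post; adjust if your versions differ).
cd "$HOME"
rm -f cuda_9.2.148_396.37_linux-run \
      NVIDIA-Linux-x86_64-396.37.run \
      cuda-linux.9.2.148-24330188.run \
      cuda-samples.9.2.148-24330188-linux.run
```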

3. Install cuDNN 7.1

The recommended way to install cuDNN 7.1 is to download all three .deb files. The .tgz installation approach doesn’t allow verification by running the code samples (there is no way to install the code-samples .deb on top of a .tgz installation).

The following steps are pretty much the same as the installation guide using .deb files (strange that the cuDNN guide is better than the CUDA one).

  1. Go to the cuDNN download page (need registration) and select the latest cuDNN 7.1 version made for CUDA 9.2.
  2. Download all 3 .deb files: the runtime library, the developer library, and the code samples library for Ubuntu 16.04.
  3. In your download folder, install them in the same order:

sudo dpkg -i libcudnn7_7.1.4.18-1+cuda9.2_amd64.deb (the runtime library),

sudo dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.2_amd64.deb (the developer library), and

sudo dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.2_amd64.deb (the code samples).

Now we can verify the cuDNN installation (below is just the official guide, which surprisingly works out of the box):

  1. Copy the code samples somewhere you have write access: cp -r /usr/src/cudnn_samples_v7/ ~.
  2. Go to the MNIST example code: cd ~/cudnn_samples_v7/mnistCUDNN.
  3. Compile the MNIST example: make clean && make.
  4. Run the MNIST example: ./mnistCUDNN. If your installation is successful, you should see Test passed! at the end of the output.

Do NOT Install cuda-command-line-tools

Contrary to the official TensorFlow installation docs, you don’t need to install cuda-command-line-tools, because it’s already included in this version of CUDA. If you try to apt-get it, the package simply won’t be found.

Configure the CUDA and cuDNN library paths

What you do need to do, however, is export the LD_LIBRARY_PATH environment variable in your .bashrc file:

# put the following line at the end of your .bashrc file
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64"

Then reload it with source ~/.bashrc.
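The ${LD_LIBRARY_PATH:+...} expansion in that line prepends the old value plus a colon only when the variable is already set, which avoids a stray leading colon in a fresh shell. A quick demonstration, using a hypothetical helper of my own that applies the same expansion to its argument instead of the live variable:

```shell
# append_cupti applies the same ${var:+${var}:} expansion as the
# .bashrc line, but to its argument, so both cases are easy to see.
append_cupti() {
  echo "${1:+${1}:}/usr/local/cuda/extras/CUPTI/lib64"
}

append_cupti ""
# -> /usr/local/cuda/extras/CUPTI/lib64  (empty/unset: no leading colon)
append_cupti "/usr/local/cuda/lib64"
# -> /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
```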

Zhanwen Chen (Repro Repo) is a PhD student interested in learning from data.