Install CUDA 9.2 and cuDNN 7.1 for PyTorch (GPU) on Ubuntu 16.04

Zhanwen Chen
Published in Repro Repo · Jul 30, 2018 · 5 min read

NVIDIA recently released CUDA 9.2 and cuDNN 7.1, which are supported by PyTorch but not yet by TensorFlow. To take advantage of them, here are my working installation instructions, based on my previous post.

1. Install NVIDIA Driver Version 396 via apt-get

CUDA 9.2 requires NVIDIA driver version 396, which has not yet been released to the main Ubuntu repository. The best way to install it is through the graphics-drivers PPA:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-396 nvidia-modprobe

During installation you may be prompted about Secure Boot. If so, select Disable, then reboot your machine, but enter your BIOS/UEFI settings during startup to disable Secure Boot there. The key to enter BIOS varies by machine; it is often F2, F12, or Del, pressed repeatedly as soon as the system restarts.

To verify installation, restart your machine with reboot and type nvidia-smi. If successful, you should get something that looks like:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.45 Driver Version: 396.45 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 Off | 00000000:01:00.0 Off | N/A |
| N/A 61C P0 27W / N/A | 178MiB / 6078MiB | 2% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 995 G /usr/lib/xorg/Xorg 135MiB |
| 0 1828 G compiz 39MiB |
+-----------------------------------------------------------------------------+
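The driver check above can also be scripted. Below is a minimal sketch; `is_396_series` is a hypothetical helper of my own, not part of nvidia-smi:

```shell
# Hypothetical helper: succeeds only when a driver version string
# belongs to the 396 series that CUDA 9.2 needs.
is_396_series() {
  case "$1" in
    396.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# On a live machine you would feed it the real version, e.g.:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
ver="396.45"  # sample value matching the output above
if is_396_series "$ver"; then
  echo "driver $ver OK for CUDA 9.2"
else
  echo "driver $ver will not work with CUDA 9.2" >&2
fi
```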

2. Install CUDA 9.2 via Runfile

The CUDA runfile installer can be downloaded from NVIDIA’s website. Choose the runfile option.

What you download is a package containing the following three components:

  1. an NVIDIA driver installer (usually a stale version);
  2. the actual CUDA installer;
  3. the CUDA samples installer.

I suggest extracting the above three components and executing 2 and 3 separately (remember, we already installed the driver ourselves). To extract them, execute the runfile installer with the --extract option:

chmod +x cuda_9.2.148_396.37_linux-run
./cuda_9.2.148_396.37_linux-run --extract=$HOME

You should have unpacked three components: NVIDIA-Linux-x86_64-396.37.run (1. NVIDIA driver that we ignore), cuda-linux.9.2.148-24330188.run (2. CUDA 9.2 installer), and cuda-samples.9.2.148-24330188-linux.run (3. CUDA 9.2 Samples).
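Before continuing, you can sanity-check that all three components actually landed where you expect. This is just a sketch; `check_components` is my own helper name, and the filenames are the ones from this post's CUDA 9.2.148 download:

```shell
# Sanity check: confirm the three extracted installers are present in a
# directory (filenames from this post's CUDA 9.2.148 download).
check_components() {
  dir=$1
  missing=0
  for f in NVIDIA-Linux-x86_64-396.37.run \
           cuda-linux.9.2.148-24330188.run \
           cuda-samples.9.2.148-24330188-linux.run; do
    if [ -f "$dir/$f" ]; then
      echo "found   $f"
    else
      echo "missing $f" >&2
      missing=1
    fi
  done
  return $missing
}

# After running the runfile with --extract=$HOME:
#   check_components "$HOME"
```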

Execute the second one to install the CUDA Toolkit 9.2:

$ sudo ./cuda-linux.9.2.148-24330188.run

You now have to accept the license by scrolling down to the bottom (hold the “d” key on your keyboard) and entering “accept”. Then accept the defaults.

To verify our CUDA installation, install the code samples:

sudo ./cuda-samples.9.2.148-24330188-linux.run

After the installation finishes, you must configure the runtime library.

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig

It is also recommended for Ubuntu users to append /usr/local/cuda/bin to the PATH entry in the system file /etc/environment, so that nvcc is included in $PATH. This takes effect after a reboot. To do that, you just have to

$ sudo vim /etc/environment

and then add :/usr/local/cuda/bin (including the ":") at the end of the PATH="/blah:/blah/blah" string (inside the quotes).
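If you prefer not to edit the file by hand, the same change can be scripted with sed. The sketch below demonstrates it on a throwaway sample file rather than the real /etc/environment; on a real system you would copy /etc/environment, run the same sed command on the copy, inspect it, and move it back with sudo:

```shell
# Demonstrate the edit on a throwaway copy; on a real system you would
# copy /etc/environment itself and move it back with sudo once verified.
sample=$(mktemp)
printf 'PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"\n' > "$sample"

# Append :/usr/local/cuda/bin inside the quotes of the PATH="..." line.
sed -i 's|^PATH="\(.*\)"|PATH="\1:/usr/local/cuda/bin"|' "$sample"

grep '^PATH=' "$sample"
```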

After a reboot, let's test our installation by making and invoking our tests:

$ cd /usr/local/cuda-9.2/samples
$ sudo make

It’s a long process with many harmless warnings about deprecated architectures (sm_20 and other ancient GPUs). After it completes, run deviceQuery (and, optionally, p2pBandwidthLatencyTest):

$ cd /usr/local/cuda-9.2/samples/bin/x86_64/linux/release
$ ./deviceQuery

The result of ./deviceQuery should look something like

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6078 MBytes (6373572608 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1671 MHz (1.67 GHz)
Memory Clock rate: 4004 MHz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS

Cleanup: if ./deviceQuery works, remember to rm the four files (one downloaded and three extracted).
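For reference, the cleanup amounts to the following (filenames from this post's download; adjust them if your versions differ):

```shell
# Remove the downloaded runfile and the three extracted components
# (filenames from this post; adjust if your versions differ).
cd "$HOME"
rm -f cuda_9.2.148_396.37_linux-run \
      NVIDIA-Linux-x86_64-396.37.run \
      cuda-linux.9.2.148-24330188.run \
      cuda-samples.9.2.148-24330188-linux.run
```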

3. Install cuDNN 7.1

The recommended way to install cuDNN 7.1 is to download all three .deb files. The .tgz installation approach doesn’t allow verification by running the code samples (there is no way to install the code-samples .deb on top of a .tgz installation).

The following steps are pretty much the same as the installation guide using .deb files (strange that the cuDNN guide is better than the CUDA one).

  1. Go to the cuDNN download page (need registration) and select the latest cuDNN 7.1 version made for CUDA 9.2.
  2. Download all 3 .deb files: the runtime library, the developer library, and the code samples library for Ubuntu 16.04.
  3. In your download folder, install them in the same order:

sudo dpkg -i libcudnn7_7.1.4.18-1+cuda9.2_amd64.deb (the runtime library),

sudo dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.2_amd64.deb (the developer library), and

sudo dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.2_amd64.deb (the code samples).

Now we can verify the cuDNN installation (below is just the official guide, which surprisingly works out of the box):

  1. Copy the code samples somewhere you have write access: cp -r /usr/src/cudnn_samples_v7/ ~.
  2. Go to the MNIST example code: cd ~/cudnn_samples_v7/mnistCUDNN.
  3. Compile the MNIST example: make clean && make.
  4. Run the MNIST example: ./mnistCUDNN. If your installation is successful, you should see Test passed! at the end of the output.

Do NOT Install cuda-command-line-tools

Contrary to the official TensorFlow installation docs, you don’t need to install cuda-command-line-tools, because it’s already included in this version of CUDA. If you try to apt-get it, the package simply won’t be found.

Configure the CUDA and cuDNN library paths

What you do need to do, however, is export the LD_LIBRARY_PATH environment variable in your .bashrc file:

# put the following line at the end of your .bashrc file
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64"

Then reload it with source ~/.bashrc.
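The ${LD_LIBRARY_PATH:+...} expansion in that line prepends the old value plus a colon only when the variable is already set, which avoids a stray leading colon in a fresh shell. A quick demonstration, using a hypothetical helper of my own that applies the same expansion to its argument instead of the live variable:

```shell
# append_cupti applies the same ${var:+${var}:} expansion as the
# .bashrc line, but to its argument, so both cases are easy to see.
append_cupti() {
  echo "${1:+${1}:}/usr/local/cuda/extras/CUPTI/lib64"
}

append_cupti ""
# -> /usr/local/cuda/extras/CUPTI/lib64  (empty/unset: no leading colon)
append_cupti "/usr/local/cuda/lib64"
# -> /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
```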

Zhanwen Chen (Repro Repo) is a PhD student interested in learning from data.