Minimal cuDNN C++ Example
cuDNN is a library developed by Nvidia that provides optimised GPU implementations of neural network primitives (convolutions, activations, etc.). cuDNN is used under the hood by most popular high-level neural network libraries, including PyTorch and TensorFlow. While Nvidia ships some sample code with the library installation (including examples of RNNs, convolutions, and MNIST), these samples are relatively large (1000+ lines of code) and span multiple source files. This article contains a small (< 70 lines of code) cuDNN example that:
- creates a tensor
- applies the sigmoid function on it
- and prints the output
The full code for this is:
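The listing below is a sketch of that program, reconstructed from the walkthrough that follows using only standard CUDA runtime and cuDNN calls (`cudnnCreate`, `cudnnSetTensor4dDescriptor`, `cudnnSetActivationDescriptor`, `cudnnActivationForward`); this sketch includes `cuda_runtime.h` (which declares the `cudaGetDevice*` runtime calls), and the exact line numbers referenced in the walkthrough may differ slightly from it.

```cpp
// hw.cpp -- minimal cuDNN example: create a tensor, apply sigmoid, print it.
// Error checking is omitted for brevity.
#include <iostream>
#include <cuda_runtime.h>
#include <cudnn.h>

int main() {
    // Report how many GPUs are connected and select GPU 0.
    int numGPUs;
    cudaGetDeviceCount(&numGPUs);
    std::cout << "Found " << numGPUs << " GPUs." << std::endl;
    cudaSetDevice(0);

    // Print the compute capability of the selected GPU.
    int device;
    cudaGetDevice(&device);
    cudaDeviceProp devProp;
    cudaGetDeviceProperties(&devProp, device);
    std::cout << "Compute capability:" << devProp.major << "."
              << devProp.minor << std::endl;

    // Create a cuDNN handle, needed by all subsequent cuDNN calls.
    cudnnHandle_t handle;
    cudnnCreate(&handle);
    std::cout << "Created cuDNN handle" << std::endl;

    // Describe the tensor: NCHW layout, float, 1 image, 1 channel, 1 row, 10 columns.
    cudnnTensorDescriptor_t xDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 1, 1, 10);

    // Allocate the tensor in GPU-accessible (managed) memory and fill it.
    float *x;
    cudaMallocManaged(&x, 10 * sizeof(float));
    for (int i = 0; i < 10; ++i) x[i] = static_cast<float>(i);
    std::cout << "Original array:";
    for (int i = 0; i < 10; ++i) std::cout << " " << x[i];
    std::cout << std::endl;

    // Describe the activation function: sigmoid.
    cudnnActivationDescriptor_t actDesc;
    cudnnCreateActivationDescriptor(&actDesc);
    cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_SIGMOID,
                                 CUDNN_PROPAGATE_NAN, 0.0);

    // Apply the sigmoid in place: x = alpha * sigmoid(x) + beta * x.
    float alpha = 1.0f, beta = 0.0f;
    cudnnActivationForward(handle, actDesc, &alpha, xDesc, x, &beta, xDesc, x);

    // Destroy the handle, wait for the GPU, and print the result.
    cudnnDestroy(handle);
    std::cout << "Destroyed cuDNN handle." << std::endl;
    cudaDeviceSynchronize();
    std::cout << "New array:";
    for (int i = 0; i < 10; ++i) std::cout << " " << x[i];
    std::cout << std::endl;

    // Free the GPU memory.
    cudaFree(x);
    return 0;
}
```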
What the code is doing:
- Lines 1–3 include the headers we'll need: `iostream` for general IO, `cuda.h` for interacting with the GPU, and `cudnn.h`.
- Lines 11–13 print how many GPUs are found connected to your computer. In this example, I tell it to use GPU 0 (line 14).
- Lines 15–19 use the CUDA library functions `cudaGetDevice` and `cudaGetDeviceProperties` to print the compute capability of the selected GPU.
- The first thing to do when using cuDNN is to create a handle (lines 21–23). You will need to use this handle in all subsequent cuDNN functions.
- In cuDNN, whenever we create a tensor, we need to provide cuDNN with information about it. This is done by creating a descriptor for our tensor (lines 26–32). The cuDNN documentation explains what the `NCHW` data format is.
- Lines 35–39 create the tensor and print it to screen. The array created is a simple one: `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]`. When using cuDNN, all data/tensors must be allocated on the GPU.
- Similar to how we created a descriptor that explains what our tensor looks like, we need to create a descriptor for the activation function too (lines 42–48).
- Line 50 has the function call that does the sigmoid activation (finally): `cudnnActivationForward`. You can see what each argument is supposed to be in the cuDNN API reference.
- Line 61 destroys the handle that we created in lines 21–23.
- Lines 62–65 print the processed array.
- Line 66 frees the memory that we allocated on the GPU.
To compile the code, copy it, save it in a file `hw.cpp`, and use these bash commands:
g++ -I/usr/local/cuda/include -I/usr/local/cuda/targets/ppc64le-linux/include -o hw.o -c hw.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_70,code=sm_70 -o hw hw.o -I/usr/local/cuda/include -I/usr/local/cuda/targets/ppc64le-linux/include -L/usr/local/cuda/lib64 -L/usr/local/cuda/targets/ppc64le-linux/lib -lcublasLt -lcudart -lcublas -lcudnn -lstdc++ -lm
Some small things:
- I ran this code on a machine running Ubuntu 16.04 LTS with Nvidia driver version 465.19.01, CUDA version 11.3, and cuDNN 8.2.1.32-1, on a Tesla V100 GPU. The V100 has a compute capability of 7.0 (you can check the compute capability of your GPU model on the Nvidia website). Based on the compute capability of your GPU, you might have to change `arch=compute_70,code=sm_70` in the second bash command. For example, if your compute capability is 5.0, you'll have to change it to `arch=compute_50,code=sm_50` instead.
- The first bash command compiles the C++ code but doesn't link the libraries (the `-c` flag prevents the linking from happening).
- The second bash command uses `nvcc` to compile, link, and create the final executable.
- Run the executable by running `./hw`.
The output that I got was:
Found 4 GPUs.
Compute capability:7.0
Created cuDNN handle
Original array: 0 1 2 3 4 5 6 7 8 9
Destroyed cuDNN handle.
New array: 0.5 0.731059 0.880797 0.952574 0.982014 0.993307 0.997527 0.999089 0.999665 0.999877
To check whether the answer we got is correct, let's quickly verify in Python:
>>> import numpy as np
>>> arr = np.asarray([0,1,2,3,4,5,6,7,8,9])
>>> np.reciprocal(np.exp(-1 * arr) + 1)
array([0.5 , 0.73105858, 0.88079708, 0.95257413, 0.98201379,
0.99330715, 0.99752738, 0.99908895, 0.99966465, 0.99987661])
>>>
So, that’s it — a simple cuDNN program!