Build OpenCV with DNN and CUDA for GPU-Accelerated Face Detection

Feb 29, 2024

I’ve been experimenting with various face detection models for my current project and was intrigued by the supposed combination of speed and accuracy of the SSD face detector in OpenCV’s Deep Neural Network (DNN) module. But to run this model on the GPU, I needed OpenCV with CUDA support, which unfortunately meant building OpenCV from source.

There are many tutorials on how to build OpenCV from source, so I’m not trying to reinvent the wheel here. But what works in the tutorials may not work on your machine, which I learned through trial and error. I found the existing tutorials insufficient to deal with the myriad errors I encountered, so I’ve written this account of how I built OpenCV on my machine in case others are dealing with the same baffling errors. I used the following configuration to build OpenCV:

NVIDIA GeForce GTX 1080 Ti: Compute Capability 6.1
Ubuntu 23.10
CUDA 12.0, installed through the Ubuntu package manager
cuDNN 8.9.2.26, installed through the Ubuntu package manager
Python 3.11 through Anaconda
OpenCV 4.9.0

Installing CUDA

First, make sure CUDA is installed. I installed it through Ubuntu’s package manager, which is very convenient.

sudo apt install nvidia-cuda-toolkit

For OpenCV’s DNN module to use a GPU, we also need to install cuDNN.

sudo apt install nvidia-cudnn

Be aware that many guides assume that CUDA is installed in

/usr/local/cuda/

But the package manager on Ubuntu 23.10 places it in

/usr/lib/nvidia-cuda-toolkit/

This will be important later if CMake can’t find the CUDA location.
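If CMake does report that it cannot find CUDA, checking which of the two locations exists on your machine narrows things down. Below is a minimal Python sketch that only knows about the two directories above. Handing the result to CMake through CUDA_TOOLKIT_ROOT_DIR (the variable read by CMake’s FindCUDA module) is an assumption, not a step from the build described here.

```python
import os
import shutil

# The default many guides assume, then the Ubuntu 23.10 package location.
CANDIDATES = ["/usr/local/cuda", "/usr/lib/nvidia-cuda-toolkit"]


def find_cuda_root(candidates=CANDIDATES):
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    root = find_cuda_root()
    print("CUDA toolkit directory:", root)
    # nvcc on PATH is another sign that the toolkit is installed.
    print("nvcc on PATH:", shutil.which("nvcc"))
    if root is not None:
        # Assumption: pass the directory to CMake via FindCUDA's variable.
        print(f"-D CUDA_TOOLKIT_ROOT_DIR={root}")
```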

Speaking of CMake, it’s required to build OpenCV, so if you don’t already have it installed, run

sudo apt install cmake cmake-data build-essential

Building OpenCV

We then need to download both the OpenCV and OpenCV-contrib source files from GitHub (the opencv/opencv and opencv/opencv_contrib repositories), using the same release for both. The OpenCV-contrib files should be placed within the OpenCV directory.

Now comes the tricky part: building OpenCV. OpenCV is an enormous library with many dependencies, and the build can go wrong in many ways, often without any sign until runtime. Several times, OpenCV installed successfully with CUDA and could find the GPU, yet it could not run the face detection model because some package or library was missing.
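For reference, the two downloads can be scripted as git clones. This Python sketch assumes the 4.9.0 release tag for both repositories (the contrib version should match the main one) and reproduces the layout used here, with opencv_contrib inside the opencv directory. It only prints the commands.

```python
import shlex

VERSION = "4.9.0"  # keep opencv and opencv_contrib on the same release tag


def clone_commands(version=VERSION, dest="opencv"):
    """git commands that place opencv_contrib inside the opencv checkout."""
    return [
        ["git", "clone", "--branch", version, "--depth", "1",
         "https://github.com/opencv/opencv.git", dest],
        ["git", "clone", "--branch", version, "--depth", "1",
         "https://github.com/opencv/opencv_contrib.git", f"{dest}/opencv_contrib"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell.
    for cmd in clone_commands():
        print(shlex.join(cmd))
```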

cmake \
-D WITH_CUDA=ON \
-D WITH_CUDNN=ON \
-D OPENCV_DNN_CUDA=ON \
-D CUDA_ARCH_BIN=6.1 \
-D CMAKE_C_COMPILER=gcc-12 \
-D CMAKE_CXX_COMPILER=g++-12 \
-D OPENCV_EXTRA_MODULES_PATH=/home/amos/sources/opencv/opencv_contrib/modules/ \
-D PYTHON3_EXECUTABLE=/home/amos/anaconda3/envs/cv/bin/python \
-D PYTHON_LIBRARIES=/home/amos/anaconda3/envs/cv/lib/python3.11/site-packages ..

Above is the command I ran to successfully build OpenCV with CUDA support for the DNN module with Python bindings (make sure NumPy is installed in your Python environment). (This configuration only addresses the DNN module, so consult a more general guide for information on how to build the other modules.) I’ll work through how I came to this command.
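To keep a long command like this reproducible, it can help to build it from a table of flags. The sketch below is one hypothetical way to do that in Python: the values mirror the command above with every switch spelled ON, and bad_bool_flags is an illustrative check for switches whose value is not literally ON or OFF (such as a zero typed in place of the letter O). CMake accepts -DNAME=VALUE as well as the -D NAME=VALUE form.

```python
import shlex

# Values mirror the command above; every path must be adapted to your machine.
FLAGS = {
    "WITH_CUDA": "ON",
    "WITH_CUDNN": "ON",
    "OPENCV_DNN_CUDA": "ON",
    "CUDA_ARCH_BIN": "6.1",
    "CMAKE_C_COMPILER": "gcc-12",
    "CMAKE_CXX_COMPILER": "g++-12",
    "OPENCV_EXTRA_MODULES_PATH": "/home/amos/sources/opencv/opencv_contrib/modules/",
    "PYTHON3_EXECUTABLE": "/home/amos/anaconda3/envs/cv/bin/python",
    "PYTHON_LIBRARIES": "/home/amos/anaconda3/envs/cv/lib/python3.11/site-packages",
}
# Switches in this command that should be ON or OFF.
BOOL_FLAGS = {"WITH_CUDA", "WITH_CUDNN", "OPENCV_DNN_CUDA"}


def bad_bool_flags(flags):
    """Boolean switches whose value is not literally ON or OFF."""
    return [name for name, value in flags.items()
            if name in BOOL_FLAGS and value not in ("ON", "OFF")]


def cmake_args(flags, source_dir=".."):
    """Argument list for configuring OpenCV from a build directory."""
    bad = bad_bool_flags(flags)
    if bad:
        raise ValueError(f"expected ON or OFF for: {', '.join(bad)}")
    return ["cmake"] + [f"-D{name}={value}" for name, value in flags.items()] + [source_dir]


if __name__ == "__main__":
    # Print a copy-pasteable command; subprocess.run(cmake_args(FLAGS), check=True)
    # would run it directly from the build directory instead.
    print(shlex.join(cmake_args(FLAGS)))
```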

The first issue that was a thorn in my side was the CUDA_ARCH_BIN flag. OpenCV easily detected both CUDA and cuDNN on my system, yet configuration still stopped with an error about CUDA_ARCH_BIN.

I’m not sure why OpenCV can’t determine CUDA_ARCH_BIN on its own, but to get the compute capability of your GPU, run

nvidia-smi --query-gpu=compute_cap --format=csv

It prints a header line followed by a number (6.1 for my GTX 1080 Ti), which is the value for the CUDA_ARCH_BIN flag.
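The query can also be scripted. This Python sketch runs the same nvidia-smi command, drops the compute_cap header line, and formats the result for CUDA_ARCH_BIN. Joining several distinct capabilities with semicolons for multi-GPU machines is an assumption about what CUDA_ARCH_BIN accepts; with a single GPU it simply yields that GPU’s value.

```python
import shutil
import subprocess


def arch_bin(csv_text):
    """Turn `nvidia-smi --query-gpu=compute_cap --format=csv` output into a CUDA_ARCH_BIN value."""
    caps = []
    for line in csv_text.splitlines():
        line = line.strip()
        # Skip the "compute_cap" header and blank lines; keep each capability once.
        if line and line != "compute_cap" and line not in caps:
            caps.append(line)
    return ";".join(caps)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv"],
            capture_output=True, text=True, check=True,
        ).stdout
        # Quoted so a semicolon-separated list survives the shell.
        print(f'-D CUDA_ARCH_BIN="{arch_bin(out)}"')
```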

After adding the flag, check the CMake output to confirm that OpenCV picked it up: the NVIDIA CUDA section of the summary should list the architecture you specified.
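If you save the CMake output (for example with cmake ... | tee cmake.log), this check can be automated. The Python sketch below assumes the summary contains a line such as "NVIDIA GPU arch: 61", which is how OpenCV’s CMake summary normally prints architectures (without the dot). The script and file names in the usage comment are hypothetical.

```python
import re
import sys


def summary_value(summary, key):
    """Text after `key:` in an OpenCV CMake summary, or None if the key is absent."""
    # CMake prints lines like "--     NVIDIA GPU arch:    61"; leading dashes and
    # spaces are skipped, and [ \t] keeps the match on a single line.
    pattern = rf"^[- \t]*{re.escape(key)}:[ \t]*(.*?)[ \t]*$"
    match = re.search(pattern, summary, re.MULTILINE)
    return match.group(1) if match else None


def arch_listed(summary, cap):
    """True if the summary's GPU arch entry includes `cap`, such as "6.1"."""
    value = summary_value(summary, "NVIDIA GPU arch")
    # The summary writes architectures without the dot, e.g. 61 for 6.1.
    return value is not None and cap.replace(".", "") in value.split()


if __name__ == "__main__":
    # Usage (hypothetical names): python check_arch.py cmake.log 6.1
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f:
            print(arch_listed(f.read(), sys.argv[2]))
```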

After solving that problem, I immediately encountered another.

This one had me stumped. Earlier I said to put the opencv-contrib source files in the opencv directory so that CMake could easily find them. Not only did OpenCV not find them on its own, it didn’t find them even when I specified the full path of the directory. Although it looked like a problem with finding the modules, it was actually a problem with the compiler.

The latest gcc installed on my system was gcc-13, but this build only worked with gcc-12 or older, most likely because the CUDA 12.0 compiler (nvcc) does not support gcc-13 as a host compiler. This problem plagued me because OpenCV doesn’t throw an explicit error that identifies the compiler as the culprit. I only figured it out when trying to install dlib, which does throw an error about the compiler. Install the older compilers with

sudo apt install gcc-12 g++-12

Even after installing them, CMake will still pick gcc-13 by default, so we need to tell it to use version 12:

-D CMAKE_C_COMPILER=gcc-12
-D CMAKE_CXX_COMPILER=g++-12
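The same choice can be sketched in Python: look for versioned gcc-N/g++-N pairs on the PATH and take the newest one no newer than 12. The cap of 12 matches this build; the range of versions searched (9 to 14) is an arbitrary assumption.

```python
import shutil


def pick_host_gcc(versions, max_version=12):
    """Highest installed gcc major version that does not exceed max_version."""
    usable = [v for v in versions if v <= max_version]
    return max(usable) if usable else None


if __name__ == "__main__":
    # Versioned compilers such as gcc-12 / g++-12 that are both on PATH.
    installed = [v for v in range(9, 15)
                 if shutil.which(f"gcc-{v}") and shutil.which(f"g++-{v}")]
    v = pick_host_gcc(installed)
    if v is None:
        print("No gcc <= 12 found; try: sudo apt install gcc-12 g++-12")
    else:
        print(f"-D CMAKE_C_COMPILER=gcc-{v} -D CMAKE_CXX_COMPILER=g++-{v}")
```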

The next thing to worry about is the Python bindings. In addition to linking to the Python executable, OpenCV requires the NumPy library. Because I installed OpenCV into an Anaconda environment, I needed to point it to that environment’s NumPy package. If we don’t specify a path, however, OpenCV will generally find the system Python.

In my case, it found my base Anaconda Python. Since I wanted to install into a particular conda environment, I had to include the path to it.

-D PYTHON3_EXECUTABLE=/home/amos/anaconda3/envs/cv/bin/python \
-D PYTHON_LIBRARIES=/home/amos/anaconda3/envs/cv/lib/python3.11/site-packages
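To find the right values, it helps to ask the environment itself. Run this Python sketch with the conda environment’s interpreter; it prints that interpreter’s paths using sys.executable, sysconfig, and numpy.get_include(). The output uses OpenCV’s PYTHON3_INCLUDE_DIR, PYTHON3_PACKAGES_PATH, and PYTHON3_NUMPY_INCLUDE_DIRS variables, which are not part of the command above. As far as I know, PYTHON3_PACKAGES_PATH decides where cv2 gets installed, so treat it as an untested alternative to the symlink step later in this post.

```python
import sys
import sysconfig


def python_cmake_flags():
    """OpenCV CMake flags describing the Python interpreter running this script."""
    paths = sysconfig.get_paths()
    flags = {
        "PYTHON3_EXECUTABLE": sys.executable,
        "PYTHON3_INCLUDE_DIR": paths["include"],
        "PYTHON3_PACKAGES_PATH": paths["purelib"],
    }
    try:
        import numpy
    except ImportError:
        # OpenCV's Python bindings need NumPy in the target environment.
        print("NumPy is missing: install it in this environment first")
    else:
        flags["PYTHON3_NUMPY_INCLUDE_DIRS"] = numpy.get_include()
    return flags


if __name__ == "__main__":
    # Run with the environment's interpreter, e.g. after `conda activate cv`.
    for name, value in python_cmake_flags().items():
        print(f"-D {name}={value} \\")
```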

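Your paths will look different from mine. To confirm it worked, check the Python 3 section of the CMake summary: the interpreter, libraries, and NumPy entries should all point into your environment.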

Also, make sure that the DNN module will be built: look for dnn in the “To be built” list of modules in the CMake summary.
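With the CMake output saved to a file, the module check can be scripted too. This Python sketch assumes the module list sits on a single line containing “To be built:”, as in OpenCV’s summary; the usage at the bottom takes the file name from the command line.

```python
import sys


def modules_to_build(summary):
    """Module names from the 'To be built:' line of an OpenCV CMake summary."""
    for line in summary.splitlines():
        if "To be built:" in line:
            return set(line.split("To be built:", 1)[1].split())
    return set()


if __name__ == "__main__":
    # Pass a file holding saved CMake output, e.g. from `cmake ... | tee cmake.log`.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            modules = modules_to_build(f.read())
        print("dnn will be built" if "dnn" in modules else "dnn is NOT in the build list")
```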

I ran into some tricky issues trying to enable the DNN module. CMake would list the DNN module, but even after successfully building, installing, and importing cv2 into Python, I would get an error when running detection. The error was very misleading: it said OpenCV needed to be built with WebNN. After some digging, I realized it didn’t make much sense, since WebNN, as the name suggests, is meant for the web, which is not what I was doing. The issue turned out to be the flag I used to enable DNN on the GPU. I initially used

-D WITH_DNN=ON

It turned out I needed this flag instead, which enables the CUDA backend of the DNN module:

-D OPENCV_DNN_CUDA=ON
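Once cv2 imports, a quick check can catch this kind of problem before running a model. This Python sketch parses cv2.getBuildInformation() for the NVIDIA CUDA and cuDNN entries (assuming they begin with YES when enabled) and asks cv2.dnn.getAvailableTargets which targets the CUDA backend offers; an empty list suggests DNN cannot use the GPU.

```python
import re


def cuda_problems(build_info):
    """List what looks wrong for the DNN CUDA backend in cv2.getBuildInformation() text."""
    problems = []
    for key in ("NVIDIA CUDA", "cuDNN"):
        match = re.search(rf"^[- \t]*{re.escape(key)}:[ \t]*(\S+)", build_info, re.MULTILINE)
        if match is None or match.group(1) != "YES":
            problems.append(f"{key} not enabled")
    return problems


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable from this interpreter")
    else:
        print(cuda_problems(cv2.getBuildInformation()) or "CUDA and cuDNN enabled")
        # Target ids the DNN module can use with the CUDA backend.
        print(cv2.dnn.getAvailableTargets(cv2.dnn.DNN_BACKEND_CUDA))
```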

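With all the CMake configuration in place, it’s now time to build OpenCV. From a build directory inside the OpenCV source tree (the trailing .. points CMake at the source), run: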

cmake \
-D WITH_CUDA=ON \
-D WITH_CUDNN=ON \
-D OPENCV_DNN_CUDA=ON \
-D CUDA_ARCH_BIN=6.1 \
-D CMAKE_C_COMPILER=gcc-12 \
-D CMAKE_CXX_COMPILER=g++-12 \
-D OPENCV_EXTRA_MODULES_PATH=/home/amos/sources/opencv/opencv_contrib/modules/ \
-D PYTHON3_EXECUTABLE=/home/amos/anaconda3/envs/cv/bin/python \
-D PYTHON_LIBRARIES=/home/amos/anaconda3/envs/cv/lib/python3.11/site-packages ..

make -j n  # n is the number of parallel jobs, e.g. the output of nproc
sudo make install
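If you would rather not pick n by hand, the core count from os.cpu_count() (roughly what nproc reports) is a common default. This minimal Python sketch only prints the two commands; with CUDA enabled the build can use a lot of memory per job, so a smaller n may be needed.

```python
import os
import shlex


def build_steps(jobs=None):
    """Compile and install commands, defaulting to one job per CPU."""
    jobs = jobs or os.cpu_count() or 1
    return [["make", f"-j{jobs}"], ["sudo", "make", "install"]]


if __name__ == "__main__":
    for cmd in build_steps():
        print(shlex.join(cmd))
```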

After successfully building OpenCV, I still had to get it working with Python. Although I pointed to my Anaconda environment in the build configuration, the cv2 Python package was installed to /usr/local/lib/python3.11/site-packages/cv2. To use cv2 in my conda environment, I created a symbolic link to it:

ln -s /usr/local/lib/python3.11/site-packages/cv2 \
  /home/amos/anaconda3/envs/cv/lib/python3.11/site-packages/cv2
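The same link can be made from Python, which avoids guessing the environment’s site-packages path: sysconfig.get_paths()["purelib"] gives it for whichever interpreter runs the script. This is a minimal sketch assuming cv2 was installed to the /usr/local path above; it is a dry run unless given --apply, and it refuses to replace an existing cv2.

```python
import os
import sys
import sysconfig

# Where `make install` put the package in the build above.
CV2_SRC = "/usr/local/lib/python3.11/site-packages/cv2"


def link_cv2(src=CV2_SRC, site_packages=None):
    """Symlink the cv2 package into site-packages (default: the running interpreter's)."""
    site_packages = site_packages or sysconfig.get_paths()["purelib"]
    dest = os.path.join(site_packages, "cv2")
    if os.path.lexists(dest):
        # Refuse to overwrite an existing cv2, e.g. one installed by pip.
        raise FileExistsError(dest)
    os.symlink(src, dest, target_is_directory=True)
    return dest


if __name__ == "__main__":
    # Run with the conda environment's python; dry run unless --apply is given.
    target = os.path.join(sysconfig.get_paths()["purelib"], "cv2")
    if "--apply" in sys.argv:
        print("linked", link_cv2())
    else:
        print(f"would link {CV2_SRC} -> {target}")
```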

Now Python recognized cv2 as installed, but it would still throw an error when I tried to import it.

lib/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /usr/lib/x86_64-linux-gnu/libproxy.so.1)
Failed to load module: /usr/lib/x86_64-linux-gnu/gio/modules/libgiolibproxy.so

To fix this, I simply copied the system’s libstdc++ into Anaconda’s lib directory with

cp /usr/lib/gcc/x86_64-linux-gnu/libstdc++.so.6 /home/amos/anaconda3/lib

This probably isn’t the best solution (even a symlink would have been better), but I was tired by this point and just wanted it done. My Anaconda install now throws a warning when creating a new environment because of that copy, so find another way if you can.

Try not to indiscriminately copy files like I did or you may muck up your environment.
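Before copying libraries around, it is worth checking which GLIBCXX versions each libstdc++.so.6 actually provides; this Python sketch is the scripted equivalent of running strings on the file and grepping for GLIBCXX. The default path (~/anaconda3/lib/libstdc++.so.6) is a hypothetical example; pass the library you want to inspect as the first argument.

```python
import os
import re
import sys

# Hypothetical default: the copy that ships in Anaconda's lib directory.
DEFAULT_LIB = os.path.expanduser("~/anaconda3/lib/libstdc++.so.6")


def glibcxx_versions(path):
    """Sorted GLIBCXX_x.y[.z] version tags embedded in a libstdc++ shared object."""
    with open(path, "rb") as f:
        tags = {m.decode() for m in re.findall(rb"GLIBCXX_\d+\.\d+(?:\.\d+)?", f.read())}
    # Sort numerically so that 3.4.9 comes before 3.4.20.
    return sorted(tags, key=lambda t: tuple(int(p) for p in t[len("GLIBCXX_"):].split(".")))


if __name__ == "__main__":
    lib = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LIB
    if os.path.isfile(lib):
        print("has GLIBCXX_3.4.20:", "GLIBCXX_3.4.20" in glibcxx_versions(lib))
    else:
        print("not found:", lib)
```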

Using OpenCV DNN with CUDA in Python

Just to show the fruits of my labor, here is a simple script I used to test that OpenCV could run the Caffe face detection model on the GPU.

import cv2
import numpy as np

# The prototxt and model need to be downloaded separately
PROTOTXT = "<path-to-prototxt>"
MODEL = "<path-to-model>"
IMAGE = "<path-to-image>"
threshold = 0.5  # minimum confidence to keep a detection (an arbitrary choice; tune it)

# Ensure a CUDA-capable GPU is visible to OpenCV (should print 1 or more)
print(cv2.cuda.getCudaEnabledDeviceCount())

net = cv2.dnn.readNetFromCaffe(PROTOTXT, MODEL)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

image = cv2.imread(IMAGE)
(h, w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
        (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

# The output has shape (1, 1, N, 7): axis 2 indexes the N detections, and
# axis 3 holds [image_id, class_id, confidence, x1, y1, x2, y2], with the
# box corners normalized to the 0-1 range
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > threshold:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        # Plain Python ints for cv2.rectangle
        (startX, startY, endX, endY) = box.astype(int).tolist()
        cv2.rectangle(image, (startX, startY), (endX, endY),
            (0, 0, 255), 2)
cv2.imshow("Detections", image)
cv2.waitKey(0)
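The post-processing loop can also be pulled into a function that is easy to test without a GPU. This Python sketch is not part of the original script: it assumes the usual SSD output layout of shape (1, 1, N, 7), where each row is [image_id, class_id, confidence, x1, y1, x2, y2] with normalized coordinates, and it clips boxes to the image because predicted corners can land slightly outside it. The default threshold of 0.5 is an arbitrary choice.

```python
import numpy as np


def detections_to_boxes(detections, width, height, threshold=0.5):
    """Convert SSD output of shape (1, 1, N, 7) into (confidence, x1, y1, x2, y2) in pixels."""
    boxes = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence <= threshold:
            continue
        scale = np.array([width, height, width, height])
        x1, y1, x2, y2 = (det[3:7] * scale).astype(int)
        # Keep the corners inside the image.
        x1, x2 = np.clip([x1, x2], 0, width - 1)
        y1, y2 = np.clip([y1, y2], 0, height - 1)
        boxes.append((confidence, int(x1), int(y1), int(x2), int(y2)))
    return boxes
```

In the script above, detections_to_boxes(detections, w, h, threshold) would replace the loop, leaving only the cv2.rectangle calls.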

Ta-da! GPU-accelerated face detection with OpenCV’s DNN module finally working.

Written by Amos Stailey-Young