How to set up the NVIDIA Driver, CUDA Toolkit, and cuDNN on Ubuntu 20.04

Mert Güvençli
4 min read · Nov 20, 2022


GPU — NVIDIA RTX A5000

In this article, I will explain the problems we ran into during installation and how we solved them. First, I'll describe the project's aim and why we preferred an on-premise setup over a cloud provider.

Briefly, we trained an object detection model with TensorFlow. TFOD detects the identity card in the image and crops it; a character density map model then extracts the information via EasyOCR. We serve the models as a microservice via FastAPI; a minimal sketch of this setup is shown below.
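For context, the serving setup could look something like the following. The detect_and_crop helper and the /extract endpoint are hypothetical placeholders; the actual project code is not shown in this article.

from fastapi import FastAPI, UploadFile
import easyocr

app = FastAPI()
reader = easyocr.Reader(["en"])  # OCR model, loaded once at startup

@app.post("/extract")
async def extract(file: UploadFile):
    image_bytes = await file.read()
    card = detect_and_crop(image_bytes)       # hypothetical: TFOD inference + crop
    fields = reader.readtext(card, detail=0)  # EasyOCR on the cropped card
    return {"fields": fields}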

Our project handles sensitive information, so we were limited in choosing the data-center location: it has to be in the same country as the company because of our regulations (similar to GDPR).

Let's dive into the installation steps!

Table Of Contents

1- Install Nvidia Driver
2- Install CUDA Toolkit
3- Install cuDNN

1- Install Nvidia Driver
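Note that the .run installer builds a kernel module and cannot run while an X server is active, so on a desktop machine you may first need to stop the display manager and disable the open-source nouveau driver. A sketch of those steps, assuming GDM (a headless server can skip the display-manager part):

sudo systemctl stop gdm3  # display manager name may differ (lightdm, sddm, ...)
echo -e "blacklist nouveau\noptions nouveau modeset=0" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u
sudo reboot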

sudo apt-get install make gcc -y
cd /tmp
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/515.76/NVIDIA-Linux-x86_64-515.76.run
sudo bash NVIDIA-Linux-x86_64-515.76.run

Run nvidia-smi and make sure the driver was installed successfully.

nvidia-smi output after a successful driver installation (image by author)

2- Install CUDA Toolkit

The nvidia-smi output shows CUDA Version: 11.7, which is the highest CUDA version the installed driver supports; it means you should install a toolkit from the 11.7.x series. I'll install version 11.7.1 for this example.

You can also check all available CUDA Toolkit versions on NVIDIA's download archive.

cd /tmp
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.1-515.65.01-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

# Environment variables (append to the end of ~/.bashrc)
nano ~/.bashrc
export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
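After appending the exports, reload the shell configuration and verify that the toolkit is on the PATH; nvcc should report release 11.7:

source ~/.bashrc
nvcc --version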

3- Install cuDNN

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get install libcudnn8=8.5.0.*-1+cuda11.7
sudo apt-get install libcudnn8-dev=8.5.0.*-1+cuda11.7
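A quick way to confirm the installation is to list the cuDNN packages and check that TensorFlow can see the GPU (assuming TensorFlow is already installed in your environment):

dpkg -l | grep libcudnn
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"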

Let's make the first inference request to our application and watch nvidia-smi.

gunicorn main:app --workers 1 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:5001 --timeout 600
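For reference, the first request could look something like this; the endpoint path and form field are hypothetical, since the actual API schema is not part of this article:

curl -X POST http://localhost:5001/extract -F "file=@identity_card.jpg"  # hypothetical endpoint and payload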

After making the request, we get an error like the one below:

Start cannot spawn child process: No such file or directory

Running watch nvidia-smi shows that all GPU memory is in use (image by author).

This is the tricky part: we have two GPU consumers, and it's hard to tell which one is causing the problem.

1- Tensorflow (2.9.2)
2- EasyOCR (1.6.2)

RuntimeError: CUDA out of memory.

We debugged our code step by step and finally found the problem: TensorFlow allocates nearly all of the GPU memory as soon as it initializes, so the other GPU consumer (EasyOCR) cannot allocate memory because there is insufficient space left.

Let's look at the GPU memory configuration options in TensorFlow. There are two methods for limiting allocation.

1- Memory growth (only grow memory usage as the process needs it)

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
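For example, the variable can be exported before starting the Gunicorn workers (a minimal sketch of how it could be set for this service):

export TF_FORCE_GPU_ALLOW_GROWTH=true
gunicorn main:app --workers 1 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:5001 --timeout 600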

2- Memory limit

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

We have a single GPU; we tried both approaches and settled on the first option, memory growth.

Let’s try our service again with four workers and make an inference request.

gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:5001 --timeout 600

GPU memory usage with 4 workers (image by author)

There is no more out-of-memory error ✅

Conclusion

We tried over and over, managing to get a different error each time, but we learned a key lesson: always pay attention to versions. Packages, frameworks, drivers, and libraries all depend on each other's version numbers.


Credits

Burak Canbaz
