Basic Hygiene for Creating Containers That Need GPUs
We are living in the era of containers. For many good reasons, containers and Docker are the preferred deployment method, and the same benefits apply to deep learning as well, with one special nuance: your container must be able to access the underlying GPU hardware. NVIDIA provides NGC (NVIDIA GPU Cloud), a catalog of GPU-ready container images, along with a container runtime layer on top of Docker so that such containers can be run easily. But building these images and containers is tricky and often leads to errors and failed container runs.
In this blog, we will discuss one such situation and some basic hygiene for creating Docker containers that need GPUs.
The NVIDIA container environment has a hard dependency on the versions of CUDA and cuDNN on the host machine that runs the container. We will not cover CUDA and cuDNN in depth in this post, but briefly:
CUDA is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
And cuDNN is the CUDA Deep Neural Network library, a GPU-accelerated library of deep learning primitives built on top of the CUDA framework.
If there is a version mismatch, your container will still be created, but any application that uses cuDNN will fail at runtime with a version-mismatch error.
The reason for this error is a mismatch between the CUDA and cuDNN versions on the host computer running the container and those inside the container's own environment.
One thing to note here is that the base image you select in the Dockerfile determines the CUDA and cuDNN versions that will be installed inside the container.
First of all, we need to find the cuDNN and CUDA versions on the host computer so that we can select the correct base image for the Docker container and avoid version conflicts.
Steps to find the cuDNN version on the host computer:
whereis cudnn_version.h
We can see that cudnn_version.h is located at /usr/include/cudnn_version.h.
Copy the location, open the file with more, and scroll to find the exact version of cuDNN.
more /usr/include/cudnn_version.h
So on my machine the exact cuDNN version is 8.1.1.
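Instead of scrolling through the header, you can extract the three version macros directly. The sketch below parses a sample copy of the header so it is self-contained; on a real machine, point HEADER at the path that whereis found (commonly /usr/include/cudnn_version.h):

```shell
# Sketch: read the cuDNN version from the CUDNN_MAJOR/MINOR/PATCHLEVEL
# macros in cudnn_version.h. A sample header is written to a temp file
# here; replace HEADER with the real path on your machine.
HEADER=$(mktemp)
cat > "$HEADER" <<'EOF'
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
EOF

MAJOR=$(awk '/#define CUDNN_MAJOR/ {print $3}' "$HEADER")
MINOR=$(awk '/#define CUDNN_MINOR/ {print $3}' "$HEADER")
PATCH=$(awk '/#define CUDNN_PATCHLEVEL/ {print $3}' "$HEADER")
echo "cuDNN version: $MAJOR.$MINOR.$PATCH"
rm -f "$HEADER"
```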
Now we need to find the exact version of CUDA, which can be done easily with nvidia-smi, nvcc --version, or by issuing the command whereis cuda_version.h. (Note that nvidia-smi reports the highest CUDA version the driver supports, while nvcc reports the installed toolkit version; the toolkit version is the one to match against the base image.)
So the CUDA version on my machine is 11.4.0.
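The release number can also be pulled out of the nvcc output with a one-liner. The snippet below parses a captured sample line so it is self-contained; on a real machine you would set NVCC_OUT=$(nvcc --version) instead:

```shell
# Sketch: extract the CUDA release number from `nvcc --version` output.
# The sample line below is the format nvcc prints; on a real machine use:
#   NVCC_OUT=$(nvcc --version)
NVCC_OUT='Cuda compilation tools, release 11.4, V11.4.48'
CUDA_VERSION=$(echo "$NVCC_OUT" | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
echo "CUDA version: $CUDA_VERSION"
```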
Now, to build a Docker container that uses the NVIDIA runtime to access the GPU, we need to find a base image whose CUDA and cuDNN versions match those on our host machine:
cuda = 11.4
cuDNN = 8
(replace these numbers with the versions on your computer).
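The matching rule above can be sketched as a quick pre-build sanity check. The version values here are the ones found on my machine; substitute your own, and read the image's versions off its tag (e.g. cuda:11.4.0-cudnn8-... encodes CUDA 11.4 and cuDNN 8):

```shell
# Sketch: verify host and base-image CUDA/cuDNN versions agree before
# building. Values are examples from this post; substitute your own.
HOST_CUDA="11.4"
HOST_CUDNN_MAJOR="8"
IMAGE_CUDA="11.4"        # from the image tag, e.g. cuda:11.4.0-cudnn8-...
IMAGE_CUDNN_MAJOR="8"

if [ "$HOST_CUDA" = "$IMAGE_CUDA" ] && [ "$HOST_CUDNN_MAJOR" = "$IMAGE_CUDNN_MAJOR" ]; then
  echo "versions match - safe to build"
else
  echo "version mismatch - pick a different base image" >&2
fi
```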
Matching base images can be found by searching the NGC catalog.
Now we need to go back and update the Dockerfile to use the correct base image, and we are good to go.
In the example below I am using cuda:11.4.0-cudnn8-devel-ubuntu20.04 as the base image.
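A minimal Dockerfile built on that image might look like the following sketch. The application pieces (requirements.txt and app.py) are placeholders for your own project, not part of the original post:

```dockerfile
# Base image pinned to the host's CUDA 11.4 / cuDNN 8 versions
FROM nvidia/cuda:11.4.0-cudnn8-devel-ubuntu20.04

# Install Python and the application's dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Application code (placeholder)
COPY app.py .
CMD ["python3", "app.py"]
```

You would then build and run it with the NVIDIA runtime, e.g. `docker build -t my-gpu-app .` followed by `docker run --rm --gpus all my-gpu-app`; running `nvidia-smi` inside the container is a quick sanity check that the GPU is visible.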
Then build the image, create the Docker container, and run it successfully!
So, while creating Docker images for containers that need GPUs, please ensure you:
1. Check the CUDA and cuDNN versions on the target machine on which your container will execute.
2. Find a base image in NGC that uses exactly the same CUDA and cuDNN versions and use it as your base image.
3. Build your image from it and run your containers!
Good luck!