Docker + NVIDIA GPU = nvidia-docker

Portable Deep Learning Environments

Ceshine Lee
Veritable

--

(Image source: docker.com)

Update on 2018-02-10: nvidia-docker 2.0 has been released and 1.0 has been deprecated. Check the wiki for more info.

(For those who are not familiar with Docker, you can start by checking out the official introduction.)

Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries — anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.

Docker is a wonderful technology. In my opinion, every reasonably complex Python (the language I am most familiar with) project that requires teamwork should have at least one Dockerfile, preferably along with application-level configuration files (e.g. Docker Compose files). Life before Docker was full of the infamous “It works on my machine” problems. I got to experience that nightmare again recently: when I asked someone from another team what packages were needed to run their code, he handed me a dump of pip freeze output from his machine…

(Of course, Docker has its problems as well, but so far they have not bothered me too much.)

Docker + GPU

Docker virtualizes the CPU natively: CPU resources are automatically available to you inside the container, and you can even limit how much a container gets with docker run parameters (e.g. --cpus=<value>). It is not so easy for GPUs, which usually require specialized (often proprietary) drivers to work inside a container.
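
For example, CPU limits work out of the box with plain Docker (a minimal sketch; the image and command are just placeholders, and --cpus requires Docker 1.13 or newer):

# Cap this container at 1.5 CPUs' worth of CPU time; no special setup needed
docker run --rm --cpus=1.5 ubuntu:16.04 sleep 5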

For NVIDIA GPUs, one of the early solutions was to fully install the driver inside the container. The problem with this approach is that the driver version inside the container must exactly match the driver version on the host machine. That means whenever I upgrade the driver on the host, I must rebuild every Docker image that uses the GPU (not to mention that installing the driver inside a container is not exactly straightforward). I gave up on that solution very quickly and settled on Miniconda to manage my deep learning packages. That was a bit of a regression, since I had previously switched from virtualenvwrapper to Docker containers for managing Python development environments.

NVIDIA has been developing another solution since late 2015. Recently I’ve noticed that open-source deep learning implementations are starting to ship with Docker images, and one PaaS provider seems to have built its entire service around GPU-enabled Docker images. It seems to me that the new solution has become production-ready, so it’s a good time to give it a try.

NVIDIA-DOCKER

(Architecture diagram source: https://github.com/NVIDIA/nvidia-docker)

The solution provided by the nvidia-docker project consists of two parts:

  1. Images that are agnostic of the NVIDIA driver
  2. An alternative Docker CLI that automatically detects and sets up GPU containers leveraging the NVIDIA hardware

For typical deep learning applications, you no longer need to install the CUDA toolkit on your host machine. Instead, you only need to install the driver. The images provided by nvidia-docker will work with any compatible driver, thus making the image/container truly portable:

Minimum driver version and GPU architecture for each CUDA version (Source)

Installation

The project documentation should be clear enough. The easiest way is to install the binary package for your platform (.deb for Ubuntu/Debian, .rpm for CentOS/Fedora). You also need the NVIDIA driver installed on the host; if you’re using a Debian-based system, this post might help you, and newer versions of Ubuntu/Linux Mint offer a GUI driver manager that lets you switch to the proprietary driver in a few clicks.
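
As a rough sketch of the .deb route on Ubuntu (the release version and file name below are illustrative; grab the current package from the project's releases page):

# Download and install the nvidia-docker package (version illustrative)
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb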

Picking An Image to Use

An example: nvidia/cuda:8.0-cudnn5-runtime-ubuntu16.04

  1. 8.0: the CUDA version
  2. cudnn5: this image comes with cuDNN 5
  3. runtime/devel: if you are not building deep learning libraries from source, the more lightweight runtime image usually suffices
  4. ubuntu16.04: the OS version. Other options include ubuntu14.04, centos6, and centos7
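
If you are building your own image, one of these tags typically goes into the FROM line of your Dockerfile. A minimal sketch (the apt packages are just an illustration of where your own dependencies would go):

# Base image: CUDA 8.0 + cuDNN 5 runtime on Ubuntu 16.04
FROM nvidia/cuda:8.0-cudnn5-runtime-ubuntu16.04

# Install your own dependencies on top
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip && \
    rm -rf /var/lib/apt/lists/*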

Testing If It Works

Two simple commands to check whether the GPU is correctly set up inside the container:

  1. nvidia-docker run --rm nvidia/cuda nvidia-smi : the output should contain information about your host driver
  2. nvidia-docker run --rm nvidia/cuda nvcc --version : this should print the CUDA version (only available in devel images)

(Substitute nvidia/cuda with the name of your image.)
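
For instance, to run the second check against a specific devel image (a sketch combining the tag format above with the test command):

# nvcc is only present in the devel variants
nvidia-docker run --rm nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04 nvcc --version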

A Practical Example: An Image for fast.ai Courses

Anurag Goel has already created a Docker project for part 1 of the fast.ai deep learning course. I’ve made some changes to make it compatible with part 2 as well, along with some tweaks based on my personal preferences. The gist is:

  • Use Python 3.6 instead of 2.7
  • Use the latest version of miniconda
  • Install PyTorch 0.1.12
  • Make Keras 1.1.2 use tensorflow-gpu 0.12.1 as the backend by default (no CPU support)
  • Use pillow-simd instead of pillow
  • Remove password setting in jupyter_notebook_config.py. You need to copy the token printed on screen after launching jupyter notebook to access the web interface.
  • Remove lines installing tini, since it has been included in Docker 1.13 or greater. (This example assumes you are using the latest version of Docker.)

Here are some snippets from the Dockerfile with explanations:

RUN useradd --create-home -s /bin/bash --no-user-group -u $USERID $USERNAME && \
    chown $USERNAME $CONDA_DIR -R && \
    adduser $USERNAME sudo && \
    echo "$USERNAME ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

This creates a user inside the container and grants it sudo privileges without requiring a password. If you are the only one using the machine, this should usually just work. If not, look up your user id with the id command and change ARG USERID=<your userid> in the Dockerfile.
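
Alternatively, you can override the build argument at build time instead of editing the Dockerfile (a sketch; it assumes the Dockerfile declares ARG USERID as described above, and uses the image name from later in this post):

# Pass your own user id into the build
docker build --build-arg USERID=$(id -u) -t ceshine/cuda-fastai .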

RUN conda install -y --quiet python=$PYTHON_VERSION && \
    conda install -y --quiet jupyter h5py ipywidgets scikit-learn \
        matplotlib pandas bcolz sympy scikit-image mkl-service && \
    conda install pytorch=0.1.12 torchvision cuda80 -c soumith && \
    conda clean -tipsy

This installs the specified Python version and a bunch of other packages via conda. Note that it explicitly pins the version of PyTorch.

RUN pip install --upgrade pip && \
    pip install tensorflow-gpu==0.12.1 kaggle-cli pillow-simd xgboost && \
    pip install git+git://github.com/fchollet/keras.git@1.1.2 && \
    pip install nltk gensim keras-tqdm

This installs the remaining packages via pip. Note the explicit version specifications for TensorFlow and Keras. The version of Keras used by the fast.ai courses is quite old, and you need to be extra careful not to install the wrong one. With Docker, you only need to get it right once; after that you can docker build the entire environment at any time, knowing it will work out of the box.

Creating a Container

nvidia-docker run -p 8888:8888 --init -ti --name fastai \
    ceshine/cuda-fastai

The image is rather big, so it may take a while to download. You can also choose to build it yourself locally with the docker build command. The Dockerfile is located here.

I’ve tested it on three of the notebooks from the part 2 course, and the settings seem to be correct. If I missed some package, fret not! You can install it yourself: use docker exec -ti fastai bash to enter the container, then pip install <package here> to install whatever is missing.

Conclusion

So far the nvidia-docker solution has been awesome, and I’m really glad I finally invested some time figuring out how to use it. There is more to it than running a container locally with a single GPU: you can specify which GPUs to use, run containers on remote hosts, and there is even a REST API. Those are left for you to explore.
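
For example, GPU selection in nvidia-docker 1.0 is done through the NV_GPU environment variable (based on the project wiki at the time; double-check the documentation for your version):

# Expose only the first GPU to the container
NV_GPU=0 nvidia-docker run --rm nvidia/cuda nvidia-smi

Thanks for reading!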
