Docker + NVIDIA GPU = nvidia-docker
Portable Deep Learning Environments
Update on 2018-02-10: nvidia-docker 2.0 has been released and 1.0 has been deprecated. Check the wiki for more info.
(For those who are not familiar with Docker, you can start by checking out the official introduction.)
Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries — anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.
Docker is a wonderful technology. In my opinion, every reasonably sized Python (what I am most familiar with) project that requires teamwork should have at least one Dockerfile, preferably with application-level configuration files (e.g. Docker Compose files). Life before Docker was full of the infamous “It works on my machine” problems. I got to experience that nightmare again recently: when I asked someone from another team what packages were needed to run their code, he handed me a dump of pip freeze output from his machine…
(Of course, Docker has its problems as well, but so far they have not bothered me too much.)
Docker + GPU
Docker virtualizes the CPU natively. CPU resources should be automatically available to you inside the container. You can even limit CPU resources with docker run parameters (e.g. --cpus=<value>). It is not so easy for GPUs, which usually require specialized (often proprietary) drivers to work inside the container.
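As a rough sketch of the CPU side, the snippet below composes a docker run command that caps a container at one and a half CPUs. The --cpus flag is a real docker run option (Docker 1.13 and later); the ubuntu:16.04 image, the 1.5 value, and the variable names are placeholders picked for illustration. The command is only printed, so you can inspect it before running it yourself.

```shell
#!/bin/sh
# Placeholder values chosen for this sketch; adjust to your needs.
CPU_LIMIT="1.5"        # the container may use at most 1.5 CPUs' worth of time
IMAGE="ubuntu:16.04"   # any image works; this one is just an example

# Compose the command as a string so it can be inspected first.
cpu_cmd="docker run --rm --cpus=${CPU_LIMIT} ${IMAGE} uname -m"
echo "$cpu_cmd"

# To actually run it (requires a working Docker installation):
#   eval "$cpu_cmd"
```

No comparable one-flag solution existed for GPUs at the time, which is the gap nvidia-docker fills.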
For NVIDIA GPUs, one of the early solutions was to fully install the driver inside the container. The problem with this solution is that the driver version inside the container must exactly match the driver version on the host machine. That means whenever I upgrade the driver on the host machine, I must rebuild every Docker image that uses the GPU (not to mention that installing the driver inside a container is not really straightforward). I gave up on that solution very quickly and settled on miniconda to manage my deep learning packages. That was something of a regression, since I had previously mostly switched from virtualenvwrapper to Docker containers for managing my Python development environments.
NVIDIA has been developing another solution since late 2015. Recently I’ve noticed open-sourced deep learning implementations are starting to come with docker images, and a PaaS provider seems to build the entire service around GPU-enabled docker images. It seems to me that the new solution has become production-ready, so it’s a good time to give it a try.
The solution provided by the nvidia-docker project consists of two parts:
- Driver-agnostic images, i.e. images that do not depend on a specific NVIDIA driver version
- An alternative Docker CLI that automatically detects and sets up GPU containers leveraging NVIDIA hardware
For typical deep learning applications, you no longer need to install the CUDA toolkit on your host machine. Instead, you only need to install the driver. The images provided by nvidia-docker will work with any compatible driver, thus making the image/container truly portable.
The project documentation should be clear enough. The easiest way is to install the binary package for your platform (.deb for Ubuntu/Debian, .rpm for CentOS/Fedora). If you’re using a Debian-based system, this post might help you. Newer versions of Ubuntu/Linux Mint also offer a GUI driver manager to help you switch to proprietary drivers with a few clicks.
Picking An Image to Use
8.0: CUDA version
cudnn5: This image comes with cuDNN 5
runtime/devel: the image flavor. If you’re not building deep learning libraries from source, the more lightweight runtime flavor is enough
ubuntu16.04: OS version. Other options include
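To make the naming scheme concrete, here is a small sketch that splits a tag into the four components described above. The full tag string is an assumption: it is assembled here from those four components and is not quoted from the original post, so check the image's tag list for what is actually published.

```shell
#!/bin/sh
# Hypothetical tag assembled from the four components listed above.
TAG="8.0-cudnn5-runtime-ubuntu16.04"

# Split on "-" using POSIX parameter expansion (no external tools needed).
cuda_version="${TAG%%-*}"   # everything before the first "-"
rest="${TAG#*-}"            # drop the CUDA version
cudnn="${rest%%-*}"
rest="${rest#*-}"           # drop the cuDNN part
flavor="${rest%%-*}"        # runtime or devel
os="${rest#*-}"             # whatever remains is the OS

echo "CUDA version: $cuda_version"
echo "cuDNN:        $cudnn"
echo "flavor:       $flavor"
echo "OS:           $os"
```

Reading a tag this way tells you up front whether an image includes cuDNN and whether it is the heavier devel flavor.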
Testing If It Works
Two simple commands to check if GPU is correctly set up inside the container:
nvidia-docker run --rm nvidia/cuda nvidia-smi: the output should contain information about your host driver.
nvidia-docker run --rm nvidia/cuda nvcc --version: this should output the CUDA version (only available in images that ship the CUDA compiler, i.e. the devel flavor).
Replace nvidia/cuda with the name of your image.
A Practical Example: An Image for fast.ai Courses
Anurag Goel has already created a Docker project for part 1 of the fast.ai deep learning course. I’ve made some changes to make it compatible with part 2 as well, along with some tweaks based on my personal preference. The gist is:
- Use Python 3.6 instead of 2.7
- Use the latest version of miniconda
- Install PyTorch 0.1.12
- Make Keras 1.1.2 use tensorflow-gpu 0.12.1 as the backend by default (no CPU support)
- Use pillow-simd instead of pillow
- Remove password setting in jupyter_notebook_config.py. You need to copy the token printed on screen after launching jupyter notebook to access the web interface.
- Remove the lines installing tini, since it has been included in Docker since version 1.13. (This example assumes you are using the latest version of Docker.)
Here are some snippets from the Dockerfile with explanations:
RUN useradd --create-home -s /bin/bash --no-user-group -u $USERID $USERNAME && \
    chown $USERNAME $CONDA_DIR -R && \
    adduser $USERNAME sudo && \
    echo "$USERNAME ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
This creates a user inside the container and grants it passwordless sudo privileges. If you are the only one using the machine, this should usually just work. If not, look up your user id using the id command and change ARG USERID=<your userid> in the Dockerfile.
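As an alternative to editing the Dockerfile by hand, a build argument can be overridden on the command line. The sketch below looks up the current user's numeric id and composes a docker build command around it. It assumes the Dockerfile declares ARG USERID as described above; the image name fastai-local is a hypothetical placeholder. The command is printed rather than executed.

```shell
#!/bin/sh
# Numeric id of the user running this script on the host.
host_uid="$(id -u)"

# USERID is the build argument named in the Dockerfile;
# "fastai-local" is a hypothetical image name used only in this sketch.
build_cmd="docker build --build-arg USERID=${host_uid} -t fastai-local ."
echo "$build_cmd"

# To actually build (run from the directory containing the Dockerfile):
#   eval "$build_cmd"
```

Matching the container user's id to your host id keeps files written to mounted volumes owned by you on the host.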
RUN conda install -y --quiet python=$PYTHON_VERSION && \
    conda install -y --quiet jupyter h5py ipywidgets scikit-learn \
        matplotlib pandas bcolz sympy scikit-image mkl-service && \
    conda install pytorch=0.1.12 torchvision cuda80 -c soumith && \
    conda clean -tipsy
This installs the specified Python version and a bunch of other packages via conda. Note that it explicitly pins the version of PyTorch.
RUN pip install --upgrade pip && \
    pip install tensorflow-gpu==0.12.1 kaggle-cli pillow-simd xgboost && \
    pip install git+git://email@example.com && \
    pip install nltk gensim keras-tqdm
This installs the remaining packages via pip. Note the explicit version specifications for TensorFlow and Keras. The version of Keras used by the fast.ai courses is quite old, and you need to be extra careful not to install the wrong version. With Docker, you only need to get it right once; after that you can docker build the entire environment at any time, knowing it should work out of the box.
Creating a Container
nvidia-docker run -p 8888:8888 --init -ti --name fastai \
The image is rather big to download. You can also choose to build it yourself locally with the docker build command. The Dockerfile is located here.
I’ve tested it on three of the notebooks in the part 2 course, and the settings seem to be correct. If I missed some package, fret not! You can install it yourself. Use docker exec -ti fastai bash to enter the container, and use pip install <package here> to install whatever is missing.
So far the nvidia-docker solution has been awesome. I’m really glad I finally invested some time in figuring out how to use it. There’s more to it than running a container locally with a single GPU: you can specify which GPUs to use, or run it remotely. There is also a REST API. These are left for you to explore. Thanks for reading!
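As a starting point for the GPU selection mentioned above, here is a minimal sketch. To my knowledge, the nvidia-docker 1.0 wrapper reads a comma-separated list of GPU indices from the NV_GPU environment variable; nvidia-docker 2.0 handles this differently, so check the wiki for your version. The indices 0,1 are example values, and the command is printed rather than executed.

```shell
#!/bin/sh
# Comma-separated GPU indices to expose to the container (example value).
GPUS="0,1"

# NV_GPU is the variable read by the nvidia-docker 1.0 wrapper (an
# assumption to verify against your installed version); nvidia/cuda is
# the image used in the test commands earlier in this post.
gpu_cmd="NV_GPU=${GPUS} nvidia-docker run --rm nvidia/cuda nvidia-smi"
echo "$gpu_cmd"

# To actually run it (requires nvidia-docker 1.0 and at least two GPUs):
#   eval "$gpu_cmd"
```

If it works as expected, nvidia-smi inside the container should list only the selected GPUs.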