GPU Machine Learning on Linux

By Erik Jan de Vries for BigData Republic

Many popular machine learning algorithms can enjoy great speed improvements if they are run on a GPU. In this blog I will discuss setting up your Linux system for GPU powered machine learning using Nvidia-Docker. Hopefully it will help you avoid some of the frustrations I faced while setting up my system.

This blog post is part of a larger series:

My laptop is an MSI GS63VR, with an Nvidia GeForce GTX 1070 GPU and an integrated Intel GPU on the CPU. My goal is to use the Nvidia GPU for machine learning, while using the integrated Intel GPU to display the desktop. In addition, I would like to run my machine learning algorithms in Docker containers, so as the foundation of my machine learning workstation I am going to set up Nvidia-Docker, which allows Docker containers to run algorithms on the GPU using CUDA.

GPU drivers

By default, Linux distributions ship with the open-source Nouveau drivers for Nvidia GPUs. We will have to take the following steps:

  1. Blacklist the Nouveau drivers, so that Linux does not use them.
  2. Install drivers for the Intel GPU.
  3. Install the Nvidia drivers, but without configuring the X server to display the desktop with them.

Finally we would like to check that everything works as we intended.

Blacklist the Nouveau drivers

Insert

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

into /etc/modprobe.d/blacklist.conf.
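On Ubuntu-based systems, the blacklist typically only takes effect once the initramfs has been regenerated and the machine rebooted; a sketch (the exact command may vary per distribution):

```shell
# Regenerate the initramfs so the blacklist is picked up at boot
sudo update-initramfs -u
# Reboot for the change to take effect
sudo reboot
```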

Install Intel drivers

I’m not sure this is strictly required, but I installed the Mesa video drivers:

sudo apt-get install mesa-utils

and I read that many users benefited from installing specific Intel microcode:

sudo apt-get install intel-microcode

Perhaps not necessary, but I installed this somewhere along the way as well.

Nvidia

The next step is to install the Nvidia drivers. There are several ways to do this, but to make things work, I had to download the installer manually (rather than use the Ubuntu package manager) and install it with the command-line option --no-opengl-files. I installed version 390.42 of the drivers:

sudo ./NVIDIA-Linux-x86_64-390.42.run --no-opengl-files

During the installation, ignore the error message about a failing pre-install script: that script appears to do nothing but call exit 1 to raise an error, so that you consciously choose to install your own driver instead of using the kernel drivers (but have a look at the warning below about automatic updates). Also make sure you do not update the X configuration when asked, so that the Intel drivers remain in use for the display.
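Note that the installer generally refuses to run while an X server is active, so you may need to switch to a text console first and stop your display manager. A sketch, assuming a systemd-based system (the service name — lightdm, gdm or sddm — depends on your desktop environment):

```shell
# Switch to a text console (e.g. Ctrl+Alt+F2), log in, then stop the display manager
sudo systemctl stop lightdm    # or gdm / sddm, depending on your desktop

# Make the downloaded installer executable and run it
chmod +x NVIDIA-Linux-x86_64-390.42.run
sudo ./NVIDIA-Linux-x86_64-390.42.run --no-opengl-files
```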

Check the installation

When you’ve installed the drivers, you can check your installation with

lspci -k | grep -EA3 'VGA|3D|Display'

The output should include something like Kernel driver in use: i915 for your Intel GPU, and something like Kernel driver in use: nvidia for your Nvidia GPU.
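You can also query the Nvidia driver directly with nvidia-smi, which is installed along with the driver; if it prints a table with your GPU model, driver version and memory usage, the kernel module is loaded correctly:

```shell
# Show driver version, GPU model and current utilisation
nvidia-smi
```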

If you start the NVIDIA X Server Settings from the start menu, you should get an error message indicating that the Nvidia drivers are not in use (for displaying the desktop).

CUDA

I have not installed CUDA on the host, since I intend to use Nvidia-Docker images, which should include the relevant CUDA libraries.

Nvidia-Docker

I followed the standard installation instructions for Docker CE and Nvidia-Docker.

Keep in mind: if you plan to use Docker a lot, it can easily take up a lot of storage. By default Docker uses /var/lib/docker for storage, which in my case falls under the root / partition, so either make sure your partition is large enough, or get ready to move the Docker storage to another partition.
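Moving Docker's storage can be done by pointing the daemon at a different directory via the data-root option in /etc/docker/daemon.json (on older Docker versions this option was called graph). A sketch, assuming /data/docker is a directory on a larger partition:

```shell
# Stop Docker, move the existing storage, and point the daemon at the new location
sudo systemctl stop docker
sudo mv /var/lib/docker /data/docker
echo '{ "data-root": "/data/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl start docker
```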

To test if everything works well, you can run

nvidia-docker run --rm nvidia/cuda nvidia-smi

and you should see the usage statistics for your Nvidia GPU. If you’d like to run TensorFlow in a Docker container using your GPU, make sure to have a look at the Docker image gcr.io/tensorflow/tensorflow:latest-gpu-py3.
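To check that TensorFlow inside the container actually sees the GPU, you can run a one-liner that lists the visible devices; a sketch (the image tag and the exact TensorFlow API may differ between versions):

```shell
nvidia-docker run --rm gcr.io/tensorflow/tensorflow:latest-gpu-py3 \
  python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
```

The output should include a device of type GPU alongside the CPU.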

For using Docker containers in a production setting, I would recommend having a look at Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications.

Automatic updates

One final consideration: in certain situations, you may want to turn off automatic updates. When my system auto-updated, it installed a new kernel version, which meant that Docker could not find the Nvidia drivers anymore. After updating your kernel, you’ll have to blacklist the Nouveau driver again and reinstall the Nvidia driver, as described above.

Warning: disabling automatic updates also prevents your system from periodically installing security updates, so make sure you regularly check for and install updates yourself!

To turn off automatic updates:

  • Open Discover (Software Center) from the start menu
  • In the menu, open “Advanced…” / “Configure Software Sources”
  • On the “Updates” tab, under “Automatic Updates”, change the setting to, for example, “Only notify about available updates”.

If you decide to turn off automatic updates, you should regularly install updates manually:

sudo apt-get update
sudo apt-get upgrade

This way you're always in control of your system and you won't be surprised by Nvidia-Docker images not starting.


About the author

Erik Jan de Vries is a data scientist at BigData Republic, a data science consultancy company in the Netherlands. We hire the best of the best in BigData Science and BigData Engineering. If you are interested in deploying machine learning and deep learning solutions in production using Docker and Kubernetes, feel free to contact us at info@bigdatarepublic.nl.