Set up a lightweight environment for deep learning
In this article, I would like to share my experience setting up the environment for our deep learning project. It may look complicated at first, but it really isn't. By the end, we will have a lightweight system based on Ubuntu 17.10, with CUDA 9.0, cuDNN 7.0.5, Python 3, TensorFlow-GPU and Jupyter Notebook ready to start training.
1. Hardware
For a small project, we just need a setup like this:
- Intel(R) Core(TM) i5–7600 CPU @ 3.50GHz
- 240 GB hard drive (SSD)
- 8 GB RAM (DDR4)
- nVidia GP106 [GeForce GTX 1060 6GB]
Of course, it also requires a case, a power supply, a keyboard, a mouse and a monitor. The total cost is about $1,500.
2. OS & platform
On this machine we install:
- Ubuntu 17.10 “Artful Aardvark” (the 58 MB minimal ISO)
- Xubuntu minimal
Note that the 240 GB SSD is divided into three partitions:
- 4 GB for swap
- 80 GB mounted as /storage to store persistent data
- The rest mounted as / to install Ubuntu
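As an illustration, the resulting layout could be described in /etc/fstab like this (the device names here are assumptions; check yours with lsblk):

```
# hypothetical /etc/fstab entries for the layout above (device names are assumptions)
/dev/sda1  /         ext4  defaults  0  1
/dev/sda2  /storage  ext4  defaults  0  2
/dev/sda3  none      swap  sw        0  0
```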
After the system is ready, we log in as root and run the following commands to install several useful tools:
sudo apt update
sudo apt install --no-install-recommends -y \
software-properties-common build-essential \
make curl wget \
ccze inetutils-tools \
python-minimal git nginx htop vim
While the other packages are quite familiar, ccze may be new to you: it colorizes log output, for example from journalctl.
Lastly, we chmod /storage to share it with all users:
sudo chmod 0777 /storage
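A note on the 0777 mode: with it, any user can also delete other users' files. If that matters, the sticky bit (mode 1777, the same permissions /tmp uses) is a common alternative; this is a suggestion, not part of the original setup. Sketched on a scratch directory:

```shell
# /tmp-like semantics: world-writable, but only a file's owner can delete it
mkdir -p /tmp/storage-demo
chmod 1777 /tmp/storage-demo
stat -c '%a' /tmp/storage-demo   # prints 1777
```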
3. Python and Pip
We prefer to work with Python 3 only, but some system libraries still require Python 2. That's why we installed python-minimal; after that, we can simply forget about it.
The following script installs Python 3.6.4 from source:
export PYTHON_VERSION=3.6.4
export PYTHON_DOWNLOAD_URL=https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz
sudo apt install --no-install-recommends -y libssl-dev libreadline-dev libbz2-dev libsqlite3-dev
wget "$PYTHON_DOWNLOAD_URL" -O python.tar.tgz
tar -zxvf python.tar.tgz
cd Python-$PYTHON_VERSION
./configure --enable-optimizations --enable-loadable-sqlite-extensions
make
sudo make install
pip3 install --upgrade pip
The libraries libbz2-dev, libsqlite3-dev, etc. are required for later tools such as Jupyter or TensorBoard to work reliably; skipping them causes unpleasant errors.
When we build Python 3 from source, pip3 is installed as well. It's convenient to put these lines into ~/.bash_aliases or ~/.bash_profile:
alias python=python3
alias pip=pip3
As a rule of thumb:
- ~/.bash_profile is read once, when you log in (GUI or SSH)
- ~/.bash_aliases is read every time you open a terminal (window or tab)
However, this behavior can be changed by modifying ~/.bashrc, ~/.profile, /etc/bash.bashrc, etc.
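The reason ~/.bash_aliases is picked up by every terminal is that Ubuntu's stock ~/.bashrc contains a block like this:

```shell
# from Ubuntu's default ~/.bashrc: source ~/.bash_aliases if it exists
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
```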
4. NVIDIA driver
There are two available drivers for NVIDIA graphics cards: the Nouveau driver and the NVIDIA driver. The first is open source, maintained by the community; the second is closed source, from NVIDIA.
Normally, the NVIDIA driver is the default choice. For Ubuntu 17.10, it's nvidia-384. We can check it with:
cat /proc/driver/nvidia/version
If it’s not there for some reason, just install it.
From the GUI, you can choose it via the driver management tool; it will be downloaded and installed automatically.
You can also install the NVIDIA driver from the terminal. In that case, many experts suggest blacklisting Nouveau first:
sudo nano /etc/modprobe.d/blacklist.conf
Then paste the following lines into it and save:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
And install:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-384 nvidia-384-dev
Recheck it with the cat command above, or with nvidia-smi for more detail.
5. CUDA v9.0
The TensorFlow team just released v1.7, which was built against CUDA 9.0, so unless you plan to build TensorFlow from source, you should not install CUDA 9.1, to avoid unexpected issues.
IMHO, it's always best practice to install pip modules into virtual environments and to use TensorFlow from PyPI. This keeps the setup flexible. For the same reason, I don't recommend using Anaconda.
CUDA 9.0 requires GCC 6, while the default GCC version in Ubuntu 17.10 is 7.2. So we have to install GCC 6 and create symlinks as below:
sudo apt install gcc-6 g++-6
sudo ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
These symlinks make the CUDA toolchain use gcc-6 without touching the system-wide default; verify with:
/usr/local/cuda/bin/gcc -v
Then we stop the X server, download CUDA 9.0 and install it:
sudo service lightdm stop
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
mv cuda_9.0.176_384.81_linux-run cuda_9.0.176_384.81_linux.run
chmod +x cuda_9.0.176_384.81_linux.run
sudo ./cuda_9.0.176_384.81_linux.run --override --dkms
The installer will ask several questions; answer as below:
You are attempting to install on an unsupported configuration. Do you wish to continue?
y
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
n
Install the CUDA 9.0 Toolkit?
y
Enter Toolkit Location
[default location]
Do you want to install a symbolic link at /usr/local/cuda?
y
Install the CUDA 9.0 Samples?
y
Enter CUDA Samples Location
[default location]
If nothing unexpected happens, the process will end successfully.
As NVIDIA's docs suggest, we may need to add these paths to ~/.bash_aliases:
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
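The ${PATH:+:${PATH}} pattern appends the old value only when it is non-empty, so the result never ends with a stray colon. A quick illustration with a throwaway variable:

```shell
# ${VAR:+:${VAR}} expands to ":$VAR" when VAR is set and non-empty, to nothing otherwise
OLD=""
echo "/usr/local/cuda-9.0/bin${OLD:+:${OLD}}"    # no trailing colon
OLD="/usr/bin"
echo "/usr/local/cuda-9.0/bin${OLD:+:${OLD}}"    # prints /usr/local/cuda-9.0/bin:/usr/bin
```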
Lastly, reboot the system.
6. cuDNN v7.0.5 for CUDA 9.0
CUDA 9.0 only works with its matching cuDNN version; you can download it here after joining the NVIDIA Developer Program.
Choose the matching item from the list (cuDNN v7.0.5 for CUDA 9.0), download it, then run these commands:
tar -xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
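To double-check which cuDNN version the copied header declares, you can grep its version macros; the real check is `grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /usr/local/cuda/include/cudnn.h`. The sketch below runs the same grep against a mock header so it is self-contained:

```shell
# build a mock cudnn.h with the macros a 7.0.5 header defines
cat > /tmp/cudnn-mock.h <<'EOF'
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
EOF
# the same grep works against the real /usr/local/cuda/include/cudnn.h
grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /tmp/cudnn-mock.h
```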
That's basically it. Now let's talk a little about the workspace.
7. Setup project environment
Depending on the project, the process and the team, you can choose how to organize the workspace.
In our project, after finishing the steps above, we give each project member a regular user account on the system.
Note that we have to ensure the paths from step 5 are available to all users; simply copy the ~/.bash_aliases.
Persistent data such as datasets, checkpoints, weights, etc. can be stored in /storage.
Each project member logs in via ssh and sets up a virtual environment on their own, for example:
python3 -m venv computer-vision
source computer-vision/bin/activate
(computer-vision) pip install tensorflow-gpu jupyter
(computer-vision) jupyter notebook --port 7777
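The venv mechanics above can be sketched end-to-end with a throwaway environment (the directory name is arbitrary; package installation is skipped to keep the sketch offline):

```shell
# create, activate, inspect and leave a virtual environment
python3 -m venv /tmp/venv-demo
. /tmp/venv-demo/bin/activate
python -c 'import sys; print(sys.prefix)'   # points inside /tmp/venv-demo
deactivate
```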
With regular user permissions, they can do everything related to preprocessing and training, but cannot install pip packages globally or change system software. That keeps the system more stable.
While using the GPU, nvidia-smi is a powerful command. We can watch real-time stats with:
watch -d -n 1.0 nvidia-smi
Conclusion
That's all. We now have a good enough environment to get started on our deep learning tasks.
The whole script is available here:
In addition, there are some free places to experiment, such as Google Colab and FloydHub. Take a look at them while you're considering the investment.
Enjoy studying.