Set up a lightweight environment for deep learning
In this article, I would like to share my experience setting up the environment for our deep learning project. It may look complicated at first, but it really isn't. By the end, we will have a lightweight system based on Ubuntu 17.10, with CUDA 9.0, cuDNN 7.0.5, Python 3, TensorFlow-GPU and Jupyter Notebook ready to start training.
1. Hardware
For a small project, we just need a setup like this:
- Intel(R) Core(TM) i5–7600 CPU @ 3.50GHz
- 240 GB hard drive (SSD)
- 8 GB RAM (DDR4)
- nVidia GP106 [GeForce GTX 1060 6GB]
Of course, it also requires a case, a power supply, a keyboard, a mouse and a monitor. The total cost is about $1,500.
2. OS & platform
On this machine we install:
- Ubuntu 17.10 “Artful Aardvark” (the 58 MB minimal ISO)
- Xubuntu minimal
Note that the 240 GB SSD is divided into three partitions:
- 4 GB for swap
- 80 GB mounted as /storage to store persistent data
- The rest mounted as / to install Ubuntu
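As an illustration, the resulting layout could be described in /etc/fstab like this (the device names here are assumptions; check yours with lsblk):

```
# hypothetical /etc/fstab entries for the layout above (device names are assumptions)
/dev/sda1  /         ext4  defaults  0  1
/dev/sda2  /storage  ext4  defaults  0  2
/dev/sda3  none      swap  sw        0  0
```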
After the system is ready, we log in as root and run the following commands to install several useful tools:
sudo apt update
sudo apt install --no-install-recommends -y \
software-properties-common build-essential \
make curl wget \
ccze inetutils-tools \
python-minimal git nginx htop vim
While the other packages are quite familiar, ccze may be new to you: it colorizes log output, for example from journalctl.
Lastly, we chmod /storage to share it with all users:
sudo chmod 0777 /storage
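A note on the 0777 mode: with it, any user can also delete other users' files. If that matters, the sticky bit (mode 1777, the same permissions /tmp uses) is a common alternative; this is a suggestion, not part of the original setup. Sketched on a scratch directory:

```shell
# /tmp-like semantics: world-writable, but only a file's owner can delete it
mkdir -p /tmp/storage-demo
chmod 1777 /tmp/storage-demo
stat -c '%a' /tmp/storage-demo   # prints 1777
```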
3. Python and Pip
We prefer to work with Python 3 only, but some system libraries still require Python 2. That's why we installed python-minimal; after that, we can simply forget about it.
The following script installs Python 3.6.4 from source:
export PYTHON_VERSION=3.6.4
export PYTHON_DOWNLOAD_URL=https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz
sudo apt install --no-install-recommends -y libssl-dev libreadline-dev libbz2-dev libsqlite3-dev
wget "$PYTHON_DOWNLOAD_URL" -O python.tar.tgz
tar -zxvf python.tar.tgz
cd Python-$PYTHON_VERSION
./configure --enable-optimizations --enable-loadable-sqlite-extensions
make
sudo make install
pip3 install --upgrade pip
The libraries libbz2-dev, libsqlite3-dev, etc. are required for later tools such as Jupyter or TensorBoard to work reliably; skipping them causes unpleasant errors.
When we build Python 3 from source, pip3 is installed as well. It's convenient to put these lines into ~/.bash_aliases or ~/.bash_profile:
alias python=python3
alias pip=pip3
As a rule of thumb:
- ~/.bash_profile is read once, when you log in (GUI or SSH)
- ~/.bash_aliases is read every time you open a terminal (window or tab)
However, this behavior can be changed by modifying ~/.bashrc, ~/.profile, /etc/bash.bashrc, etc.
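The reason ~/.bash_aliases is picked up by every terminal is that Ubuntu's stock ~/.bashrc contains a block like this:

```shell
# from Ubuntu's default ~/.bashrc: source ~/.bash_aliases if it exists
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
```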
4. NVIDIA driver
There are two available drivers for NVIDIA graphics cards: the Nouveau driver and the NVIDIA driver. The first is open source, maintained by the community; the second is closed source, from NVIDIA.
Normally, the NVIDIA driver is the default choice. For Ubuntu 17.10, it's nvidia-384. We can check it with:
cat /proc/driver/nvidia/version
If it’s not there for some reason, just install it.
From the GUI, you can choose it via the driver management tool; it will be downloaded and installed automatically.
You can also install the NVIDIA driver from the terminal. In that case, many experts suggest blacklisting Nouveau first:
sudo nano /etc/modprobe.d/blacklist.conf
Then paste the following lines into it and save:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
And install:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-384 nvidia-384-dev
Recheck it with the cat command above, or with nvidia-smi for more detail.
5. CUDA v9.0
The TensorFlow team just released v1.7, which was built against CUDA 9.0, so unless you plan to build TensorFlow from source, you should not install CUDA 9.1, to avoid unexpected issues.
IMHO, it's always best practice to install pip modules into virtual environments and to use TensorFlow from PyPI. This keeps the setup flexible. For the same reason, I don't recommend using Anaconda.
CUDA 9.0 requires GCC 6, while the default GCC version in Ubuntu 17.10 is 7.2. So we have to install GCC 6 and create symlinks as below:
sudo apt install gcc-6 g++-6
sudo ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
These symlinks make the CUDA toolchain use gcc-6 without touching the system-wide default; verify with:
/usr/local/cuda/bin/gcc -v
Then we stop the X server, download CUDA 9.0 and install it:
sudo service lightdm stop
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
mv cuda_9.0.176_384.81_linux-run cuda_9.0.176_384.81_linux.run
chmod +x cuda_9.0.176_384.81_linux.run
sudo ./cuda_9.0.176_384.81_linux.run --override --dkms
The installer will ask several questions; answer as below:
You are attempting to install on an unsupported configuration. Do you wish to continue?
y
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
n
Install the CUDA 9.0 Toolkit?
y
Enter Toolkit Location
[default location]
Do you want to install a symbolic link at /usr/local/cuda?
y
Install the CUDA 9.0 Samples?
y
Enter CUDA Samples Location
[default location]
If nothing unexpected happens, the process will end successfully.
As NVIDIA's docs suggest, we may need to add these paths to ~/.bash_aliases:
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
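The ${PATH:+:${PATH}} pattern appends the old value only when it is non-empty, so the result never ends with a stray colon. A quick illustration with a throwaway variable:

```shell
# ${VAR:+:${VAR}} expands to ":$VAR" when VAR is set and non-empty, to nothing otherwise
OLD=""
echo "/usr/local/cuda-9.0/bin${OLD:+:${OLD}}"    # no trailing colon
OLD="/usr/bin"
echo "/usr/local/cuda-9.0/bin${OLD:+:${OLD}}"    # prints /usr/local/cuda-9.0/bin:/usr/bin
```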
Lastly, reboot the system.
6. cuDNN v7.0.5 for CUDA 9.0
CUDA 9.0 only works with its matching cuDNN version; you can download it here after joining the NVIDIA Developer Program.
Choose the matching item from the list (cuDNN v7.0.5 for CUDA 9.0), download it, then run these commands:
tar -xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
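To double-check which cuDNN version the copied header declares, you can grep its version macros; the real check is `grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /usr/local/cuda/include/cudnn.h`. The sketch below runs the same grep against a mock header so it is self-contained:

```shell
# build a mock cudnn.h with the macros a 7.0.5 header defines
cat > /tmp/cudnn-mock.h <<'EOF'
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
EOF
# the same grep works against the real /usr/local/cuda/include/cudnn.h
grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /tmp/cudnn-mock.h
```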
That's basically it. Now let's talk a little about the workspace.
7. Setup project environment
Depending on the project, the process and the team, you can choose how to organize the workspace.
In our project, after finishing the steps above, we give each project member a regular user account on the system.
Note that we have to ensure the paths from step 5 are available to all users; simply copy the ~/.bash_aliases.
Persistent data such as datasets, checkpoints, weights, etc. can be stored in /storage.
Each project member logs in via ssh and sets up a virtual environment on their own, for example:
python3 -m venv computer-vision
source computer-vision/bin/activate
(computer-vision) pip install tensorflow-gpu jupyter
(computer-vision) jupyter notebook --port 7777
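The venv mechanics above can be sketched end-to-end with a throwaway environment (the directory name is arbitrary; package installation is skipped to keep the sketch offline):

```shell
# create, activate, inspect and leave a virtual environment
python3 -m venv /tmp/venv-demo
. /tmp/venv-demo/bin/activate
python -c 'import sys; print(sys.prefix)'   # points inside /tmp/venv-demo
deactivate
```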
With regular user permissions, they can do everything related to preprocessing and training, but cannot install pip packages globally or change system software. That keeps the system more stable.
While using the GPU, nvidia-smi is a powerful command. We can watch real-time stats with:
watch -d -n 1.0 nvidia-smi
Conclusion
That's all. We now have a good enough environment to get started on our deep learning tasks.
The whole script is available here:
In addition, there are some free places to experiment, such as Google Colab and FloydHub. Take a look at them while you're considering the investment.
Enjoy studying.