I am a PhD candidate in theoretical astrophysics based in Berlin, Germany. I investigate the interaction between small embedded gravitational perturbers, such as moons or protoplanets, and thin, cold cosmic disks such as planetary rings or protoplanetary disks. My other passion is artificial intelligence and machine learning. Recently I built a computer dedicated to Deep Learning so that I can train deeper models than on my MacBook.
I decided to buy a GTX 1080 Ti from EVGA. The Titan Xp is slightly better but also significantly more expensive.
Given current GPU and RAM prices, I considered buying a pre-built or used gaming PC. However, many current gaming CPUs often found in these systems, such as the i7-8700K, offer only 16 PCI Express lanes! This is a significant constraint, as it only allows you to install a single GPU with 16 PCIe lanes. It was important to me to be able to install a second GPU later on. I therefore chose an i7-6850K, which offers 40 PCIe lanes and supports two GPUs, each connected to 16 lanes. I chose an ASRock X99 Taichi motherboard, 32 GB of RAM and an M.2 SSD from Samsung.
After a few hours of trying to get tensorflow-gpu, theano, pytorch etc. running, I figured out a pretty straightforward way to do it, starting from a freshly installed Ubuntu 16.04. My procedure is a combination of the following tutorials (1,2) and the tensorflow documentation. Thanks a lot to the authors of those articles, you helped a lot. Unlike those two articles, the procedure presented here requires neither building tensorflow from source nor using “not officially supported” drivers.
Step 1: Update the system and install basics
I recommend starting with a fresh installation of Ubuntu 16.04. First, make sure that everything is up to date:
sudo apt-get update
sudo apt-get upgrade
Install git since we will use it later on:
sudo apt-get install git
Verify that your CUDA-capable GPU is detected:
lspci | grep -i nvidia
If nothing is listed, run sudo update-pciids to refresh the PCI ID database and try again.
Step 2: Installing CUDA
Download the NVIDIA CUDA toolkit. I chose version 9.0, since 9.1 currently requires manually building tensorflow from source…
… and proceed to install it:
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9.0
Reboot the system to load the drivers, then add CUDA to the path.
Add the following two lines at the end of the file ~/.bashrc:
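A sketch of what these two lines typically look like, following the post-installation steps in NVIDIA's CUDA installation guide. The path /usr/local/cuda-9.0 is an assumption based on the default install location of the cuda-9.0 package; adjust it if your toolkit lives elsewhere.

```shell
# Assumes the toolkit was installed to the default location /usr/local/cuda-9.0
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

The ${VAR:+:${VAR}} form appends the old value only if it was non-empty, which avoids a stray trailing colon.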
And type the following into your terminal:
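A minimal sketch, assuming the command meant here is nvidia-smi, the status tool that is installed together with the NVIDIA driver. The guard only prints a hint when the tool is missing, for example because the reboot has not happened yet.

```shell
# Open a new terminal (or run: source ~/.bashrc) so the new PATH is picked up,
# then query the driver. The header of the output shows the driver version.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found: reboot first and check that the driver is installed"
fi
```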
Your output should include the version of the driver that is used (currently 390):
NVIDIA-SMI 390.30 Driver Version: 390.30
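Two optional follow-up checks, sketched under the assumption that CUDA is already on your PATH: nvcc --version reports the toolkit release, and the nvidia-smi query fields pcie.link.width.current and pcie.link.width.max show how many PCIe lanes each GPU is using, which is handy once a second card is installed.

```shell
# Report the CUDA compiler release (should mention 9.0 for this setup)
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found: check the PATH entry for CUDA"
fi
# Show the negotiated and maximum PCIe link width per GPU
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,pcie.link.width.current,pcie.link.width.max --format=csv
fi
```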
Step 3: Installing cuDNN
Create a free account on https://developer.nvidia.com/cudnn and select: Download cuDNN v7.1.2 (Mar 21, 2018), for CUDA 9.0
Download these three files:
cuDNN v7.1.2 Runtime Library for Ubuntu16.04 (Deb)
cuDNN v7.1.2 Developer Library for Ubuntu16.04 (Deb)
cuDNN v7.1.2 Code Samples and User Guide for Ubuntu16.04 (Deb)
Next, cd into your Downloads folder and type the following to install the packages:
sudo dpkg -i libcudnn7_7.1.2.21-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.2.21-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.2.21-1+cuda9.0_amd64.deb
Let us verify the installation of cuDNN by building and running the mnistCUDNN sample:
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
If it says Test passed! you have CUDA and cuDNN successfully installed on your machine.
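If the sample fails to build, it can help to confirm which cuDNN packages dpkg actually registered. A small sketch, assuming a Debian-based system where the three packages above were installed with dpkg:

```shell
# List installed cuDNN packages with their versions; print a hint if none are found
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -i libcudnn || echo "no libcudnn packages are installed"
else
    echo "dpkg not available on this system"
fi
```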
Step 4: Download conda and create an environment for Deep Learning
Next I will show you how to install Python and the gpu version of tensorflow.
First, download the Anaconda installer for Linux from the Anaconda website and run it with bash.
You have to agree to the license agreement and confirm the location of the installation. Next, upgrade conda:
conda upgrade -y --all
Important note: the option is written as -y, a space, and then two hyphens directly followed by “all”. Medium.com may render the two hyphens as a single long dash.
Since I intend to use this machine for my PhD work in theoretical astrophysics as well, I will create a dedicated environment for Deep Learning in order to keep things separated.
Before creating the environment, I install nb_conda_kernels so that I can later choose the kernel I need in the Jupyter notebook.
conda install nb_conda_kernels
Create the environment with the following command and proceed to activate it:
conda create -n deeplearning pip python=3.6 ipykernel
source activate deeplearning
In the activated environment, use pip to install the GPU version of tensorflow, which is published as the tensorflow-gpu package.
Let us check whether the installation was successful by training a fully connected network on the MNIST data set:
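One way to run such a test, assuming the fully_connected_feed.py tutorial script that ships in the TensorFlow 1.x source tree. The branch name r1.7 is my assumption for a release that matches the installed pip package; pick the branch that corresponds to your version.

```shell
# Assumption: tutorial script from the TensorFlow 1.x source tree
SCRIPT=tensorflow/tensorflow/examples/tutorials/mnist/fully_connected_feed.py

if python -c "import tensorflow" >/dev/null 2>&1; then
    # Fetch the release branch once (r1.7 is an assumed match for the pip package)
    [ -d tensorflow ] || git clone --depth 1 --branch r1.7 https://github.com/tensorflow/tensorflow.git
    # Train the small fully connected MNIST network
    python "$SCRIPT"
else
    echo "tensorflow is not importable: activate the deeplearning environment first"
fi
```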
The output should include a line similar to this…
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10124 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
…and the loss should be decreasing:
Step 0: loss = 2.30 (0.328 sec)
Step 100: loss = 2.16 (0.001 sec)
Step 200: loss = 1.94 (0.001 sec)
Step 300: loss = 1.58 (0.002 sec)
Step 400: loss = 1.32 (0.002 sec)
Step 500: loss = 1.01 (0.001 sec)
With tensorflow installed and the GPU working, we still have to install scikit-learn, pytorch, keras, opencv and theano:
conda install scikit-learn
conda install pytorch torchvision cuda80 -c soumith
pip install keras
pip install opencv-contrib-python
conda install theano
When trying to import theano in Python, I got an error that I could solve by running conda install mkl-service.
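A quick way to confirm that every framework is importable, sketched with a hypothetical helper named can_import. The module names are the usual import names of the packages installed above (sklearn for scikit-learn, torch for pytorch, cv2 for opencv); run it inside the activated deeplearning environment.

```shell
# can_import MODULE: succeed if the active Python can import MODULE (hypothetical helper)
can_import() {
    python -c "import $1" >/dev/null 2>&1
}

# Report the import status of each framework installed in this guide
for module in tensorflow sklearn torch keras cv2 theano; do
    if can_import "$module"; then
        echo "$module: OK"
    else
        echo "$module: import failed"
    fi
done
```

For any module that fails, running python -c "import MODULE" directly shows the full error message.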
I hope that I could help you set up your Deep Learning machine. If you need help at a certain point, leave me a comment :)