Setting Up a Machine for Deep Learning

Hi there! Here is a simple tutorial to get you started with developing Deep Learning projects. Getting into deep learning can be quite intimidating, especially when you don’t know where to start. I have spent plenty of time building and setting up machines for Deep Learning, and each time I wished I had a simple reference to follow. So I tried to create that reference and share it with you.

What one fool can do, another can. — Ancient Simian Proverb

Getting your hands dirty is the best way to learn, and Deep Learning is no exception. That is why I have always disliked the things that get in the way, such as configuration and setup. They put up a great barrier to getting started with Deep Learning projects. Besides, we should be focusing on the ideas, not the environment. This is especially important for research.

Having an idea is one thing, but you have to prove it, and you need solid results for that. We should be able to go from idea to results as easily as possible. We may fail, but then we can re-evaluate the idea and finally see whether it works or not.

Now, let’s get started. We will configure a machine from the ground up for Deep Learning, and in the end run a simple neural network on it.

The Machine

I had a SuperMicro workstation with 3 Nvidia GTX 1080 GPUs (Pascal architecture). It came with all the hardware installed, such as the GPUs and hard disks, so there was no need for extra installation. There was one free slot, though, so I installed another GTX 1080 in it. It doesn’t matter how many GPUs you have: as long as you have at least one, this tutorial will get you to the end. We will be using Ubuntu 16.04, but I won’t go into detail on how to install it, as we will focus more on building our first deep neural network from scratch.

The Basics

Matrix multiplication is important. In deep learning, almost everything relies on it.

So, let’s start with the basics and build on top of them. BLAS (Basic Linear Algebra Subprograms) substantially speeds up matrix multiplications. Here, we will use OpenBLAS and run some tests with NumPy. Open a terminal (press Ctrl + Alt + T) and paste the following lines one at a time, without the leading $, by pressing Ctrl + Shift + V.

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install build-essential cmake g++ gfortran unzip wget pkg-config openjdk-8-jdk git python-dev python3-dev python-pip python3-pip swig python-wheel libcurl3-dev
$ sudo apt-get install libopenblas-dev liblapack-dev
$ sudo apt-get install python-numpy python3-numpy

We installed a few packages and NumPy with OpenBLAS support, which will make things a bit faster when we run our network. Now let’s see if we did everything right.

Here is a simple script to run some tests. You can also copy it from below.
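In case the script is not at hand, here is a rough sketch of what such a test could look like, assuming only NumPy. The function names (time_call, run_benchmarks) are made up for illustration, and the original script may measure things differently.

```python
import sys
import timeit

import numpy as np


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over a few runs."""
    return min(timeit.repeat(fn, number=1, repeat=repeats))


def run_benchmarks(n=1000, repeats=3):
    """Time a few BLAS/LAPACK-backed operations on random data."""
    rng = np.random.RandomState(0)
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    v = rng.rand(2 * n)
    w = rng.rand(2 * n)
    m = rng.rand(n, 2 * n)
    return {
        "matrix dot": time_call(lambda: np.dot(a, b), repeats),
        "vector dot": time_call(lambda: np.dot(v, w), repeats),
        "SVD": time_call(lambda: np.linalg.svd(m, full_matrices=False), repeats),
        "eigendecomposition": time_call(lambda: np.linalg.eig(a), repeats),
    }


if __name__ == "__main__":
    print("NumPy version: {}".format(np.__version__))
    print("Max int: {}".format(sys.maxsize))
    np.show_config()  # shows which BLAS/LAPACK libraries NumPy was built against
    for name, seconds in run_benchmarks(n=1000, repeats=1).items():
        print("{} took {:.4f} s".format(name, seconds))
```

The number of threads is not set in the code; it is picked up from the OMP_NUM_THREADS environment variable when the script is started.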

You need to save it to your computer, in your working directory. Run the following command to see where your working directory is.

$ pwd

This should print the path of the directory you are currently working in. Once you save the test script in this directory, we can start running simple tests.

$ OMP_NUM_THREADS=1 python

The above command should print results like the following. The exact numbers will differ on your machine.

NumPy Version: 1.11.0
Max int: 9223372036854775807
BLAS info:
* libraries ['openblas', 'openblas']
* library_dirs ['/usr/lib']
* define_macros [('HAVE_CBLAS', None)]
* language c
Dot product of two (1000,1000) matrices took 82.5 ms
Dot product of two (2000) dimensional vectors took 2.62 us
SVD of (1000,2000) matrix took 1.194 s
Eigen decomposion of (1000,1000) matrix took 1.370 s

Let’s run the same experiment, this time with more threads.

$ OMP_NUM_THREADS=8 python

So far so good. Most operations should take less time than in our previous test.

NumPy Version: 1.11.0
Max int: 9223372036854775807
BLAS info:
* libraries ['openblas', 'openblas']
* library_dirs ['/usr/lib']
* define_macros [('HAVE_CBLAS', None)]
* language c
Dot product of two (1000,1000) matrices took 28.4 ms
Dot product of two (2000) dimensional vectors took 2.59 us
SVD of (1000,2000) matrix took 0.698 s
Eigen decomposion of (1000,1000) matrix took 1.219 s
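To put the two runs side by side, here is a small sketch that computes the speedup for each operation from the numbers printed above. The dictionary keys are just labels chosen for this example.

```python
# Timings copied from the two runs above (1 thread vs. 8 threads).
one_thread = {
    "matrix dot (ms)": 82.5,
    "vector dot (us)": 2.62,
    "SVD (s)": 1.194,
    "eigendecomposition (s)": 1.370,
}
eight_threads = {
    "matrix dot (ms)": 28.4,
    "vector dot (us)": 2.59,
    "SVD (s)": 0.698,
    "eigendecomposition (s)": 1.219,
}


def speedups(before, after):
    """Ratio of old time to new time for every operation."""
    return {name: before[name] / after[name] for name in before}


for name, ratio in sorted(speedups(one_thread, eight_threads).items()):
    print("{:<24} {:.2f}x".format(name, ratio))
```

On my numbers the large matrix product gains the most (about 2.9x), the SVD about 1.7x, while the tiny vector dot product barely changes, since there is not enough work in it to spread across threads.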

Now that we have the basic setup, we can move on to making use of our GPU(s). We haven’t talked much about them yet, but GPUs have thousands of simple cores and are far faster than CPUs at the highly parallel arithmetic that neural networks rely on. Thus, it is very important to configure your GPUs correctly to get their full performance.

Nvidia Drivers

We must have the latest Nvidia drivers installed. But before that, let’s first find our graphics card model(s).

$ lspci | grep -i nvidia

This should list the model(s) of your graphics card(s). Go to the Nvidia driver website and find the latest stable driver for your card model(s) and OS. The next step is to add the “Proprietary GPU Drivers” PPA repository. At the time of writing, the latest stable driver version for my GPUs was 378. If yours is different, replace the 378 in the following commands with that version.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update
$ sudo apt-get install nvidia-378

We should now restart our computer. If we did everything right, the computer should boot up fine.

$ sudo shutdown -r now

We are almost there, but let’s make sure we set up the Nvidia drivers correctly. Run the following command in the terminal.

$ cat /proc/driver/nvidia/version

Here we should see the correct version of the driver we intended to install.
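If you would rather check this from a script, here is a minimal sketch that pulls the version number out of that file. It assumes the first line looks roughly like the sample string in the comment, which is illustrative; the exact layout can vary between driver releases, in which case the function simply returns None.

```python
import re


def parse_driver_version(text):
    """Extract the driver version from /proc/driver/nvidia/version contents.

    Assumes a line such as (illustrative sample, not real output):
    NVRM version: NVIDIA UNIX x86_64 Kernel Module  378.13  Tue Feb  7 ...
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def read_driver_version(path="/proc/driver/nvidia/version"):
    """Return the installed driver version, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return parse_driver_version(handle.read())
    except IOError:
        return None


if __name__ == "__main__":
    print("Nvidia driver version: {}".format(read_driver_version()))
```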


CUDA

CUDA lets us run our models on the GPUs, so we need to install it. Otherwise, our models would be restricted to the CPU. Go to the CUDA website and download the version of CUDA that matches your system. But don’t forget to come back here; we are not finished yet.

The name of the downloaded file should look like the following. Adjust the command if yours differs.

$ sudo dpkg -i cuda-repo-ubuntu1604*amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda

If CUDA was installed correctly, there should be a directory at the following path:


We should now add CUDA to the environment variables.

$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"' >> ~/.bashrc
$ echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
$ source ~/.bashrc

Let’s make sure that everything is correct before we move forward.

$ nvcc -V

This should print out the CUDA compiler driver information and its version. Let’s restart our machine one more time.

$ sudo shutdown -r now
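After the restart, it can be worth confirming that the environment variables survived. Here is a small sketch that checks them from Python. The function name check_cuda_env is made up for this example, and it assumes the /usr/local/cuda location used in the commands above.

```python
import os


def check_cuda_env(env=None, cuda_home="/usr/local/cuda"):
    """Report whether the CUDA-related variables from ~/.bashrc are in place."""
    env = os.environ if env is None else env
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "PATH contains cuda/bin": cuda_home + "/bin" in path_dirs,
        "LD_LIBRARY_PATH contains cuda/lib64": cuda_home + "/lib64" in lib_dirs,
        "CUDA_HOME is set": env.get("CUDA_HOME") == cuda_home,
    }


if __name__ == "__main__":
    for check, ok in sorted(check_cuda_env().items()):
        print("{:<40} {}".format(check, "OK" if ok else "MISSING"))
```

Any line marked MISSING usually means the new terminal did not read ~/.bashrc, or one of the echo commands was pasted incompletely.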


cuDNN

We have finished installing CUDA. Next we are going to install cuDNN, a library that speeds up Deep Learning frameworks such as Theano and TensorFlow. It is not the end of the world if you skip this step, but it will help us a lot once we get to training very deep networks, so let’s stick around for a while and complete the following steps. Besides, we don’t have much left to do.

First, register for Nvidia’s Accelerated Computing Developer Program. Once you have registered, log in and download the latest cuDNN. The current version is v5.1; if you download a newer one, change the file name in the first command below accordingly. The next step is to extract the cuDNN archive and copy its contents into /usr/local/cuda with the following commands.

$ sudo tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
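To double-check that the files landed where the frameworks will look for them, here is a minimal sketch that lists the cuDNN header and libraries under the CUDA directory. The helper name find_cudnn_files is hypothetical, and the default path assumes the /usr/local/cuda location used above.

```python
import glob
import os


def find_cudnn_files(cuda_root="/usr/local/cuda"):
    """Return the cuDNN header (or None) and any libcudnn* files found."""
    header = os.path.join(cuda_root, "include", "cudnn.h")
    libraries = sorted(glob.glob(os.path.join(cuda_root, "lib64", "libcudnn*")))
    return {
        "header": header if os.path.isfile(header) else None,
        "libraries": libraries,
    }


if __name__ == "__main__":
    found = find_cudnn_files()
    print("header: {}".format(found["header"]))
    for library in found["libraries"]:
        print("library: {}".format(library))
```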

We are now done. Let’s see how we did.

$ nvidia-smi

You should now be able to see all your GPUs here. If not, then something might have gone wrong. By the way, this command is very handy in case you want to check the performance metrics of your GPU(s).

Here, I can see all my GPUs which means we can now move forward.
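The same information can be read from a script. The sketch below calls nvidia-smi with its CSV query options and turns the output into a list of dictionaries. The sample line in the docstring is illustrative rather than captured output, and the dictionary keys are names picked for this example.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_list(output):
    """Parse lines such as '0, GeForce GTX 1080, 8114 MiB' (illustrative)."""
    gpus = []
    for line in output.strip().splitlines():
        index, name, memory = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory": memory})
    return gpus


def list_gpus():
    """Run nvidia-smi and return one dictionary per GPU."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_gpu_list(output)


if __name__ == "__main__":
    for gpu in list_gpus():
        print("GPU {index}: {name} ({memory})".format(**gpu))
```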


virtualenv

virtualenv is a tool for creating isolated Python environments. It allows many Python projects to coexist on the same machine.

$ sudo pip install virtualenv


virtualenvwrapper

virtualenvwrapper is a set of extensions to the virtualenv tool. It makes working with virtual environments very easy.

$ sudo pip install virtualenvwrapper

Now let’s create a directory for our virtual environments.

$ cd ~/
$ mkdir Envs

We are now going to update our ~/.bashrc file again.

$ echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
$ echo 'export PROJECT_HOME=$HOME/Envs' >> ~/.bashrc
$ source /usr/local/bin/

Let’s create a virtual environment and name it playground, as it will be our playground :D You can create more environments and name them as you wish.

$ mkvirtualenv playground

Any time you want to work in an environment, type the command below in the terminal. If your environment has a different name, use that name instead of playground.

$ workon playground
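If you are ever unsure whether the environment is really active, a quick check from Python helps. This is a small sketch; the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when Python is running inside a virtual environment.

    Older virtualenv releases set sys.real_prefix; newer ones (and the
    built-in venv module) make sys.base_prefix differ from sys.prefix.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("Inside a virtual environment: {}".format(in_virtualenv()))
    print("Interpreter prefix: {}".format(sys.prefix))
```

Inside playground, the prefix should point into the directory set as WORKON_HOME rather than /usr.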


TensorFlow

TensorFlow is easy to install. Let’s first try to install it via pip.

$ pip install --upgrade tensorflow-gpu

If this fails, you can go to the TensorFlow website and select the correct binary for your system. In this case we select the GPU-enabled Ubuntu version. For my configuration it is Ubuntu/Linux 64-bit, GPU enabled, Python 2.7, but yours might be different. Make sure you select the latest version from the TensorFlow website.

$ pip install --upgrade

After the installation, let’s make sure everything is OK.

$ python
>>> import tensorflow as tf
>>> tf.__version__
>>> exit()

You can also test it with this script, or copy it from below, to see the tasks run on the GPU(s).
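As a rough stand-in for such a test, here is a sketch that asks TensorFlow which devices it can see and keeps only the GPUs. It relies on device_lib.list_local_devices() from tensorflow.python.client, and the two function names are made up for this example. This is not the original script, which may instead log where each operation is placed.

```python
def gpu_device_names(devices):
    """Keep the names of devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def visible_gpus():
    """Ask TensorFlow for its local devices and return the GPU names."""
    # Imported here so the helper above can be used without TensorFlow.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())


if __name__ == "__main__":
    names = visible_gpus()
    if names:
        print("TensorFlow sees {} GPU(s): {}".format(len(names), ", ".join(names)))
    else:
        print("TensorFlow sees no GPU; check the CUDA and cuDNN steps.")
```

An empty list usually means the CPU-only package was installed, or that the CUDA libraries are not on LD_LIBRARY_PATH.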


Keras

Keras is a high-level neural networks library written in Python. Code written in Keras can run on top of either TensorFlow or Theano, which makes Keras a good choice for Deep Learning models. Let’s install a few dependencies first.

$ pip install numpy scipy
$ pip install scikit-learn
$ pip install pillow
$ pip install h5py

Now let’s install Keras via pip.

$ pip install keras

That’s actually it. But let’s make sure everything is OK, this time for Keras.

$ python
>>> import keras
>>> keras.__version__
>>> exit()

This should print the Keras version and the backend Keras uses. We can change the backend in the Keras configuration file in case it is not set to TensorFlow by default.

$ gedit ~/.keras/keras.json

The contents of this file should look something like this for TensorFlow. If this is not the case, change it to match the configuration below and save the file.

{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
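If you prefer not to edit the file by hand, the backend entry can be changed with a few lines of Python. This is a minimal sketch using only the standard library; set_keras_backend is a hypothetical helper, not part of Keras, and it leaves every other setting in the file untouched.

```python
import json
import os


def set_keras_backend(path, backend="tensorflow"):
    """Set the "backend" entry in a Keras JSON config and return the config."""
    config = {}
    if os.path.exists(path):
        with open(path) as handle:
            config = json.load(handle)
    config["backend"] = backend
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


if __name__ == "__main__":
    config_path = os.path.expanduser("~/.keras/keras.json")
    print(set_keras_backend(config_path, "tensorflow"))
```

The sketch assumes the ~/.keras directory already exists, which is the case once Keras has been imported at least once.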

The Network

Now that we have set up our machine for deep learning, let’s build a simple neural network to solve the famous XOR problem. You can download the script from here or copy it from below.

$ python

This will create a neural network to solve the XOR problem and train it for 1000 epochs. You can track the performance of the GPU(s) with the following command, which refreshes the screen every second. Make sure to run it in another terminal window.

$ watch -n1 nvidia-smi
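To show what such a network does under the hood, here is a rough sketch of an XOR network written in plain NumPy instead of Keras, so it runs on the CPU and is not the script referred to above. The layer size, learning rate and function names are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_xor(epochs=1000, hidden=8, learning_rate=0.5, seed=0):
    """Train a 2-hidden-1 network on XOR with full-batch gradient descent."""
    rng = np.random.RandomState(seed)
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    targets = np.array([[0], [1], [1], [0]], dtype=float)

    w1 = rng.randn(2, hidden)
    b1 = np.zeros(hidden)
    w2 = rng.randn(hidden, 1)
    b2 = np.zeros(1)

    def forward(x):
        h = np.tanh(x.dot(w1) + b1)      # hidden layer
        return h, sigmoid(h.dot(w2) + b2)  # output probability

    losses = []
    for _ in range(epochs):
        h, p = forward(inputs)
        # Binary cross-entropy, with a small constant to avoid log(0).
        loss = -np.mean(targets * np.log(p + 1e-12)
                        + (1 - targets) * np.log(1 - p + 1e-12))
        losses.append(loss)

        # Backpropagation: gradient of the mean loss w.r.t. each parameter.
        dz2 = (p - targets) / len(inputs)
        dw2 = h.T.dot(dz2)
        db2 = dz2.sum(axis=0)
        dz1 = dz2.dot(w2.T) * (1 - h ** 2)
        dw1 = inputs.T.dot(dz1)
        db1 = dz1.sum(axis=0)

        # In-place updates, so forward() keeps using the trained weights.
        w1 -= learning_rate * dw1
        b1 -= learning_rate * db1
        w2 -= learning_rate * dw2
        b2 -= learning_rate * db2

    def predict(x):
        return forward(np.asarray(x, dtype=float))[1]

    return predict, losses


if __name__ == "__main__":
    predict, losses = train_xor()
    print("first loss: {:.4f}, last loss: {:.4f}".format(losses[0], losses[-1]))
    print(predict([[0, 0], [0, 1], [1, 0], [1, 1]]).round(3))
```

Every line in the training loop is a matrix product or an element-wise operation, which is exactly the kind of work that OpenBLAS speeds up on the CPU and that Keras with TensorFlow hands over to the GPU.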


This is the end of our tutorial. You can now build more complex networks and start experimenting with them using this setup. Take care!