This is part three of our series on building a deep learning machine. You can find the other posts here:
- Building a Deep Learning Box
- GPU Virtualization with KVM / QEMU
- Installing Nvidia, Cuda, CuDNN, TensorFlow and Keras
In this post I will outline how to install the drivers and packages needed to get up and running with TensorFlow’s deep learning framework.
To start, install Ubuntu 14.04 Server. The download link is here. If you are using AWS or another cloud virtual machine provider, simply create an instance with Ubuntu 14.04 and ssh into the machine. We’re using Ubuntu 14.04 rather than 16.04 because it’s supported by Cuda.
Now that we have our server up and running, we’ll need to install the various drivers and packages. Here’s a list of what we’re going to install.
- Nvidia Driver
- Cuda
- CuDNN
- TensorFlow
- Keras
Confirm GPU Existence
We’re assuming that you’re using a machine which has a GPU installed, more specifically an Nvidia GPU. To find out whether your GPU is installed and visible to the system, run the following command. On our machine it lists the card’s video and audio devices.
lspci -nnk | grep -i nvidia
4b:00.0 VGA compatible controller : NVIDIA Corporation Device [10de:1b80] (rev a1)
4b:00.1 Audio device : NVIDIA Corporation Device [10de:10f0] (rev a1)
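If you want to script this check, for example in a provisioning script, here is a minimal sketch. The function name `has_nvidia_gpu` is our own invention for illustration; it reads `lspci`-style text on standard input and succeeds when any line mentions NVIDIA.

```shell
# Hypothetical helper (not part of any Nvidia tooling): succeeds if the
# lspci-style text on stdin mentions an NVIDIA device.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Only call lspci if it is actually installed on this machine.
if command -v lspci >/dev/null 2>&1; then
    if lspci -nnk | has_nvidia_gpu; then
        echo "NVIDIA GPU detected"
    else
        echo "No NVIDIA GPU found; check that the card is visible to the OS"
    fi
fi
```

Because the helper only reads standard input, you can also feed it saved `lspci` output from another machine.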
Before we jump into anything, be sure to update the apt package index.
sudo apt-get update
We also need to make sure gcc is up to date and that Python, pip, and some of the scientific Python packages are installed.
sudo apt-get install libglu1-mesa libxi-dev libxmu-dev -y
sudo apt-get --yes install build-essential
sudo apt-get install python-pip python-dev -y
sudo apt-get install python-numpy python-scipy -y
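To confirm the packages above landed on your PATH, a quick check along these lines may help. The helper name `missing_tools` is hypothetical, and the list of commands to look for (gcc, make, python, pip) is our assumption about what the later steps rely on.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed set of tools needed by the rest of this walkthrough.
missing=$(missing_tools gcc make python pip)
if [ -n "$missing" ]; then
    echo "Still missing: $missing"
else
    echo "gcc, make, python and pip are all available"
fi
```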
Install Nvidia Driver
Download the Nvidia driver via wget and run the script in silent mode.
Note: if you’re using a GPU other than the GTX 1080, you’ll need to download the driver for that specific GPU.
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/367.44/NVIDIA-Linux-x86_64-367.44.run
sudo chmod +x NVIDIA-Linux-x86_64-367.44.run
sudo ./NVIDIA-Linux-x86_64-367.44.run --silent
To confirm that the driver was installed correctly and that your GPU is being recognized, run nvidia-smi. This command is also useful if you want to check performance metrics of the GPU.
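If you would rather check the driver version from a script than read the `nvidia-smi` table by eye, something like the following could work. It is a sketch under two assumptions: that your `nvidia-smi` supports the `--query-gpu` and `--format=csv,noheader` options, and that it separates CSV fields with a comma and a space. The helper name `driver_is` is hypothetical.

```shell
# Hypothetical helper: read "name, driver_version" CSV lines on stdin and
# succeed only if at least one GPU is listed and every GPU reports the
# expected driver version.
driver_is() {
    local expected="$1" line seen=0
    while IFS= read -r line; do
        seen=1
        # Keep only the text after the last ", " (the driver version field).
        [ "${line##*, }" = "$expected" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Only run the query if nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | driver_is 367.44; then
        echo "Driver 367.44 is loaded"
    else
        echo "Unexpected driver version; rerun the installer"
    fi
fi
```

Swap `367.44` for whichever driver version you downloaded.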
Install Cuda
Cuda allows us to run our TensorFlow models on the GPU; without it we would be restricted to the CPU. Download the Cuda 7.5 run file using wget, then install the driver, the toolkit, and the samples.
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
sudo chmod +x cuda_7.5.18_linux.run
./cuda_7.5.18_linux.run --driver --silent
./cuda_7.5.18_linux.run --toolkit --silent
./cuda_7.5.18_linux.run --samples --silent
We’ll need to add the Cuda libraries to our library search path. Either edit the .bashrc file by hand or run these commands, which append the exports for you.
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
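One catch with appending via `>>` is that running the commands twice leaves duplicate lines in .bashrc. A small guard like the one below avoids that. The helper name `append_once` is hypothetical and the approach is just one way to make the step repeatable.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so re-running the setup does not duplicate exports.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Called as `append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc`, it adds the export the first time and does nothing on later runs. Open a new shell or run `source ~/.bashrc` afterwards so the variables take effect.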
Install CuDNN
CuDNN is a library that helps accelerate deep learning frameworks such as TensorFlow or Theano. Here’s a brief explanation from the Nvidia website.
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
Before installing, you’ll need to register for Nvidia’s Accelerated Computing Developer Program. Once registered, log in and download CuDNN 4.0 to your local computer. Then move the archive to your deep learning server via scp.
UPDATE: I’ve noticed that TensorFlow 0.10, the latest version, only works with CuDNN v5.1, possibly because this version of TF is still in development.
sudo scp cudnn-7.0-linux-x64-v4.0-prod.tgz email@example.com:/home/root/
Untar the archive and copy the necessary files into the Cuda installation we set up earlier.
tar -xzvf cudnn-7.0-linux-x64-v4.0-prod.tgz
cp cuda/lib64/* /usr/local/cuda/lib64/
cp cuda/include/cudnn.h /usr/local/cuda/include/
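Since the CuDNN version matters for TensorFlow compatibility, it can be worth confirming which version actually got copied. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain `#define` lines, which is how the releases we have seen are laid out; the helper name `cudnn_version` is hypothetical.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL as declared in a cudnn.h.
# Assumes the header contains "#define CUDNN_MAJOR <n>" style lines.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major == "") exit 1; print major "." minor "." patch }
    ' "$1"
}

# Report the version of the header we just copied, if it is there.
header=/usr/local/cuda/include/cudnn.h
if [ -f "$header" ]; then
    echo "cuDNN version: $(cudnn_version "$header")"
fi
```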
Install TensorFlow
We’re assuming you’ll be using TensorFlow to build your deep neural network models. We’re currently using TensorFlow 0.10. Simply install it via pip with the upgrade flag.
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl
Now you should have everything you need to run a model using your GPU(s) for computations. You can confirm it works without writing your own validation script by running one of TensorFlow’s example convolutional network scripts.
python -m tensorflow.models.image.mnist.convolutional
The first few lines of output should look similar to the following
The minibatch error should trend downward as training progresses, although individual steps can fluctuate. If it doesn’t, some part of your installation went wrong.
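If you save the script’s output to a file, you can check the trend without reading every line. The sketch below assumes the log contains lines of the form `Minibatch error: 12.5%`; that format is our assumption, so adjust the pattern if your TensorFlow version prints something different. The helper name `error_decreased` and the file name `mnist.log` are hypothetical.

```shell
# Hypothetical helper: read a training log on stdin and succeed if the last
# "Minibatch error: N%" value is lower than the first one.
error_decreased() {
    awk -F': ' '
        /Minibatch error:/ {
            value = $2 + 0          # "12.5%" is read as the number 12.5
            if (!seen) { first = value; seen = 1 }
            last = value
            count++
        }
        END { exit !(count >= 2 && last < first) }
    '
}

# Check a previously saved log, if one exists in the current directory.
if [ -f mnist.log ]; then
    if error_decreased < mnist.log; then
        echo "Minibatch error went down; the GPU setup looks healthy"
    else
        echo "Minibatch error did not go down; recheck the installation"
    fi
fi
```

To produce the log, pipe the example script through `tee`, for instance `python -m tensorflow.models.image.mnist.convolutional 2>&1 | tee mnist.log`.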
Install Keras
Keras has a few dependencies, which are outlined on its website. Running these commands should install everything you need.
sudo apt-get install python-numpy python-scipy -y
sudo apt-get install python-yaml -y
sudo apt-get install libhdf5-serial-dev -y
sudo pip install keras==1.0.8
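As a last sanity check, you can ask Python which versions of the main packages it can import. This is a minimal sketch: the helper name `module_version` is hypothetical, and it relies on each module exposing a `__version__` attribute, which numpy, tensorflow and keras do in the releases used here as far as we know.

```shell
# Hypothetical helper: print the __version__ of a Python module, or fail
# quietly if the module (or python itself) is not available.
module_version() {
    python -c "import $1; print($1.__version__)" 2>/dev/null
}

for module in numpy tensorflow keras; do
    if version=$(module_version "$module"); then
        echo "$module $version"
    else
        echo "$module is not importable"
    fi
done
```

Errors are silenced to keep the output short, so if a module shows up as not importable, run `python -c "import <module>"` by hand to see the full traceback.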