Troubleshooting my deep learning Ubuntu setup

So I decided to do some deep learning and wanted to install Nvidia drivers, Cuda, CuDNN, etc. I had a lot of trouble getting it to work (I was at it for 5 to 6 hours), so I decided to write this as a note to my future self. It may also be useful to others, though note that I found people running into different problems depending on their hardware.

My system details: Lenovo Y510p, Nvidia 755M GPU, Intel 4th gen i7.

In my first attempt I installed the Nvidia-375 drivers and Cuda 8 after disabling secure boot. My operating system went bonkers and I got stuck in the infinite login screen loop, which is apparently a very common issue. To resolve it, switch to a TTY from the login screen (in Ubuntu: Ctrl + Alt + F1), log in, and run the following:

# Quote the globs so the shell passes them to apt-get instead of matching local files
sudo apt-get purge 'nvidia*'
sudo apt-get autoclean
sudo apt-get autoremove 'nvidia*'
sudo apt-get purge 'cuda*'
mv ~/.Xauthority ~/.Xauthority.bak
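
Before reinstalling, it can help to confirm that the purge left nothing behind. The snippet below is only a sketch and not part of the steps above: `filter_gpu_packages` is a hypothetical helper name, and it assumes that leftover packages have names starting with nvidia, cuda or libcuda. It prints nothing when the system is clean.

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the status
# and name of every package whose name starts with nvidia, cuda or libcuda.
filter_gpu_packages() {
    awk '$2 ~ /^(nvidia|cuda|libcuda)/ { print $1, $2 }'
}

# Only query dpkg where it exists (Debian/Ubuntu systems).
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | filter_gpu_packages
fi
```

A status of ii in the first column means the package is still installed, while rc means only its configuration files remain.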

Now the best way in my experience to install proprietary Nvidia drivers:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo ubuntu-drivers autoinstall

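This installs the Nvidia driver that Ubuntu recommends for your GPU (with the PPA added, this is usually the latest one that supports it), along with any other recommended drivers.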

Now to install Cuda 8: go to https://developer.nvidia.com/cuda-downloads and download Cuda for your architecture and OS (x86_64 and Ubuntu 16.04 in my case). The instructions for installation are given in the link. Now update path:

# Write both exports to the rc file of the shell you actually use (~/.zshrc if you are on zsh)
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
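
To check that the new paths took effect in the current shell, something like the following can be used. It is a rough sketch: `path_contains` is a hypothetical helper, and /usr/local/cuda is assumed to be the install location, as in the exports above. If the toolkit is found, `nvcc --version` prints the Cuda release that the compiler belongs to.

```shell
# Hypothetical helper: succeeds if directory $2 is one of the
# colon-separated entries of the path list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

CUDA_HOME=/usr/local/cuda   # assumed install location

if path_contains "$PATH" "$CUDA_HOME/bin"; then
    echo "PATH includes $CUDA_HOME/bin"
else
    echo "PATH is missing $CUDA_HOME/bin"
fi

if path_contains "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64"; then
    echo "LD_LIBRARY_PATH includes $CUDA_HOME/lib64"
else
    echo "LD_LIBRARY_PATH is missing $CUDA_HOME/lib64"
fi

# nvcc ships with the Cuda toolkit, so finding it confirms PATH works.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```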

Now run:

sudo modprobe nvidia
nvidia-smi

nvidia-smi is a command-line monitoring tool; it should show your GPU details, memory usage, etc. If this works, you are almost there.
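
If you only want a few key fields rather than the full table, nvidia-smi can also print selected properties as CSV. The following is a small sketch under that assumption: `gpu_summary` is a made-up helper that reformats one CSV line, and some fields may be reported as unsupported on older GPUs.

```shell
# Hypothetical helper: turns one "name, driver, used, total" CSV line
# (memory values in MiB) into a readable one-line summary.
gpu_summary() {
    awk -F', ' '{ printf "%s (driver %s): %s of %s MiB used\n", $1, $2, $3, $4 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV line per GPU, without the header row or unit suffixes.
    nvidia-smi --query-gpu=name,driver_version,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_summary
else
    echo "nvidia-smi not found"
fi
```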

Now it's time to install CuDNN. When I installed CuDNN version 6 for Cuda 8, I ran into issues while training a model, and it took me a long time to realise that this was the problem. I then installed CuDNN version 5.1 for Cuda 8, which worked well.

Go to https://developer.nvidia.com/cudnn, sign up as an NVIDIA developer member, and download CuDNN 5.1 for Cuda 8. After downloading and extracting the file, run the following from inside the extracted directory:

sudo cp include/cudnn.h /usr/local/cuda/include/
sudo cp */libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
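
Since the CuDNN version turned out to matter, it is worth confirming which one actually got copied. A minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (as the CuDNN 5.x and 6.x headers do); `cudnn_version` is a hypothetical helper name.

```shell
# Hypothetical helper: prints "major.minor.patch" read from the #define
# lines of the cudnn.h file given as $1; fails if CUDNN_MAJOR is missing.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END {
             if (major == "") exit 1
             print major "." minor "." patch
         }' "$1"
}

CUDNN_HEADER=/usr/local/cuda/include/cudnn.h   # destination of the cp above

if [ -r "$CUDNN_HEADER" ]; then
    cudnn_version "$CUDNN_HEADER"
else
    echo "$CUDNN_HEADER not found"
fi
```

For the setup described here, the output should start with 5.1.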

Now install Theano, TensorFlow, Keras, etc., and configure them to use the GPU.
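
One way to check that a framework actually picked up the GPU is to ask it which devices it sees. The sketch below does this for TensorFlow only and assumes the TensorFlow 1.x `device_lib` module and a `python` executable on your PATH; `has_gpu` and `list_tf_devices` are hypothetical helper names.

```shell
# Hypothetical helper: succeeds if any device name on stdin mentions a GPU.
has_gpu() {
    grep -i 'gpu' >/dev/null
}

# Ask TensorFlow for the devices it can see and print one name per line.
list_tf_devices() {
    python -c 'from tensorflow.python.client import device_lib
for d in device_lib.list_local_devices():
    print(d.name)'
}

if python -c 'import tensorflow' >/dev/null 2>&1; then
    if list_tf_devices 2>/dev/null | has_gpu; then
        echo "TensorFlow sees a GPU"
    else
        echo "TensorFlow sees only the CPU"
    fi
else
    echo "TensorFlow is not installed for this python"
fi
```

If only the CPU shows up, the usual suspects are the GPU build of TensorFlow not being installed, or the Cuda and CuDNN libraries not being on LD_LIBRARY_PATH.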

The above steps worked for me. A simple ConvNet that took around 2 to 3 hours to run on my CPU now runs in about 10 minutes!

References:

  1. https://github.com/fastai/courses/blob/master/setup/install-gpu.sh
  2. https://medium.com/@vivek.yadav/deep-learning-setup-for-ubuntu-16-04-tensorflow-1-2-keras-opencv3-python3-cuda8-and-cudnn5-1-324438dd46f0

Graduate CS student at University of Toronto. Past: Software Engineer at VoterCircle & Morgan Stanley. Webpage: https://www.cs.toronto.edu/~satyag/
