Troubleshooting my deep learning Ubuntu setup
So I decided to do some deep learning and wanted to install Nvidia drivers, Cuda, CuDNN, etc. I had a lot of trouble making it work (I have been at it for the past 5 to 6 hours), so I decided to write this up as a note to myself for the future (if required), and because it could be useful to others (note that people run into different problems depending on their hardware).
My system details: Lenovo Y510p, Nvidia 755M GPU, Intel 4th gen i7.
In my first attempt I installed the nvidia-375 drivers and Cuda 8 after disabling secure boot, and my operating system went bonkers: I was stuck in the infinite login screen loop. This is apparently a very common issue. To resolve it, switch to a TTY from the login screen (in Ubuntu: Ctrl + Alt + F1), log in, and run the following:
sudo apt-get purge 'nvidia*'
sudo apt-get purge 'cuda*'
sudo apt-get autoremove
sudo apt-get autoclean
mv ~/.Xauthority ~/.Xauthority.bak
Now, in my experience, the best way to install the proprietary Nvidia drivers is:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo ubuntu-drivers autoinstall
This installs the Nvidia driver that Ubuntu recommends for your card (with the PPA added, usually a recent version), along with a few other packages.
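To see which driver package autoinstall is going to pick, you can run ubuntu-drivers devices first; it lists the available drivers for each device and marks one of them as recommended. The helper below is a minimal sketch that pulls the recommended package name out of that listing. It assumes lines of the form "driver   : nvidia-384 - third-party non-free recommended" (the format may differ between Ubuntu releases), and the function name recommended_driver is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name(s) marked as recommended.
# Assumed line format: "driver   : <package> - <tags...> recommended"
recommended_driver() {
  # $1 is "driver", $2 is ":", $3 is the package name.
  awk '$1 == "driver" && $2 == ":" && / recommended/ { print $3 }'
}
```

Usage: ubuntu-drivers devices | recommended_driver. If it prints nothing, the listing format is probably different from the one assumed above.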
Now, to install Cuda 8, go to https://developer.nvidia.com/cuda-downloads and download Cuda for your architecture and OS (x86_64 and Ubuntu 16.04 in my case); the installation instructions are given on that page. Then update your paths:
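echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
(If you use zsh, write both lines to ~/.zshrc instead and source that file; either way, both exports must go to the same file.)
Now run: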
sudo modprobe nvidia
nvidia-smi
nvidia-smi is a command-line monitoring tool; you should see your GPU details, memory usage, etc. If this works, you are almost there.
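If nvidia-smi instead reports that it cannot communicate with the Nvidia driver, it is worth checking which kernel module is actually loaded: the proprietary nvidia module or the open-source nouveau one (the two conflict). lsmod reads this information from /proc/modules; the sketch below does the same lookup directly. The function name module_loaded is hypothetical, and the module file is an optional second argument so the function can also be pointed at a saved copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named kernel module appears in the
# module list (by default /proc/modules, the file lsmod reads).
module_loaded() {
  local name="$1" modules="${2:-/proc/modules}"
  # The first column of /proc/modules is the module name; match it exactly
  # so that e.g. "nvidia_uvm" does not count as "nvidia".
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules"
}

# Example checks (commented out so sourcing this file has no side effects):
# module_loaded nvidia  && echo "nvidia module is loaded"
# module_loaded nouveau && echo "nouveau is loaded; it conflicts with the nvidia driver"
```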
Now it's time to install CuDNN. I ran into issues while training a model after installing CuDNN version 6 for Cuda 8, and it took a long time to realise that this was the problem. I then installed CuDNN version 5.1 for Cuda 8, which worked well.
Go to https://developer.nvidia.com/cudnn, sign up for the NVIDIA Developer Program, and download CuDNN 5.1 for Cuda 8. Extract the downloaded archive (it unpacks into a cuda directory), change into that directory and run:
sudo cp include/cudnn.h /usr/local/cuda/include/
sudo cp -P lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Now install Theano, TensorFlow, Keras, etc., and configure them to use the GPU.
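Since the CuDNN version turned out to matter, it is worth confirming which version actually ended up in /usr/local/cuda before training anything. In CuDNN 5.x and 6.x the version is recorded as CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in cudnn.h (later releases moved them into a separate header, so this sketch assumes the older layout). The function name cudnn_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CuDNN version (major.minor.patch) recorded
# in a cudnn.h header; defaults to the copy installed under /usr/local/cuda.
# Assumes the version defines live in cudnn.h, as in CuDNN 5.x/6.x.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { maj = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { min = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pat = $3 }
       END { if (maj == "") exit 1; print maj "." min "." pat }' "$header"
}
```

For the setup described here, cudnn_version should print something starting with 5.1; if it prints 6.x, the older CuDNN files were not replaced.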
The above steps worked for me. A simple ConvNet that took around 2 to 3 hours to run on my CPU now runs in about 10 minutes!