Getting started with GPU Computing for machine learning

Hilarie Sit
Jan 21, 2019


A quick guide to setting up a Google Cloud virtual machine instance or a Windows computer to use an NVIDIA GPU with PyTorch and TensorFlow

Training machine learning models with thousands or more training examples on a CPU (central processing unit) can take days, if not weeks, all the while draining your patience! At that point, you might want to seek out a GPU (graphics processing unit), a processor containing hundreds or thousands of cores that are optimized for parallel operations. GPUs are commonly used for rendering graphics in games, but their power can also be harnessed for general-purpose computing in modeling and deep learning tasks! Unfortunately, GPUs are not cheap, but there are several ways to get access to one.

Google Cloud Platform

One of the easiest ways to access a GPU is through a cloud platform. Google Cloud Platform (GCP) offers a variety of services, including its compute engine and cloud storage, for the general public. Most services are billed per second, but GCP offers a free trial ($300 credits for new users), and students can obtain additional free credits through courses or organizations, so this is an easy option to get started with GPU computing.
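
To get a feel for what per-second billing means for your credits, here is a rough sketch. The hourly rate below is a made-up placeholder, not a real GCP price, and the one-minute minimum charge is an assumption you should check against the current pricing page for your machine type and GPU.

```python
def vm_cost_usd(hourly_rate_usd, seconds_running, minimum_seconds=60):
    """Estimate the cost of one VM run under per-second billing.

    minimum_seconds is an assumed minimum charge per run.
    """
    billed_seconds = max(seconds_running, minimum_seconds)
    return hourly_rate_usd * billed_seconds / 3600


# Hypothetical rate for illustration only; look up the real one.
HYPOTHETICAL_RATE_USD_PER_HOUR = 0.75

overnight = vm_cost_usd(HYPOTHETICAL_RATE_USD_PER_HOUR, 8 * 3600)
hours_of_credit = 300 / HYPOTHETICAL_RATE_USD_PER_HOUR

print(f"8 hours left running: ${overnight:.2f}")
print(f"$300 of credit lasts about {hours_of_credit:.0f} hours")
```

Swap in the actual rate for your configuration to see how far the free trial will take you.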

Virtual Machine Instance

A virtual machine (VM) lets you use hardware in Google’s data centers, which are located around the world, from your own computer. You will first need to set up your VM instance properly. Begin by navigating to the Console of your project. In the navigation bar, under the Compute section, select Compute Engine and then VM instances.

When prompted to create your first VM instance, select Create and you will be directed to the customization page shown below.

  1. Name, Region, Zone. Give your new VM a name, and select a region and zone. Different zones have different features, such as the availability, number, and type of vCPU platforms and GPUs. Five options of NVIDIA Tesla GPUs are available. Consider GPU compute capability, memory, and pricing when making your decision, and select the appropriate region and zone.
  2. Machine Type. CPU cores, memory, and GPUs are all customizable. Click Customize and adjust the number of vCPU cores and the amount of memory as necessary. To add a GPU, change Number of GPUs to 1 and pick the desired GPU type from the ones available in your selected region and zone. Some of these settings can be changed later.
  3. Boot Disk. We will change the boot disk to Ubuntu 16.04. Click Change and under OS images, scroll down to select Ubuntu 16.04 LTS. Add disk space as needed.
  4. Firewall. Check Allow HTTP traffic and Allow HTTPS traffic before hitting the Create button.

Navigate to your VM instance and click SSH to open a terminal window. Check whether Python is installed; if it is not, install it, and then install pip:

sudo apt update
# check python
python --version
# install python
sudo apt install python python-dev python3 python3-dev
# install pip
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
# check pip
pip --version

Through the VM terminal, you can install other libraries using pip and run Python code! To add files to the VM, either upload them manually (from the settings menu in the top right corner), clone them from a remote repo, or use the gcloud command-line tool. Similarly, to get files out of the VM, either download them from the settings menu, push them to your remote repository, or use the gcloud command-line tool.

Upload or download files from settings

CUDA

CUDA is a platform developed by NVIDIA that allows you to use their GPUs for general computing. Installing CUDA is necessary to run popular ML frameworks, such as PyTorch and TensorFlow, on NVIDIA’s GPUs. Copy and paste this snippet to install the NVIDIA drivers and CUDA 9.0 on Ubuntu 16.04 (installation details can be found here):

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-9-0; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  sudo dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
  sudo apt-get update
  sudo apt-get install cuda-9-0 -y
fi
# Enable persistence mode
sudo nvidia-smi -pm 1

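Set the CUDA_HOME, PATH, and LD_LIBRARY_PATH environment variables (add these lines to ~/.bashrc if you want them to persist across SSH sessions):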

export CUDA_HOME=/usr/local/cuda-9.0
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
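
If you want to double-check that the three exports above took effect in your current shell, a small Python sketch like this can help. The function name check_cuda_env is my own, and it assumes the default /usr/local/cuda-9.0 install location used in this guide.

```python
import os


def check_cuda_env(env, cuda_home="/usr/local/cuda-9.0"):
    """Report which of the three CUDA exports are visible in env."""
    # Linux uses ":" to separate entries in PATH-style variables.
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "CUDA_HOME": env.get("CUDA_HOME") == cuda_home,
        "PATH": f"{cuda_home}/bin" in path_dirs,
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64" in lib_dirs,
    }


# Check the environment of the shell that launched Python.
for name, ok in check_cuda_env(os.environ).items():
    print(name, "ok" if ok else "missing")
```

Any line that prints "missing" means that export is not set in the shell you started Python from.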

Verify that you have installed the NVIDIA driver and CUDA properly. If you are only interested in using PyTorch, the last step is to install torch and torchvision:

# Verify driver installation
nvidia-smi
# Verify cuda toolkit installation
nvcc --version
# Install torch and torchvision
sudo pip install torch torchvision
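
Once torch is installed, it is worth confirming from Python that it can actually see the GPU. This is a minimal sketch: the helper names are mine, while torch.cuda.is_available() and torch.cuda.get_device_name() are standard PyTorch calls. It returns None instead of crashing if torch is not installed yet.

```python
import importlib.util
import shutil


def tools_on_path(names=("nvidia-smi", "nvcc")):
    """Check whether the driver and toolkit commands can be found."""
    return {name: shutil.which(name) is not None for name in names}


def torch_gpu_status():
    """Return None if torch is missing, else whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.cuda.is_available()


print(tools_on_path())
status = torch_gpu_status()
print("torch sees a GPU:", status)
if status:
    import torch
    print("device 0:", torch.cuda.get_device_name(0))
```

If the last check prints False even though nvidia-smi works, the installed torch build and the CUDA version probably do not match.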

cuDNN

To use tensorflow-gpu, you will first need to install cuDNN, which is NVIDIA’s GPU-accelerated deep neural network library. Sign up for a free NVIDIA developer account. Under archived cuDNN releases, download cuDNN 7.0.5 Library for Linux for CUDA 9.0.

Upload the downloaded .tgz file into your VM and then extract the files. Transfer the files into the appropriate directories and add permissions (details can be found here):

# extract .tgz file 
tar -xvzf cudnn-9.0-linux-x64-v7.tgz
# move files and add permissions
sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h /usr/local/cuda-9.0/lib64/libcudnn*
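
To confirm that the header you copied really is cuDNN 7.0.5, you can read the version numbers out of cudnn.h. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL (as the cuDNN 7 headers do) and that it lives at the path used above; the function name is my own.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # not a header in the expected format
        parts.append(int(match.group(1)))
    return tuple(parts)


header = Path("/usr/local/cuda-9.0/include/cudnn.h")
if header.is_file():
    print("cuDNN version:", cudnn_version(header.read_text(errors="ignore")))
else:
    print("cudnn.h not found at", header)
```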

Remove any extraneous files. Finally, install tensorflow-gpu:

sudo pip install --upgrade tensorflow-gpu
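
As with PyTorch, a quick check from Python tells you whether TensorFlow picked up the GPU. The sketch below uses device_lib.list_local_devices(), a commonly used helper inside the tensorflow package; the wrapper function name is mine, and it returns None if tensorflow is not installed.

```python
import importlib.util


def tensorflow_gpu_names():
    """Return the GPU device names TensorFlow sees, or None if missing."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    from tensorflow.python.client import device_lib
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


print("TensorFlow GPUs:", tensorflow_gpu_names())
```

An empty list means TensorFlow loaded but fell back to the CPU, which usually points to a CUDA or cuDNN version mismatch.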

Be sure to stop your VM instance when you are done using it, or you will keep being billed! Nothing is scarier than discovering that you’ve accidentally left your VM instance running all night.


Windows OS

If you often train models on a GPU or also have an interest in gaming, it may be worth the investment to buy an external GPU for your computer (be sure to do your research: check motherboard compatibility and make sure that the GPU is CUDA-enabled) or build your own deep learning powerhouse (check out this site for picking out PC parts). The procedure for setting up a computer with an NVIDIA GPU is similar: install drivers → install CUDA → install cuDNN. On Windows, there are some additional steps. I will detail the procedure for installing Visual Studio 2017, CUDA 9.0, and cuDNN 7.0.5 on Windows 10.

Begin by downloading the free Community edition of Visual Studio 2017 from Microsoft. In Visual Studio Installer, under Workloads, select .NET desktop development, Desktop development with C++, and Universal Windows Platform development. Under Individual components, select VC++ 2015.3 v14.00 (v140) toolset for desktop and then install.

CUDA 9.0 for Windows OS needs to be manually downloaded from NVIDIA’s site. Select the exe (local) installer for the appropriate operating system and download the base installer and patches in order.

Extract the base installer and accept the software license agreement. Use the recommended Express installation. If the base installer fails because of Visual Studio integration, select Custom installation instead and, under CUDA, unselect Visual Studio Integration (it is not needed for this setup).

If necessary, unselect Visual Studio Integration

If you would like to use tensorflow-gpu, after installing the base installer and patches, download cuDNN 7.0.5 Library for Windows 10 for CUDA 9.0. Extract the compressed file and navigate into the cuda directory. Move the files inside bin, include and lib\x64 into the correct locations: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64 respectively. Finally, use pip to install tensorflow-gpu!
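
After moving the cuDNN files on Windows, you can sanity-check that they landed in the right folders. This is a rough sketch: the file names (cudnn64_7.dll, cudnn.h, cudnn.lib) are what I expect from the cuDNN 7 Windows archive, so compare them against the contents of the archive you extracted. The function name is my own.

```python
from pathlib import Path

# Files expected from the cuDNN 7 archive, relative to the CUDA folder.
EXPECTED_CUDNN_FILES = (
    "bin/cudnn64_7.dll",
    "include/cudnn.h",
    "lib/x64/cudnn.lib",
)


def missing_cudnn_files(cuda_root):
    """List the expected cuDNN files that are not under cuda_root."""
    root = Path(cuda_root)
    return [rel for rel in EXPECTED_CUDNN_FILES if not (root / rel).is_file()]


cuda_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"
missing = missing_cudnn_files(cuda_root)
print("All cuDNN files found" if not missing else f"Missing: {missing}")
```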

You are now ready to train your machine learning models with a GPU! I hope you found this article helpful; any feedback is appreciated! :)
