Getting started with the deep learning stack

This post is an excerpt from an upcoming course. If you like it, check out the crowdfunding project.

Deep learning generally deals with high-dimensional data, meaning vectors of more than 100 dimensions. A very simple 3-layer convnet applied to a 128x128 image would contain around 200,000 neurons. That’s ~400k floats already!

Fortunately, most of the operations involved in training a neural network are fairly simple. Updating a neuron comes down to addition and multiplication. Since deep learning involves a huge number of these small, independent computations, it makes a lot of sense to run them in parallel.
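
As a rough illustration, here is a sketch in NumPy (not one of the frameworks discussed below, just the standard Python array library, and the layer sizes are made up): updating a whole layer of neurons is nothing but multiplications and additions, and because each neuron is independent, one vectorized matrix operation handles all of them at once.

```python
import numpy as np

# A layer of 128 neurons fed by 256 inputs (sizes chosen arbitrarily).
x = np.random.rand(256).astype(np.float32)       # input activations
W = np.random.rand(128, 256).astype(np.float32)  # weights, one row per neuron
b = np.zeros(128, dtype=np.float32)              # biases

# Every neuron's update is just multiply-and-add, and all 128 are independent,
# so a single matrix-vector product computes the whole layer in one shot.
activations = np.maximum(W @ x + b, 0.0)  # ReLU output for the full layer
print(activations.shape)  # (128,)
```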

Enter the GPU. Originally intended for the high-octane graphics that power PC video games, the GPU is perfectly suited to this type of large-scale parallel task.

How we crunch all those numbers

The typical deep learning framework is written in a low-level language like C++, is tied together with a scripting language like Python, and makes use of a parallel computing platform like CUDA.
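
As a rough sketch of how that division of labor looks (assuming the TensorFlow 1.x-style API that was current when this post was written): the Python code only describes the computation, and the compiled C++/CUDA backend executes it when the session runs.

```python
import tensorflow as tf

# Python builds a description of the computation (a graph)...
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
product = tf.matmul(a, b)

# ...and the C++/CUDA backend actually crunches the numbers when the session runs.
with tf.Session() as sess:
    print(sess.run(product))  # [[19. 22.], [43. 50.]]
```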

Many of the famous deep learning papers use massive supercomputer clusters to process all those neurons. DeepMind’s AlphaGo used up to 1,920 CPUs and 280 GPUs to beat a human player at Go. It’s quite common to read a paper with great results only to realize they got there with a stack of 20 GPUs.

Frameworks

The most popular framework is Google’s TensorFlow. It is a fairly low-level framework that can be used from both Python and C++.

Other popular frameworks are:

  • Theano — another low-level framework focused on numerical computation, with solid GPU support.
  • Caffe — a computer vision focused framework. Not very flexible or well documented, but has a significant performance advantage.
  • Torch — uses the programming language Lua instead of Python. Supports GPU computing.

Assignment: Set up Tensorflow

Minimal: This class will use TensorFlow and Python for all the examples. You can install TensorFlow yourself using the instructions here.
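
A minimal way to check that the install worked, along the lines of the classic smoke test from the TensorFlow docs (again assuming the 1.x-era API):

```python
import tensorflow as tf

print(tf.__version__)  # confirms the package imports and reports its version

hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))  # prints b'Hello, TensorFlow!' on Python 3
```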

Optional: The GPU-enabled version of TensorFlow requires CUDA and cuDNN, and of course, a good-enough NVIDIA GPU. You need to register for an NVIDIA developer account to download cuDNN, and approval can take a couple of business days.
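
Once CUDA and cuDNN are installed, one way to confirm that TensorFlow actually sees the GPU is to ask it to log device placement (a sketch, again assuming the 1.x-era API):

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
c = tf.matmul(a, b)

# log_device_placement prints which device (CPU or GPU) each op is assigned to;
# with a working GPU install, the matmul should land on a GPU device.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))
```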

The expensive shortcut (recommended if possible): Rent a really good deep learning computer (known as an ‘instance’) by the hour on Amazon Web Services. Set up an instance using this Amazon Machine Image, which comes with all the major deep learning software pre-installed. You can rent the 4-GPU g2.8xlarge for $2.60 an hour, or the cheaper single-GPU g2.2xlarge. You can learn more about getting started with an Amazon instance here.
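
For reference, launching such an instance can also be scripted with boto3, the AWS SDK for Python. This is only a sketch: the AMI ID and key pair name below are placeholders (the real AMI ID comes from the link above), and it assumes your AWS credentials are already configured.

```python
import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')

instances = ec2.create_instances(
    ImageId='ami-xxxxxxxx',     # placeholder: use the deep learning AMI linked above
    InstanceType='g2.2xlarge',  # the cheaper single-GPU option; g2.8xlarge has 4 GPUs
    MinCount=1,
    MaxCount=1,
    KeyName='my-keypair',       # placeholder: an existing EC2 key pair for SSH access
)
print(instances[0].id)  # the instance is now booting; remember to stop it, since you pay by the hour
```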