Machine Learning for Humans, Part 4: Neural Networks & Deep Learning

Where, why, and how deep neural networks work. Drawing inspiration from the brain. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Real-world applications.

This series is available as a full-length e-book! Download here. Free for download, contributions appreciated (paypal.me/ml4h)

With deep learning, we’re still learning a function f to map input X to output Y with minimal loss on the test data, just as we’ve been doing all along. Recall our initial “problem statement” from Part 2.1 on supervised learning:

Y = f(X) + ϵ
Training: machine learns f from labeled training data
Testing: machine predicts Y from unlabeled testing data

The real world is messy, so sometimes f is complicated. In natural language problems, large vocabulary sizes mean lots of features. Vision problems involve lots of visual information about pixels. Playing games requires making decisions based on complex scenarios with many possible futures. The learning techniques we’ve covered so far do well when the data we’re working with is not insanely complex, but it’s not clear how they’d generalize to scenarios like these.

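Deep learning is really good at learning f, particularly in situations where the data is complex. In fact, artificial neural networks are known as universal function approximators: with just a single hidden layer and enough hidden units, a network can approximate any continuous function on a bounded input range, no matter how wiggly, to arbitrary accuracy. The theorem guarantees that such a network exists, not that training will find it; in practice, deeper networks often represent complex functions far more efficiently.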

Let’s look at the problem of image classification. We take an image as an input, and output a class (e.g., dog, cat, car).

Graphically, a deep neural network solving image classification looks something like this:

Image from Jeff Clune’s 1-hour Deep Learning Overview on YouTube

But really, this is a giant mathematical equation with millions of terms and lots of parameters. The input X is, say, a greyscale image represented by a w-by-h matrix of pixel brightnesses. The output Y is a vector of class probabilities: for each class, the probability that it is the correct label. If this neural net is working well, the highest probability should go to the correct class. The layers in the middle are just doing a bunch of matrix multiplication, summing activations times weights, with a non-linear transformation (an activation function) applied after every hidden layer so that the network can learn a non-linear function.
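
To make the “giant equation” concrete, here is a minimal sketch of a forward pass through one hidden layer. The layer sizes, the random weights, the names `W1`, `b1`, `W2`, `b2`, and the choice of ReLU and softmax are all our own assumptions for illustration; a real image classifier would be far larger and would use trained weights.

```python
import numpy as np

def relu(z):
    # Non-linear activation: negative values become zero.
    return np.maximum(0.0, z)

def softmax(z):
    # Turn raw scores into probabilities that sum to 1.
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum of inputs, then a non-linearity.
    hidden = relu(W1 @ x + b1)
    # Output layer: weighted sum of hidden activations, then softmax.
    return softmax(W2 @ hidden + b2)

rng = np.random.default_rng(0)
width, height, n_hidden, n_classes = 4, 4, 8, 3  # hypothetical sizes

image = rng.random((height, width))  # stand-in for a greyscale image
x = image.reshape(-1)                # flatten the pixel matrix into a vector

# Untrained, randomly initialized parameters.
W1 = rng.normal(scale=0.1, size=(n_hidden, width * height))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_classes, n_hidden))
b2 = np.zeros(n_classes)

probs = forward(x, W1, b1, W2, b2)
print(probs, probs.sum())
```

With random weights the probabilities are meaningless; training is what makes the highest one line up with the correct class.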

Incredibly, you can use gradient descent in the exact same way that we did with linear regression in Part 2.1 to train these parameters in a way that minimizes loss. So with a lot of examples and a lot of gradient descent, the model can learn how to classify images of animals correctly. And that, in a nutshell’s nutshell, is “deep learning”.
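
As a rough sketch of what “a lot of gradient descent” looks like, the snippet below trains a tiny one-hidden-layer network on the XOR problem, which no linear model can solve. The network size, learning rate, number of steps, tanh and sigmoid activations, and squared-error loss are assumptions we picked to keep the example short; the gradients are written out by hand rather than taken from a library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(steps=5000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([[0.0], [1.0], [1.0], [0.0]])

    # Randomly initialized parameters: 2 inputs, 4 hidden units, 1 output.
    W1 = rng.normal(size=(2, 4))
    b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1))
    b2 = np.zeros(1)

    losses = []
    for _ in range(steps):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        losses.append(np.mean((p - y) ** 2))

        # Backward pass: chain rule, from the loss back to each parameter.
        dp = 2.0 * (p - y) / len(X)
        dz2 = dp * p * (1.0 - p)        # derivative of sigmoid
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dh = dz2 @ W2.T
        dz1 = dh * (1.0 - h ** 2)       # derivative of tanh
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)

        # Gradient descent step: move each parameter against its gradient.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return losses

losses = train_xor()
print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The update rule is the same one used for linear regression; the only new ingredient is the chain rule carrying the gradient back through the hidden layer.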

Where deep learning does well, and some history

Artificial neural networks have actually been around for a long time. The field has historically gone by the names cybernetics (1940s-1960s) and connectionism (1980s-1990s), and it came into vogue as deep learning circa 2006 when neural networks started getting, well, “deeper” (Goodfellow et al., 2016). But only recently have we really started to scratch the surface of their full potential.

As described by Andrej Karpathy (Director of AI at Tesla, whom we tend to think of as the Shaman of Deep Learning), there are generally “four separate factors that hold back AI:

  1. Compute (the obvious one: Moore’s Law, GPUs, ASICs),
  2. Data (in a nice form, not just out there somewhere on the internet — e.g. ImageNet),
  3. Algorithms (research and ideas, e.g. backprop, CNN, LSTM), and
  4. Infrastructure (software under you — Linux, TCP/IP, Git, ROS, PR2, AWS, AMT, TensorFlow, etc.)” (Karpathy, 2016).

In the past decade or so, the full potential of deep learning has finally begun to be unlocked by advances in (1) and (2), which in turn have led to further breakthroughs in (3) and (4). And so the cycle continues, with exponentially more humans rallying to the frontlines of deep learning research along the way (just think about what you’re doing right now!).

Illustration by NVIDIA, a leading maker of graphics processing units (GPUs), which were originally built for gaming but turned out to be well-suited to the type of parallel computing required by deep neural networks.

In the rest of this section, we’ll provide some background from biology and statistics to explain what happens inside neural nets, and then talk through some amazing applications of deep learning. Finally, we’ll link to a few resources so you can apply deep learning yourself, even sitting on the couch in your pajamas with a laptop, to quickly achieve greater-than-human-level performance on certain types of problems.

Drawing inspiration from the brain (or is it just statistics?) — what happens inside neural nets

Neurons, feature learning, and layers of abstraction

As you read these words you aren’t examining every letter of every word, or every pixel making up each letter, to derive the meaning of the words. You’re abstracting away from the details and grouping things into higher-level concepts: words, phrases, sentences, paragraphs.

Yuor abiilty to exaimne hgiher-lveel fteaures is waht aollws yuo to unedrtsand waht is hpapening in tihs snetecne wthiout too mcuh troulbe (or myabe yuo sned too mnay dnruk txets).

The same thing happens in vision, not just in humans but in animals’ visual systems generally.

Brains are made up of neurons which “fire” by emitting electrical signals to other neurons after being sufficiently “activated”. These neurons are malleable in terms of how much a signal from other neurons will add to the activation level of the neuron (vaguely speaking, the weights connecting neurons to each other end up being trained to make the neural connections more useful, just like the parameters in a linear regression can be trained to improve the mapping from input to output).

Side-by-side illustrations of biological and artificial neurons, via Stanford’s CS231n. This analogy can’t be taken too literally — biological neurons can do things that artificial neurons can’t, and vice versa — but it’s useful to understand the biological inspiration. See Wikipedia’s description of biological vs. artificial neurons for more detail.
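
On the artificial side, a single neuron is little more than a weighted sum followed by a squashing function. The sketch below assumes a sigmoid activation and made-up input and weight values; it only shows how changing the weights changes how strongly the same inputs “activate” the neuron.

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of incoming signals plus a bias term...
    activation = np.dot(weights, inputs) + bias
    # ...squashed into the range (0, 1) by a sigmoid.
    return 1.0 / (1.0 + np.exp(-activation))

inputs = np.array([0.5, 0.1, 0.9])  # signals from three upstream neurons

# Same inputs, different connection strengths (hypothetical values).
weak = neuron(inputs, np.array([0.1, 0.1, 0.1]), bias=-1.0)
strong = neuron(inputs, np.array([2.0, 2.0, 2.0]), bias=-1.0)

print(f"weak connections: {weak:.3f}, strong connections: {strong:.3f}")
```

Training a network amounts to adjusting these weights and biases so that the neurons fire for useful patterns.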

Our biological networks are arranged in a hierarchical manner, so that certain neurons end up detecting not extremely specific features of the world around us, but rather more abstract features, i.e. patterns or groupings of more low-level features. For example, the fusiform face area in the human visual system is specialized for facial recognition.

Top: Illustration of learning increasingly abstract features, via NVIDIA. Bottom: diagram of how an artificial neural network takes raw pixel inputs, develops intermediate “neurons” to detect higher-level features (e.g. presence of a nose), and combines the outputs of these to create a final output. Illustration from Neural Networks and Deep Learning (Nielsen, 2017).

This hierarchical structure exhibited by biological neural networks was discovered in the 1950s when researchers David Hubel and Torsten Wiesel were studying neurons in the visual cortex of cats. They were unable to observe neural activation after exposing the cat to a variety of stimuli: dark spots, light spots, hand-waving, and even pictures of women in magazines. But in their frustration, as they removed a slide from the projector at a diagonal angle, they noticed some neural activity! It turned out that diagonal edges at a very particular angle were causing certain neurons to be activated.

Background via Knowing Neurons

This makes sense evolutionarily since natural environments are generally noisy and random (imagine a grassy plain or a rocky terrain). So when a feline in the wild perceives an “edge”, i.e. a line that contrasts with its background, this might indicate that an object or creature is in the visual field. When a certain combination of edge neurons is activated, those activations will combine to yield a yet more abstract activation, and so on, until the final abstraction is a useful concept, like “bird” or “wolf”.

The idea behind a deep neural network is to mimic a similar structure with layers of artificial neurons.

Why linear models don’t work

To draw from Stanford’s excellent deep learning course, CS231n: Convolutional Neural Networks and Visual Recognition, imagine that we want to train a neural network to classify images with the correct one of the following labels: ["plane", "car", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"].

One approach could be to construct a “template”, or average image, of each class of image using the training examples, and then use a nearest-neighbors algorithm during testing to measure the distance of each unclassified image’s pixel values, in aggregate, to each template. This approach involves no layers of abstraction. It’s a linear model that combines all the different orientations of each type of image into one averaged blur.

For instance, it would take all the cars — regardless of whether they’re facing left, right, center, and regardless of their color — and average them. The template then ends up looking rather vague and blurry.

Example drawn from Stanford’s CS231n: Convolutional Neural Networks and Visual Recognition, Lecture 2.

Notice that the horse template above appears to have two heads. This doesn’t really help us: we want to be able to detect a right-facing horse or a left-facing horse separately, and then, if either one of those features is detected, say that we’re looking at a horse. This flexibility is provided by deep neural nets, as we will see in the next section.
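
The two-headed horse is easy to reproduce on toy data. In the sketch below, each “image” is just five pixels, a left-facing horse has its bright pixels on the left, and a right-facing horse has them on the right. The data, the helper names `build_templates` and `classify`, and the use of squared distance are all assumptions made for illustration, not the CS231n code.

```python
import numpy as np

def build_templates(images, labels):
    # One template per class: the pixel-wise average of its training images.
    labels = np.array(labels)
    return {str(label): images[labels == label].mean(axis=0)
            for label in np.unique(labels)}

def classify(image, templates):
    # Pick the class whose template is closest in squared pixel distance.
    return min(templates, key=lambda label: np.sum((image - templates[label]) ** 2))

left_horse = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
right_horse = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
car = np.array([0.0, 1.0, 1.0, 1.0, 0.0])

images = np.stack([left_horse, right_horse, car, car])
labels = ["horse", "horse", "car", "car"]

templates = build_templates(images, labels)
print(templates["horse"])  # a half-bright "head" on both ends
print(classify(left_horse, templates))
```

The horse template matches neither orientation well. It works on this tiny example only because the other class is even further away, and that margin shrinks as more orientations, colors, and classes get averaged together.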

Deep neural networks approach the image classification problem using layers of abstraction

To repeat what we explained earlier in this section: the input layer will take raw pixel brightnesses of an image. The final layer will be an output vector of class probabilities (i.e. the probability of the image being a “cat”, “car”, “horse”, etc.)

But instead of learning a simple linear model relating input to output, we’ll construct intermediate hidden layers that learn increasingly abstract features, which lets us keep the nuance in the complex data.

Source: Analytics Vidhya

Just as we described animal brains detecting abstract features, the artificial neurons in the hidden layers will learn to detect abstract concepts: whichever concepts are ultimately most useful for capturing the most information and minimizing loss in the accuracy of the network’s output. Nobody labels these intermediate features. The network discovers them on its own as a by-product of minimizing the supervised loss, which is why this is often called feature learning or representation learning.
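
Here is a hand-built sketch of how a hidden layer gets around the two-headed horse problem. One hidden unit responds to a left-facing horse, another to a right-facing horse, and the output adds them up, so either detector firing is enough. The five-pixel toy images and every weight below are values we chose by hand for illustration; in a real network these weights would be learned by gradient descent, not written down.

```python
import numpy as np

# Each row is one hidden unit's weights over the five input pixels.
W_hidden = np.array([
    [1.0, 1.0, 0.0, -1.0, -1.0],   # fires for bright pixels on the left only
    [-1.0, -1.0, 0.0, 1.0, 1.0],   # fires for bright pixels on the right only
])
b_hidden = np.array([-1.5, -1.5])  # threshold each unit must overcome
w_out = np.array([1.0, 1.0])       # output sums the two detectors

def horse_score(image):
    # ReLU hidden layer: each unit is either silent or reports a match.
    hidden = np.maximum(0.0, W_hidden @ image + b_hidden)
    return float(w_out @ hidden)

left_horse = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
right_horse = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
car = np.array([0.0, 1.0, 1.0, 1.0, 0.0])

for name, image in [("left horse", left_horse), ("right horse", right_horse), ("car", car)]:
    print(name, horse_score(image))
```

A positive score means “horse”. Both orientations score above zero and the car scores exactly zero, which a single averaged template cannot express.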

This comes at the cost of model interpretability, since as you add in more hidden layers the neurons start representing more and more abstract and ultimately unintelligible features — to the point that you may hear deep learning referred to as “black box optimization”, where you basically are just trying stuff somewhat at random and seeing what comes out, without really understanding what’s happening inside.

Linear regression is interpretable because you decided which features to include in the model. Deep neural networks are harder to interpret because the features are learned and aren’t explained anywhere in English. It’s all in the machine’s imagination.

Some extensions and further concepts worth noting

  • Deep learning software packages. You’ll rarely need to implement all the parts of neural networks from scratch because of existing libraries and tools that make deep learning implementations easier. There are many of these: TensorFlow, Caffe, Torch, Theano, and more.
  • Convolutional neural networks (CNNs). CNNs are designed specifically for taking images as input, and are effective for computer vision tasks. They are also instrumental in deep reinforcement learning. CNNs are specifically inspired by the way animal visual cortices work, and they’re the focus of the deep learning course we’ve been referencing throughout this article, Stanford’s CS231n.
  • Recurrent neural networks (RNNs). RNNs have a sense of built-in memory and are well-suited for language problems. They’re also important in reinforcement learning since they enable the agent to keep track of where things are and what happened historically even when those elements aren’t all visible at once. Christopher Olah wrote an excellent walkthrough of RNNs and LSTMs in the context of language problems.
  • Deep reinforcement learning. This is one of the most exciting areas of deep learning research, at the heart of recent achievements like OpenAI defeating professional Dota 2 players and DeepMind’s AlphaGo surpassing humans in the game of Go. We’ll dive deeper in Part 5, but essentially the goal is to apply all of the techniques in this post to the problem of teaching an agent to maximize reward. This can be applied in any context that can be gamified — from actual games like Counter Strike or Pacman, to self-driving cars, to trading stocks, to (ultimately) real life and the real world.
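
To give a flavor of what a CNN layer computes, the sketch below slides a small filter across an image and records how strongly each patch matches it. Strictly speaking this is cross-correlation, which is what deep learning libraries usually compute under the name “convolution”. The function name `conv2d`, the toy image, and the hand-picked edge filter are our own assumptions; in a real CNN the filter values are learned.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every position where it fully fits ("valid" mode).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy image: dark on the left, bright on the right, so one vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Filter that responds where brightness increases from left to right.
kernel = np.array([[-1.0, 1.0]])

response = conv2d(image, kernel)
print(response)
```

The response is non-zero only in the column where dark meets bright, much like the edge-sensitive neurons described earlier. Because the same small filter is reused at every position, the layer needs far fewer parameters than a fully connected one.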

Deep learning applications

Deep learning is reshaping the world in virtually every domain. Here are a few examples of the incredible things that deep learning can do…

  • Facebook trained a neural network augmented by short-term memory to intelligently answer questions about the plot of Lord of the Rings.
Research from FAIR (Facebook AI Research) applying deep neural networks augmented by separate short-term memory to intelligently answer questions about the LOTR storyline. This is the definition of epic.
  • Self-driving cars rely on deep learning for visual tasks like understanding road signs, detecting lanes, and recognizing obstacles.
Source: Business Insider
  • Deep learning can be used for fun stuff like art generation. A tool called neural style can impressively mimic an artist’s style and use it to remix another image.
The style of Van Gogh’s Starry Night applied to a picture of Stanford’s campus, via Justin Johnson’s neural style implementation: https://github.com/jcjohnson/neural-style

Other noteworthy examples include:

…and many, many more.

Now go do it!

We haven’t gone into as much detail here on how neural networks are set up in practice because it’s much easier to understand the details by implementing them yourself. Here are some amazing hands-on resources for getting started.

  • Play around with the architecture of neural networks to see how different configurations affect network performance with Google’s Neural Network Playground.
  • Get up-and-running quickly with this tutorial by Google: TensorFlow and deep learning, without a PhD. Classify handwritten digits at >99% accuracy, get familiar with TensorFlow, and learn deep learning concepts within 3 hours.
  • Then, work through at least the first few lectures of Stanford’s CS231n and the first assignment of building a two-layer neural network from scratch to really solidify the concepts covered in this article.

Further resources

Deep learning is an expansive subject area. Accordingly, we’ve also compiled some of the best resources we’ve encountered on the topic, in case you’d like to go… deeper.

Next up: time to play some games!

Last, but most certainly not least, is Part 5: Reinforcement Learning.



On Twitter? So are we. Feel free to keep in touch — Vishal and Samer 🙌🏽