Demystifying Deep Neural Nets

History

Despite their current buzz, neural networks are not a new concept. The idea of a biologically-inspired ‘perceptron’ was first explored in the 1940s, when the field was known as ‘cybernetics’. Unfortunately, some perceived limitations (such as the single-layer perceptron’s inability to model the XOR function) meant that it eventually fell out of favour.

Neural nets vs conventional computing

Let’s try to understand how neural networks differ from conventional computing using a trivial example.

IF (furry)
AND (has tail)
AND (has 4 legs)
AND (has pointy ears)
AND so on…
A collage of cat photos
A collage of pictures of objects that aren't cats
On the left, a straight line attempting to separate green and red dots, on the right a more complicated line that successfully separates them
Image source: kevinbinz.com
y = mx + c
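The straight line in the picture is just the school equation y = mx + c pressed into service as a decision boundary. A tiny sketch (with made-up slope and intercept) of how such a line classifies points:

```python
# A straight line y = m*x + c used as a decision boundary:
# points above the line get one label, points below get the other.
def classify(x, y, m=1.0, c=0.0):
    """Return 'green' if (x, y) lies above the line, else 'red'."""
    return "green" if y > m * x + c else "red"

print(classify(1.0, 3.0))  # above y = x, so 'green'
print(classify(2.0, 1.0))  # below y = x, so 'red'
```

The catch, as the right-hand picture shows, is that real data is rarely separable by a single straight line — which is exactly the gap neural networks fill.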

Inside a neuron

Now let’s look at what happens inside a neuron. Again, we’ll use a trivial example to try to develop an intuition.

  • Will the weather be nice?
  • What’s the music like?
  • Do I have anyone to go with?
  • Can I afford it?
  • Do I need to write my thesis?
  • Will I like the food?
A diagram illustrating multiplying inputs by their importance, summing them and checking if the total is above a threshold
A diagram giving numerical values to the weights, inputs, and threshold value
The diagram illustrating numerical values, highlighting the fact that the threshold value can be moved to the other side of the equation
The diagram illustrating numerical values, with the threshold value represented as a negative number on the other side of the equation
The internal structure of a neuron. Inputs are each multiplied by a weight, summed with a bias, and passed through an activation function
The diagram of the internal structure of the neuron, with a circle highlighting that the weights, bias and activation function are all inside a single neuron
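The diagrams above can be condensed into a few lines of code. This is a rough sketch of the festival decision (the weights, inputs and bias are made-up numbers, and I’ve used four of the questions for brevity):

```python
def neuron(inputs, weights, bias):
    # Multiply each input by its weight (its importance), sum them,
    # add the bias (the threshold moved to the other side)...
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...then apply a simple step activation: fire if above zero
    return 1 if total > 0 else 0

# Festival example: answers (1 = yes, 0 = no) and how much each matters
inputs  = [1, 1, 0, 1]   # nice weather, good music, no company, affordable
weights = [3, 4, 2, 2]   # importance of each factor
bias    = -6             # the threshold, expressed as a negative number
print(neuron(inputs, weights, bias))  # 1 -> go to the festival
```

The weighted sum here is 3 + 4 + 0 + 2 = 9; adding the bias of −6 gives 3, which is above zero, so the neuron ‘fires’.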

Constructing a network

So how do we turn this from a single neuron into a neural network?

A 3-layer network, with circles representing each neuron, connected by arrows
A 4-layer network, with circles representing neurons, connected by arrows
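A network is just layers of these neurons, where every neuron in one layer feeds its output to every neuron in the next. A minimal sketch of a forward pass (the weights and inputs are arbitrary numbers for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully-connected layer: every neuron sees every input."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny 2-input -> 2-hidden -> 1-output network with made-up weights
hidden = layer([0.5, 0.8], [[0.1, 0.4], [0.7, 0.2]], [0.0, 0.1])
output = layer(hidden, [[0.6, 0.9]], [-0.5])
print(output)  # a single value between 0 and 1
```

Each circle in the diagrams is one call to the neuron arithmetic; each arrow is one weight.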

Activation functions

In our trivial festival example, we only had a binary output: yes or no. We could achieve this with a simple threshold test. But what if you want an output that isn’t binary? For this, we can use an activation function.

Graphical representations of the step, sigmoid and ReLU functions
The diagram of the single neuron, with an arrow showing where the activation functions occur
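The three functions in the graph are all one-liners. Sketched in Python:

```python
import math

def step(z):
    # All-or-nothing: the threshold test from the festival example
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    # Squashes any input into (0, 1) -- handy for probabilities
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Passes positives through unchanged, clips negatives to zero
    return max(0.0, z)

print(step(2.0), sigmoid(0.0), relu(-3.0))  # 1.0 0.5 0.0
```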

Training the network — the short version

Let’s look at an overview of the steps needed to train the network:

  1. Randomly initialise the network weights and biases
  2. Get a ton of labelled training data (e.g. pictures of cats labelled ‘cats’ and pictures of other stuff also correctly labelled)
  3. For every piece of training data, feed it into the network
  4. Check whether the network gets it right (given a picture labelled ‘cat’, is the result of the network also ‘cat’ or is it ‘dog’, or something else?)
  5. If not, how wrong was it? Or, how right was it? (What probability or ‘confidence’ did it assign to its guess?)
  6. Nudge the weights a little to increase the probability of the network more confidently getting the answer right
  7. Repeat
A diagram showing an image of a cat being fed into a 4-layer network
The same diagram with an arrow from right to left indicating the direction of updating the weights
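The seven steps above can be sketched as a training loop. This is a toy example, not a real image classifier: a single neuron learning an OR rule, with the standard logistic-regression weight update standing in for the ‘nudge’:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 2: labelled training data -- here, the OR function
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

# Step 1: randomly initialise the weights and bias
random.seed(0)                      # seeded so the run is repeatable
w = [random.uniform(-1, 1) for _ in range(2)]
b = random.uniform(-1, 1)
lr = 0.5                            # learning rate

for epoch in range(1000):           # step 7: repeat
    for x, target in data:          # step 3: feed in each example
        pred = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        error = pred - target       # steps 4-5: how wrong was it?
        # Step 6: nudge the weights to make the answer more right
        w[0] -= lr * error * x[0]
        w[1] -= lr * error * x[1]
        b -= lr * error

print([round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data])
# [0, 1, 1, 1] -- the network has learnt OR
```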

Training the network — the long version

Now let’s take a deeper look at how we do this.

A graph illustrating a line of best fit through 4 data points
A graph displaying a 3D landscape with peaks and troughs. The lowest point is highlighted
Image source: firsttimeprogrammer.blogspot.co.uk
The same landscape but only a tiny part of it is visible

Gradient Descent

To find the lowest point, we use a technique called Gradient Descent. Imagine you are standing at the top of a mountain but have a blindfold on. You need to make it down to the bottom but you can’t see which way to go. What do you do? You feel around with your foot and find the direction that has the steepest slope, and then take a small step in that direction. You don’t want to take too big a step — that could be dangerous — but you also don’t want to take too small a step — it will take forever to get down.

An animated gif showing arrows descending the landscape towards the lowest point, while the network predictions improve
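The blindfolded-hiker procedure is easy to write down. A minimal one-dimensional sketch, using a made-up bowl-shaped loss rather than a real network’s landscape:

```python
def loss(w):
    # A simple 'landscape' with its lowest point at w = 3
    return (w - 3) ** 2

def gradient(w):
    # The slope of the landscape at w
    return 2 * (w - 3)

w = 0.0          # start somewhere on the mountain
step = 0.1       # the learning rate: the size of each step
for _ in range(100):
    w -= step * gradient(w)   # feel the slope, step downhill

print(round(w, 3))  # 3.0 -- the bottom of the valley
```

Try making `step` much larger or smaller: too large and `w` overshoots back and forth; too small and a hundred steps barely gets you off the summit.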

Backpropagation

When reading about neural networks, you’ll often come across the term ‘backpropagation’. This refers to the algorithm used to perform gradient descent across the multiple layers of the network. The name comes from the fact that we start the process at the output layer, and work towards the input layer, propagating the changes backwards throughout the network. We calculate the gradient of the slope at each layer mathematically by taking the partial derivative of the loss with respect to the weights (if that makes no sense to you right now, don’t worry). The amount we then nudge the weights in that direction is determined by the learning rate — this is the size of our ‘step’ down the mountain.
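To make the partial-derivative sentence concrete, here is a hand-worked backward pass for a single neuron with one weight (hypothetical numbers, squared-error loss, sigmoid activation). Each line of the backward pass is one link in the chain rule:

```python
import math

x, target = 2.0, 1.0
w, b = 0.5, 0.0
lr = 0.1                               # the learning rate: our step size

# Forward pass
z = w * x + b
pred = 1 / (1 + math.exp(-z))          # sigmoid activation
loss = 0.5 * (pred - target) ** 2      # squared-error loss

# Backward pass: chain the derivatives from the output back to w
dloss_dpred = pred - target
dpred_dz = pred * (1 - pred)           # derivative of the sigmoid
dz_dw = x
dloss_dw = dloss_dpred * dpred_dz * dz_dw  # partial derivative of loss w.r.t. w

# Nudge the weight downhill by learning rate * gradient
w -= lr * dloss_dw
```

In a multi-layer network, backpropagation repeats exactly this chaining at every layer, reusing the derivatives already computed for the layer after it.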

Don’t panic!

That was quite a lot of technical detail given that I claimed this post was about creating intuition. But it’s ok, because there are loads of machine learning libraries that take care of the tricky maths. As long as you have a rough understanding of the overall process, you should know enough to read the documentation and start using these libraries. There are loads of options but one of the most popular at the moment is TensorFlow, Google’s open source machine learning library in Python. There are loads of great tutorials to help you get started.

Convolutional Neural Networks

Now let’s look at a special type of neural network called a ‘Convolutional Neural Network’, or ConvNet. Earlier, we visualised a cat image being fed into a neural network, assuming each pixel corresponded to one input. It turns out there’s a more effective way to handle image data than treating every pixel as independent.

Images are arrays of numbers

Here, I have pixellated an image of a hand-drawn number to illustrate this. In a monochrome image, each pixel has a value. In the example here, white pixels have a value of 1, black pixels have a value of 0, and grey is somewhere in between. The same principle is true of colour images, but instead each pixel is defined by three numbers corresponding to the red, green and blue channels (sometimes there is a fourth number representing the ‘alpha’ channel which controls transparency).

A hand drawn three next to a pixellated version of itself, with numbers highlighting the values of pixels in a small area
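The idea is simply that an image is a grid of numbers we can index and do arithmetic on. A toy monochrome ‘image’ of a vertical stroke (my own made-up values, following the post’s convention of 1 = white, 0 = black):

```python
# A tiny 3x3 monochrome image: 1.0 = white, 0.0 = black, 0.5 = grey
image = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.5],
    [0.0, 1.0, 0.0],
]
print(image[0][1])  # 1.0 -- the white pixel in the top row

# A colour pixel is three numbers (red, green, blue channels)
pixel = (255, 0, 0)  # pure red
```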

Convolution

Because images are just arrays of numbers, we can do maths on them. A particularly useful mathematical operation we can do is called ‘convolution’. This involves passing another array of numbers over every pixel of the image, multiplying the overlapping numbers, adding them up, and creating a new array containing the results.

A 3×3 array. First row: 1, 0, 1. Second row: 0, 1, 0. Third row: 1, 0, 1
An animated gif of a kernel filter passing over a larger array of numbers, and storing the convolved results
Image source: ufldl.stanford.edu
Examples of filter effects (blur, sharpen and edge detect) and their corresponding kernel values.
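The operation in the animation can be written directly: slide the kernel over the image, multiply the overlapping numbers, and sum. This sketch uses the 3×3 kernel shown above and the 5×5 image from the animation:

```python
def convolve(image, kernel):
    """Slide kernel over image (no padding), multiply overlaps, sum."""
    k = len(kernel)
    out_size = len(image) - k + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(k) for b in range(k))
            row.append(total)
        out.append(row)
    return out

kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
print(convolve(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Swapping in the blur, sharpen or edge-detect kernel values from the figure gives exactly those filter effects.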

A convolutional neuron

Remember our neuron from earlier? It looked something like this:

The diagram of a single neuron, with the different parts labelled
A diagram of a convolutional neuron, with the same parts labelled as in the diagram of a standard neuron
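The parallel between the two diagrams can be shown in code: a convolutional neuron has the same weights-bias-activation structure, except its weights are a kernel applied to a patch of the image rather than a flat list (the patch and kernel values here are made up):

```python
def relu(z):
    return max(0.0, z)

def conv_neuron(patch, kernel, bias):
    """Like the standard neuron, but the weights form a kernel
    applied to a 2D patch of the image."""
    k = len(kernel)
    total = sum(patch[i][j] * kernel[i][j]
                for i in range(k) for j in range(k))
    return relu(total + bias)

patch  = [[1, 0], [0, 1]]    # a 2x2 region of the image
kernel = [[1, -1], [-1, 1]]  # shared weights, learnt during training
print(conv_neuron(patch, kernel, 0.0))  # 2.0
```

Crucially, the same kernel is reused at every position in the image, so the network learns far fewer weights than if every pixel had its own.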

What’s so great about ConvNets?

ConvNets have been making a splash in many image processing tasks. One example of this is the ‘ImageNet Challenge’, which asks teams to create algorithms to recognise objects given a labelled training data set.

Images of a mite, a container ship, a motor scooter and a leopard, each with an algorithm's top 5 guesses at what it is. Each one has been correctly identified.
Image source: kaggle.com
A bar chart illustrating the percentage error of the ImageNet challenge each year. The bars are descending in height as time progresses

Inside a ConvNet

Remember we talked about how images are just arrays of numbers? And how kernel filters are also just arrays of numbers? This means we can actually visualise kernel filters as images. We can therefore see what weights have been learnt at different layers in the network, which helps us understand exactly how the ConvNet is learning to recognise objects.

Three sets of squares. The first contains squares of lines and patterns, the second contains squares of shapes, and the third contains squares of object parts
Image source: xlgps.com

Fun applications

A couple of years ago, Gatys et al noticed a very interesting consequence of ConvNets trained for object recognition. They realised that to identify an object, the network had to learn to abstract away the style of the image. The network should be able to recognise a cat whether it is a photo or a drawing, for example.

A photo of some houses is shown with three different painterly styles applied
A selfie of me, with 9 different artistic styles applied

Fooling ConvNets

Another fun (or worrying…) side effect of ConvNets is that they can be easily fooled.

Three images: a truck labelled 'truck', some random noise, and the truck again labelled 'ostrich'
Image source: karpathy.github.io.
Random noise, labelled '100.0% Goldfish'
Image source: karpathy.github.io.

Deep Dream

Google brought ConvNets into the limelight with their ‘Deep Dream’ experiments.

Before and after deep dream: A landscape that has been transformed with dreamy shapes
Image source: fromthegrapevine.com
Another deep dream example that looks like a purple abstract painting
Image source: telegraph.co.uk
Another deep dream example, that looks like a gold painting full of dogs
Image source: telegraph.co.uk

Why are there so many dogs?!

The reason is that the ConvNet was trained on a dataset of images which included a lot of cars, buildings and dogs. This means it was predisposed to seeing these objects — kind of like human pareidolia: the bias that causes us to see faces in everything from clouds to toast. So what can we do to stop our own biases creeping into the networks we build?

  1. First of all, take the time to learn about unconscious bias. We all have it, no matter how rational we think we are. If we are more aware of our own cognitive and unconscious biases, we can take more care not to let them negatively affect our machines.
  2. Secondly, use output testing to check whether the network has learnt what you want it to learn, and not something that just happens to correlate. You can do this by testing it on carefully designed uncommon edge cases, for example.
  3. Next, take steps to diversify your training data, ensuring it is genuinely representative and that inconsequential features are distributed evenly across the set.
  4. Finally, diversify your engineering teams. The more different people from different backgrounds with different perspectives we have building these systems, the better our chance of spotting bias errors and building systems that are broadly applicable.

Phew! We did it.

We’ve seen what’s in a neuron, how they make a network, what a ConvNet is and some of their interesting applications. But there is a ton we haven’t covered. When reading about neural networks, you might encounter words like: unsupervised learning, pooling, strides, stochastic, batch, dropout, regularisation, transfer learning, recurrent neural networks, generative adversarial networks, and loads more. There is no way we can cover all this in one post! However, you should have a good foundation to go on and read about these things even more.

Screenshot of a tweet by @karpathy that reads "Everything I know about design of ConvNets (resnets, bigger=better, batchnorms etc) is useless in RL. Superbasic 4-layer ConvNets work best."
Diagram illustrating a single ConvNet neuron being inserted into a 4-layer network

Rosie Campbell

Rosie Campbell

Productivity, systems, and optimization. Effective altruism, science, futurism, tech, econ, and rationality.