Training to train: Artificial Neural Networks, part II

Andy Elmsley
The Sound of AI
Apr 3, 2019

Welcome back, networks-in-training! Last week we introduced the Multilayer Perceptron (MLP) — our first Artificial Neural Network (ANN). We built a fairly strict implementation (it had one fixed hidden layer), and I challenged you to make it more flexible.

Training an ANN with gradient descent is like trying to find the quickest path down a mountain.

While there are lots of ways this could have been done, here’s my solution:
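
(What follows is a sketch: the class layout, the names and the random weight initialisation are illustrative rather than the exact original code.)

import numpy as np

class MLP:
    # A multilayer perceptron with a configurable number of hidden layers.

    def __init__(self, num_inputs=3, hidden_layers=(3, 5), num_outputs=2):
        # Collect all layer sizes into one list, e.g. [3, 3, 5, 2].
        self.layers = [num_inputs] + list(hidden_layers) + [num_outputs]

        # One weight matrix per pair of adjacent layers, randomly initialised.
        self.weights = [np.random.rand(self.layers[i], self.layers[i + 1])
                        for i in range(len(self.layers) - 1)]

        # Storage for the layer activations and the weight derivatives;
        # these get filled in during forward and backward propagation.
        self.activations = [np.zeros(size) for size in self.layers]
        self.derivatives = [np.zeros(w.shape) for w in self.weights]

    def _sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def activate(self, inputs):
        # Forward propagation: each layer's output becomes the next layer's input.
        activations = np.array(inputs, dtype=float)
        self.activations[0] = activations
        for i, w in enumerate(self.weights):
            activations = self._sigmoid(np.dot(activations, w))
            self.activations[i + 1] = activations
        return activations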

This new solution makes it easier to experiment with our network's architecture (the number of inputs, outputs, hidden layers and their sizes) to see what works best for the problem we're trying to solve. We can also create an oh-so-popular deep network, simply by passing in large numbers.
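
With the constructor sketched above, that might look like:

# Five hidden layers of 1024 neurons each.
mlp = MLP(num_inputs=1, hidden_layers=[1024] * 5, num_outputs=1)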

This code would create an MLP instance with one input, five hidden layers — each with 1024 neurons — and one output.

Apropos propagation

Part of the elegance of ANNs lies in the fact that each layer of the network is "activated" sequentially. This can be seen in the 'activate' method above: data flows through the network in a simple for-loop, and the output from the previous layer is used as the input to the next. This is known as forward propagation. Here's where things get really interesting: the network can learn from its mistakes by running in reverse, which is known as backward propagation, or backpropagation.

This isn't a matter of just running the network in reverse, though. While forward propagation passes the input value(s) from left to right, backpropagation passes the error value(s) from right to left. For backpropagation to work, we need to calculate the error (a.k.a. cost or loss) by comparing the network's actual output with the desired output. This calculated error is then propagated back to the previous layer(s), from right to left, and the connection weights of each node are adjusted to reduce the prediction error.

An ANN showing the propagation of input and error.

Training ANNs with supervised learning

For an ANN to learn with backpropagation, we need a dataset of training examples. Each example needs to be made up of the input data and the desired output for that input.

Training an ANN is an iterative process in which the training examples are presented to the network one by one, and the values of the weights are adjusted each time. Once every example has been run through the network, one training epoch is complete, and the process typically starts again from the beginning. During this phase the ANN learns to predict the correct output for the input examples and, hopefully, to generalise its predictions to other data similar to the training examples. This is known as supervised learning (SL).

During the training phase, we pass a training example to the network, compare the network's calculated output value(s) against the 'correct' values and calculate an error value. This error can then be backpropagated and the connection weights adjusted, in the hope that the output values calculated next time will be closer to the correct ones.

The mechanics of backpropagation

Enough theory — let’s get back to the code!

The first thing we need is an error function, since we need to keep track of the network's performance. There are a lot of error functions to choose from, but for now we'll stick to the commonly used mean squared error. With numpy, the implementation is pretty straightforward.
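
Something along these lines does the job (a sketch; the function name and exact shape handling are up to you):

import numpy as np

def mse(target, output):
    # Mean squared error between the desired output and the network's actual output.
    return np.mean((target - output) ** 2)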

When we say that the network is learning, what we really mean is that we're minimising this error with a gradient descent algorithm. Gradient descent is a little complex to explain mathematically, but essentially it's a way of iteratively tweaking the weights in the network to reduce the error. If you're interested in learning more about the maths behind how this works, check out this awesome explanation from 3Blue1Brown.

The magic piece of the puzzle that makes gradient descent work is the derivative of our activation function (sometimes called the prime). Luckily for us, the sigmoid function has a simple derivative. If x is the sigmoid's output, the derivative is just:

y = x * (1.0 - x)

Using this deceptively simple code, we can take a step down the error gradient, adjust the weights in the network and improve its performance. We do this by stepping backwards through each layer of the network, calculating the derivative of the error with respect to that layer's weights, and then backpropagating the error to the previous layer. Here's some code that does all that.
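
A sketch of that code, written as methods to add to the MLP class above (the error argument is assumed to be the difference between the desired and actual output):

def _sigmoid_derivative(self, x):
    # Add this to the MLP class. Here x is assumed to already be a sigmoid
    # output, so the derivative is simply x * (1.0 - x).
    return x * (1.0 - x)

def back_propagate(self, error):
    # Add this to the MLP class too. Step backwards through the layers,
    # turning the output error into a derivative for each weight matrix.
    for i in reversed(range(len(self.derivatives))):
        output = self.activations[i + 1]  # output of the layer fed by weights[i]
        delta = error * self._sigmoid_derivative(output)
        # Derivative of the error with respect to this layer's weights.
        self.derivatives[i] = np.outer(self.activations[i], delta)
        # Propagate the error back to the previous layer.
        error = np.dot(delta, self.weights[i].T)
    return error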

Once we have the derivatives for each layer, we can take one final pass through the network to update the weights with gradient descent. There are many versions of gradient descent you could use, but in this lesson we'll stick to simply scaling each derivative by a learning rate and adjusting the weights accordingly.
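
A sketch of that update step, again as a method of the MLP class (this assumes the error handed to back_propagate was calculated as the target minus the output):

def gradient_descent(self, learning_rate=0.1):
    # Add this to the MLP class. Nudge each weight matrix along its derivative,
    # scaled by the learning rate. Because the error was (target - output),
    # stepping *with* the derivative reduces the prediction error.
    for i in range(len(self.weights)):
        self.weights[i] += self.derivatives[i] * learning_rate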

Putting it all together

We now have all the components in place for backpropagation. All we need to finish our implementation is a controlling algorithm that performs a number of training epochs on some data.

Let’s start with the data by creating a toy dataset of numbers:
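
For example (a purely illustrative choice of mapping), we could ask the network to learn to add pairs of small numbers:

import numpy as np

# A toy dataset: each input is a pair of numbers in [0, 0.5], and the target
# for each pair is its sum, which conveniently stays within the sigmoid's range.
inputs = np.random.rand(1000, 2) / 2.0
targets = np.sum(inputs, axis=1, keepdims=True)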

We’ll try to get our network to learn this simple mapping with a training routine.
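
A sketch of such a routine, written as a final train method for the MLP class, plus a couple of lines to run it (the architecture and hyperparameter values below are arbitrary choices, and the usage assumes the methods above have been added to the class):

def train(self, inputs, targets, epochs=50, learning_rate=0.1):
    # Add this to the MLP class. Present every example once per epoch,
    # backpropagating and updating the weights after each one.
    for epoch in range(epochs):
        sum_error = 0.0
        for x, target in zip(inputs, targets):
            output = self.activate(x)
            error = target - output
            self.back_propagate(error)
            self.gradient_descent(learning_rate)
            sum_error += mse(target, output)
        print("Epoch {}: error {:.5f}".format(epoch + 1, sum_error / len(inputs)))

# Train a small network on the toy dataset above.
mlp = MLP(num_inputs=2, hidden_layers=[5], num_outputs=1)
mlp.train(inputs, targets, epochs=50, learning_rate=0.1)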

When you run this code, you should notice the error decreasing with each epoch.

Epoch complete

We'll leave it there for this week. We've covered backpropagation and gradient descent for training our MLPs with supervised learning and a labelled dataset. You might have noticed that the network isn't particularly good at approximating the toy dataset we made. Next week we'll dig into why that might be, when we use our newly acquired skills on some more realistic data.

In the meantime, I have a simple investigative challenge for you:

  1. Play around with different values for the network size, the amount of training data and the learning rate. What do you notice about how each one affects the performance of the model?

As always, you can find the source code for all the above examples on our GitHub.

To begin your AI-coding training at day one, go here.

And give us a follow to receive updates on our latest posts.

Andy Elmsley
The Sound of AI

Founder & CTO @melodrivemusic. AI video game music platform. Tech leader, programmer, musician, generative artist and speaker.