One Simple Trick to Fix Your Handwriting Problems
A 30 Day Writing Challenge
It’s Day 11 of my 30 Day Writing Challenge. I’ve learnt something new every day and written a post about it for the last 10 days. I’m about to start a new job, building a machine learning team. I want to be prepared.
Yesterday, we explored the fundamental ideas of a neural network by considering how we can model a single neuron. We can give the neuron some input and, if the input is strong enough, we get an output. We also figured out how to adjust the weights and the activation threshold each time we trained it.
Our example was really simple — too trivial to be useful in real life. Today, let's build that neuron into a network to make it more useful. There's a dataset called MNIST consisting of 28 x 28 pixel images of handwritten digits. Let's see if we can get a neural network to recognise them.
We want the inputs of our neural network to be each of the pixels (784 in total) — 1 if the pixel is black, 0 if it’s white. We also want to simplify our mathematical model. Instead of having an activation threshold, which results in an output of zero or one, we’ll use a Sigmoid function which gives any value between zero and one. Plotted on a graph it looks like this:
We then train the neuron by gradually adjusting the weights so that we get closer and closer to the expected output. Imagine standing on a landscape where you want to get from the top of a hill (where the error is high) to the flat ground (where the error is low). Each step should take you a little further downhill.
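That downhill walk is called gradient descent. Here's a minimal sketch on a made-up, one-dimensional "hill" (the error surface and learning rate are illustrative, not the real network's):

```python
# Toy hill: the error surface E(w) = (w - 3)^2, whose slope
# (derivative) is 2 * (w - 3). The flat ground is at w = 3.
def gradient(w):
    return 2 * (w - 3)

w = 10.0             # start high up the hill
learning_rate = 0.1  # how big each step is

for step in range(100):
    w -= learning_rate * gradient(w)  # step in the downhill direction

print(round(w, 3))  # ends up very close to 3, the bottom of the hill
```

The same idea drives the real network: each weight takes a small step in whichever direction reduces the error.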
It’s really that straightforward.
Note: You can skip this section if you don’t feel comfortable diving into the equations. You should still be able to follow using the intuitive explanations.
There are various Sigmoid functions we could use here. We’ll use the Logistic function as the mathematics is relatively easy.
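The Logistic function is σ(x) = 1 / (1 + e⁻ˣ). A small sketch shows why it's convenient — and why its shape matches the S-curve described above:

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # The handy property: the derivative can be written in terms of
    # the function itself, which keeps the training maths simple.
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0))    # 0.5 -- the curve crosses the midpoint at zero
print(sigmoid(6))    # close to 1 for large positive inputs
print(sigmoid(-6))   # close to 0 for large negative inputs
```

That derivative, σ(x)(1 − σ(x)), is exactly the term that turns up when we work out how to adjust the weights.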
As we’ve got rid of the activation threshold, we now only have to adjust the weights for each input. Each weight is adjusted using the partial derivative of the error with respect to that weight.
There are various ways of defining the error function E, but it’s essentially the difference between the expected output and the actual output.
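Putting those pieces together for a single neuron: a common choice (assumed here, not spelled out above) is the squared error E = ½(target − output)², and applying the chain rule gives the weight update. A hedged sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(weights, inputs, target, learning_rate=0.5):
    # Forward pass: weighted sum of the inputs, through the sigmoid
    z = sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(z)

    # With E = 0.5 * (target - output)^2, the chain rule gives
    # dE/dw_i = -(target - output) * output * (1 - output) * x_i
    error = target - output
    common = -error * output * (1 - output)

    # Step each weight downhill (opposite to the gradient)
    new_weights = [w - learning_rate * common * x
                   for w, x in zip(weights, inputs)]
    return new_weights, output
```

Training is then just repeating `train_step` over lots of examples; each call nudges the output a little closer to the target.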
Ok, so we’ve got our new neurons which are more flexible. We now need to arrange them in a network with 784 inputs, one for each of the pixels, and 10 outputs, one for each of the digits the image could represent. The outputs will be the probability that the image is of that digit.
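The shape of that network can be sketched in a few lines of plain Python. This is hypothetical and untrained — random weights, a fake "image" — so the outputs are meaningless scores; the point is just the wiring of 784 inputs into 10 sigmoid outputs:

```python
import math
import random

random.seed(0)

N_PIXELS = 28 * 28   # 784 inputs, one per pixel
N_DIGITS = 10        # 10 outputs, one per digit

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One weight per (pixel, digit) pair -- small random starting values
weights = [[random.uniform(-0.05, 0.05) for _ in range(N_PIXELS)]
           for _ in range(N_DIGITS)]

def forward(pixels):
    # Each output neuron sees all 784 pixels; its sigmoid output is
    # read as "how confident the network is that this is that digit"
    return [sigmoid(sum(w * p for w, p in zip(neuron, pixels)))
            for neuron in weights]

image = [random.choice([0, 1]) for _ in range(N_PIXELS)]  # fake image
scores = forward(image)
print(len(scores))  # 10 scores, one per digit
```

The digit we'd report is simply whichever of the ten outputs is largest.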
We train this network exactly as we trained the simple example in the last post — try lots and lots of inputs and adjust the weights based on the expected output. Once trained, we can pass any image into the network and we’ll know with a fairly high degree of certainty which digit the image shows.
It turns out that this simple network isn’t particularly accurate, correctly recognising images about 92% of the time. There aren’t enough neurons to generalise the shapes that appear in each image. If we add a hidden layer of neurons between the input and the output layer, we allow the network to generalise features. For example, the digits 7 and 4 have sharp corners, while 8 and 9 have loops. The hidden layer gives the network space to encode these features, improving accuracy.
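Adding a hidden layer just means stacking two of those weighted-sum-and-sigmoid steps. Again a hypothetical sketch with random weights — the layer size of 30 is illustrative, not prescribed:

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # One weighted sum + sigmoid per neuron in this layer
    return [sigmoid(sum(w * x for w, x in zip(neuron, inputs)))
            for neuron in weights]

def random_weights(n_in, n_out):
    return [[random.uniform(-0.05, 0.05) for _ in range(n_in)]
            for _ in range(n_out)]

hidden_weights = random_weights(784, 30)  # 30 hidden neurons (illustrative)
output_weights = random_weights(30, 10)   # 10 outputs, one per digit

image = [random.choice([0, 1]) for _ in range(784)]
hidden = layer(image, hidden_weights)   # space to encode corners, loops, etc.
scores = layer(hidden, output_weights)  # one confidence per digit
print(len(scores))
```

Training still works the same way — the chain rule just gets applied layer by layer, which is what the backpropagation algorithm does.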
With just a few modifications to the neuron we tinkered with yesterday, we’re able to build a neural network which is powerful enough to recognise handwritten digits. It’s not perfect — our images have to be the same size and only contain a single digit — but it shows the power of neural networks.
Tomorrow, I’m going to look into how we generalise the problem so we can recognise individual letters written in words and sentences.
This is a post in my 30 Day Writing Challenge. I’m a software engineer, trying to understand machine learning. I haven’t got a PhD, so I’ll be explaining things with simple language and lots of examples.
Follow me on Twitter to see my latest posts. If you liked this article please click the heart button below to share — it will help other people see it.