Let’s code a Neural Network from scratch — Part 1
There’s something particularly interesting about a computer learning to do a simple task: with it comes the possibility that it might one day surpass human capabilities.
This tutorial aims to teach creative coders how to create an Artificial Neural Network (ANN). To be a little more specific, our network will consist of neurons and perceptrons which feed data forward and use backpropagation to learn. This may sound alien to you but fear not, I will explain everything in detail. It is also an ANN in its simplest form, so please don’t expect a TensorFlow challenger; its beauty lies in its simplicity. It does require some familiarity with object-oriented programming (OOP).
We will use the popular MNIST data set of handwritten digits to train the network and ultimately recognise each digit. You may use your preferred language; however, I recommend Processing for its ease of use and graphical capabilities, which will help us understand what’s going on.
As a side note, if I happen to have any readers experienced in AI, I’d be very interested to hear if/how the performance can be improved without adding too much complexity or too many libraries.
How does our brain function?
Let’s first explore where the ANN originates: our brains. At the sight of a “70” speed limit sign, the photons bouncing from your headlights to the sign and back into your retina produce a chemical reaction. Some of the neurons connected to your visual cortex get excited and pass the signal on to other neurons through a network of dendrites, releasing a chemical signal.
Eventually, activity wells up in your Occipital and Temporal Lobes (areas for interpreting visual information) and causes you to realise you’re going over the speed limit. The photons are therefore processed through a number of layers leading to a correct action, in this instance reducing your speed.
With our elementary understanding of how the brain functions, let’s explore the structure of the NN we’re about to build. We can abstract the aforementioned concept by constructing a set of “input” neurons representing the retina; this will take the shape of a grid of pixels. The information is then passed to a “hidden” layer which represents the layers within our brain. Finally, this hidden layer connects to an output layer, the equivalent of the motor neurons which caused you to slow down.
Notice that, unlike in our brain, the neurons within a layer are not linked to one another. Rather, each one is linked to all the neurons in the following layer. This explains the feed-forward nature of our network: data is passed through the layers in a forward direction.
The layers have been kept as small as possible so the network can run on any laptop. Each digit (input) is represented as a grid of 14x14 pixels (see below), which was downsized from 28x28 by Alasdair Turner. The hidden layer consists of a 7x7 grid; this is a completely arbitrary choice and I encourage you to play around with it once you have everything running. Lastly, our output is a series of 10 neurons, each one indicating a possible digit from 0 to 9.
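To get a feel for how small this network is, here is a minimal sketch in plain Java (which Processing is built on) that counts the links between the layers. The layer sizes come from the text above; the class and method names are my own, and the count assumes every neuron links to every neuron in the next layer with no extra bias terms.

```java
// Illustrative only: names are hypothetical, sizes are taken from the article.
class LayerSizes {
    static final int INPUTS = 14 * 14;  // 196 pixels per digit
    static final int HIDDEN = 7 * 7;    // 49 hidden neurons (arbitrary choice)
    static final int OUTPUTS = 10;      // one neuron per digit, 0 to 9

    // Number of links between two fully connected layers:
    // every neuron in the first layer links to every neuron in the next.
    static int connections(int from, int to) {
        return from * to;
    }

    public static void main(String[] args) {
        int inputToHidden = connections(INPUTS, HIDDEN);    // 196 * 49 = 9604
        int hiddenToOutput = connections(HIDDEN, OUTPUTS);  // 49 * 10 = 490
        System.out.println("input -> hidden links: " + inputToHidden);
        System.out.println("hidden -> output links: " + hiddenToOutput);
        System.out.println("total links: " + (inputToHidden + hiddenToOutput));
    }
}
```

Changing HIDDEN is the easiest way to experiment: doubling it roughly doubles the number of links the network has to learn.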
Rather than plain neurons, each layer will be made up of perceptrons. A perceptron takes the weighted sum of its inputs and, if that sum is above a given threshold, it “fires”. Notice how each input is weighted: the ones with a higher number (bold arrow) signify a stronger link between the neurons. In this case 90% (0.9*100) of the signal will pass through.
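The idea of a weighted sum and a threshold can be sketched in a few lines of plain Java. This is not the code we will end up with (the real “how to fire” function is the sigmoid covered in part 2); the hard threshold, the example weights and all the names here are assumptions made purely for illustration.

```java
// Illustrative only: a hard-threshold perceptron with hypothetical names.
class PerceptronSketch {
    // Weighted sum: each input is scaled by the strength of its link.
    static double weightedSum(double[] inputs, double[] weights) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("one weight is needed per input");
        }
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }

    // The perceptron "fires" when the weighted sum is above the threshold.
    static boolean fires(double[] inputs, double[] weights, double threshold) {
        return weightedSum(inputs, weights) > threshold;
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 1.0, -1.0};   // two neurons on, one off
        double[] weights = {0.9, 0.2, 0.5};   // 0.9 lets 90% of the first signal through
        System.out.println("sum: " + weightedSum(inputs, weights));
        System.out.println("fires: " + fires(inputs, weights, 0.5));
    }
}
```

With these numbers the sum is roughly 0.9 + 0.2 - 0.5 = 0.6, so the perceptron fires for a threshold of 0.5 but would stay quiet for a threshold of 1.0.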
Let’s write some code
With the basics out of the way, let’s structure our program. You can replicate this in Processing by opening a tab for each of the following segments.
- Main tab: This will allow us to draw, ‘train’ and ‘test’ our neural network.
- Load data: This tab will allow us to load a set of labelled ‘flash cards’. Each flash card will have a handwritten number on it for the neural network to recognise, and a label on the back to tell us which number it actually is.
- Neuron: This will be the code for an individual perceptron as drawn above.
- Network: This will consist of a number of arrays of neurons. One for the input layer, one for the hidden layer, and one for the output layer.
- Sigmoid: The purpose of this tab is to construct a ‘how to fire a neuron’ function. More detail on this in part 2.
Loading the data
The first thing we need to do is load the data and display it as a grid. Let’s now refer to each dot as a neuron; notice how they’re in grayscale. A neuron will not just be on or off but will instead take a value in a range from +1 (black, on) through 0 (grey, partially on) to -1 (white, off). Unlike in our brain, neurons will stay in whatever state they were last set to.
The full data set contains 10,000 handwritten digits. We will split this in two so we have a training set of 8,000 digits and a testing set of 2,000 digits. When training the network we will use the corresponding numeric label to “teach” it. When testing, the cards will be used as “exam” questions for your computer to see whether it can guess correctly. It will not be given the correct solution.
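The split itself is nothing fancy. As a minimal sketch in plain Java, assuming the cards are simply taken in the order they appear in the file (first 8,000 for training, remaining 2,000 for testing), it could look like this. The names are hypothetical and plain integers stand in for the cards.

```java
// Illustrative only: integers stand in for cards, names are hypothetical.
class DataSplit {
    // Returns {trainingSet, testingSet}: the first trainingCount items
    // go to training, everything after that goes to testing.
    static int[][] split(int[] items, int trainingCount) {
        if (trainingCount < 0 || trainingCount > items.length) {
            throw new IllegalArgumentException("trainingCount out of range");
        }
        int[] training = java.util.Arrays.copyOfRange(items, 0, trainingCount);
        int[] testing = java.util.Arrays.copyOfRange(items, trainingCount, items.length);
        return new int[][] {training, testing};
    }

    public static void main(String[] args) {
        int[] all = new int[10000];
        for (int i = 0; i < all.length; i++) {
            all[i] = i;  // card index
        }
        int[][] sets = split(all, 8000);
        System.out.println("training: " + sets[0].length + ", testing: " + sets[1].length);
    }
}
```

Keeping the two sets separate matters: if the network were tested on cards it had already been trained on, a good score would tell us nothing about whether it can recognise digits it has never seen.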
I don’t think it’s worth covering how we import the data, which is presented as a set of raw byte values; it’s secondary to the function of the algorithm. You can also find the full reference on the MNIST database regarding how to process it. It’s worth pointing out that each digit is represented as an object of the class “Card”. They are all loaded using the function “loadData()”, which is called before anything else in “setup()” and resides outside of the “Card” class.
The input being a 14x14 grid gives us 196 inputs, which are stored in a simple array. The output is also stored as an array of 10 possible answers: +1 if the entry corresponds to the label, or -1 if it doesn’t. We will also store the label as a simple numeric backup to the outputs using the “label” variable.
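To make the +1/-1 encoding concrete, here is a minimal sketch of such a card in plain Java. It only shows how the 10 outputs can be filled from the label; the byte loading is left out, and the class name and constructor are my own assumptions rather than the actual “Card” class.

```java
// Illustrative only: a stripped-down stand-in for the "Card" class.
class CardSketch {
    final float[] inputs = new float[196];   // 14x14 pixels, filled in by the loader
    final float[] outputs = new float[10];   // one entry per possible digit
    final int label;                         // numeric backup of the answer

    CardSketch(int label) {
        if (label < 0 || label > 9) {
            throw new IllegalArgumentException("label must be a digit from 0 to 9");
        }
        this.label = label;
        // +1 for the entry matching the label, -1 for every other entry.
        for (int i = 0; i < outputs.length; i++) {
            outputs[i] = (i == label) ? 1.0f : -1.0f;
        }
    }

    public static void main(String[] args) {
        CardSketch card = new CardSketch(3);
        for (int i = 0; i < card.outputs.length; i++) {
            System.out.println(i + ": " + card.outputs[i]);
        }
    }
}
```

For a card labelled 3, every entry of “outputs” is -1 except the one at index 3, which is +1.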
Input & Neuron Class
Now we’ve covered how to load our data, let’s refer to each circle drawn on the screen as a neuron.
We’ll start with a very simple “Neuron” class with a display function.
The highlighted calculation inside of “fill()” scales a number in the range -1 to +1 to a number between 0 and 255. There is also an inversion to achieve a pen (black) and paper (white) effect. Firing neurons have a value close to +1, which we want to draw as black; values around 0 are drawn as grey, depending on the strength.
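One way to write such a scaling, as a minimal sketch in plain Java, is shown below. The exact expression is my assumption; any linear mapping that sends +1 to 0 (black) and -1 to 255 (white) fits the description. In a Processing sketch the result would be passed straight to “fill()”.

```java
// Illustrative only: one possible mapping from a neuron value to a grey level.
class GreyScale {
    // Maps +1 (on) to 0 (black) and -1 (off) to 255 (white).
    // (1 - value) flips the range so that a higher value means darker ink.
    static float toGrey(float value) {
        return 255f * (1f - value) / 2f;
    }

    public static void main(String[] args) {
        System.out.println("on  (+1): " + toGrey(1f));
        System.out.println("mid  (0): " + toGrey(0f));
        System.out.println("off (-1): " + toGrey(-1f));
    }
}
```

Processing’s built-in “map()” function can express the same thing, for example map(value, -1, 1, 255, 0).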
Let’s also create a “Network” class which we’ll use to construct each layer. It has three functions:
- Network: This is the initialiser. In here we set up a network with a given number of inputs; in this case this will be 196 (one for each pixel).
- Respond: In here the network responds to a card it gets shown. For now we’ll just copy the inputs of the card to the outputs of the input neurons with no calculations.
- Display: This is used to draw our 196 neurons as a 14x14 grid.
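The three functions above can be sketched in plain Java as follows. This is a simplified stand-in, not the actual “Network” class: the Processing drawing calls are left out, so instead of a display function it only works out which grid cell each neuron belongs to, assuming the pixels are stored row by row. All names are hypothetical.

```java
// Illustrative only: hypothetical names, drawing calls omitted.
class GridNeuron {
    float output;  // last value this neuron was set to, from -1 to +1
}

class GridNetwork {
    static final int GRID_WIDTH = 14;  // neurons per row when displayed
    final GridNeuron[] inputLayer;

    // Initialiser: one neuron per input (196 for a 14x14 digit).
    GridNetwork(int inputs) {
        inputLayer = new GridNeuron[inputs];
        for (int i = 0; i < inputs; i++) {
            inputLayer[i] = new GridNeuron();
        }
    }

    // Respond: copy each card input straight to the matching neuron.
    void respond(float[] cardInputs) {
        if (cardInputs.length != inputLayer.length) {
            throw new IllegalArgumentException("card size does not match the network");
        }
        for (int i = 0; i < inputLayer.length; i++) {
            inputLayer[i].output = cardInputs[i];
        }
    }

    // Display helpers: grid position of neuron i, assuming row-by-row storage.
    static int column(int index) {
        return index % GRID_WIDTH;
    }

    static int row(int index) {
        return index / GRID_WIDTH;
    }

    public static void main(String[] args) {
        GridNetwork network = new GridNetwork(196);
        float[] inputs = new float[196];
        inputs[15] = 1f;  // switch on a single pixel
        network.respond(inputs);
        System.out.println("neuron 15 sits at column " + column(15) + ", row " + row(15));
    }
}
```

In the Processing version, the display function would loop over the neurons and draw a circle for each one at a position derived from its column and row.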
Putting everything together in our main tab gives us something that looks like this:
In this main tab we only used the “Network” class; it gets initialised in “setup()” with 196 inputs (14x14). We then show the network a card from the testing set, picked using a random “cardNum” which gets generated with each mouse press.
The full code to load and display the data can be found on GitHub. Just remember to include the data, which can be added by dragging it onto your Processing sketch.