An Intuitive Guide To Understanding The Learning Process Of A Neural Network
Artificial neural networks are one of the most widely used methods in machine learning. And one of the most interesting things about a neural network is the way it learns about the data it’s been trained on: it starts by learning simple patterns in the data and then proceeds to learn more complex attributes.
I decided to write this article after taking a class on neural networks and reading lots of articles about them. Even though I understood the structure of a neural network, and the process involved in adjusting the weights required to make proper predictions, it still wasn’t clear to me why it worked the way it did. I wanted to be able to explain why and how the foremost layers in a network are able to discover simple attributes from a data set, while layers closer to the output layer can learn more complex attributes (which are combinations of attributes learnt from previous layers).
In order to explain the intuition behind the learning process, I will only make use of the simple case of a handwritten digit recognition system. But first, I would like to provide a brief summary of what a neural network looks like and how it works.
What Is A Neural Network?
A neural network is composed of a set of layers, which are characterized as Input, Hidden (there can be more than one) and Output, as shown in the diagram. Each layer in the network is made up of neurons, and connections between these neurons link the layers that make up the network.
The value assigned to each neuron in a layer is a weighted combination of the values assigned to the neurons in the previous layer. So, just as shown in the diagram, the first neuron in the hidden layer is derived by multiplying the value of each neuron in the input layer by its designated unique weight and summing up the products, and the same goes for the second neuron in the hidden layer and the other neurons as well.
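To make this concrete, here is a minimal sketch of that weighted-sum step in Python (the three-input layout and all the numbers are made up for illustration):

```python
# Minimal sketch: the value feeding one hidden neuron is the
# weighted sum of the values of the neurons in the previous layer.
# The inputs and weights below are made-up illustrative numbers.
inputs = [0.5, 0.1, 0.9]       # values of the input layer neurons
weights = [0.4, 0.7, 0.2]      # one unique weight per connection

hidden_neuron = sum(x * w for x, w in zip(inputs, weights))
print(hidden_neuron)           # 0.5*0.4 + 0.1*0.7 + 0.9*0.2 = 0.45
```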
How Does A Neural Network Work?
When working on a machine learning problem (a supervised learning problem, to be precise), we collect data and try to identify the features in the data we collected.
A simple explanation of how a neural network is used for this case is as follows: The input layer holds the data that we collected, the hidden layer learns about that data, and the output layer provides predictions.
Also, the number of neurons in the input layer usually represents the number of features identified from the data collected, while the number of neurons in the output layer corresponds to the number of possible outcomes or categories that can be derived from the data-set.
The neural network basically undergoes two processes that are repeated over and over again until it makes correct predictions about the data.
Feed Forward Process
The feed forward algorithm is the first step that is executed by the neural network during the learning process.
In this step, the value assigned to each neuron within a layer is derived by applying an activation function to the weighted sum of the neurons from the previous layer, as shown in the diagram.
The diagram above only illustrates the feed forward process for two layers (input and hidden layer). In this illustration, we can see how neurons are connected from the first layer to the second layer.
For each neuron in the first layer, an associated weight is multiplied with its value as it connects to a neuron in the second layer.
The value assigned to each neuron in the second layer is obtained by multiplying the weights by the values of the input layer neurons, summing up the products, and passing the resulting value to that neuron in the second layer.
Even though the weights assigned to the connections are picked at random in the beginning, their role is to give importance to the neurons they are associated with.
In reality there will be an output layer and probably more hidden layers, and the values for the hidden layer(s) and the output layer will be derived in the same way (output layer neurons are derived by applying an activation function to the weighted sum of the neurons from the preceding hidden layer).
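Putting the whole pass together, here is a hedged sketch of a feed forward pass for a network with one hidden layer (the sigmoid activation, the layer sizes and the random initialisation are illustrative assumptions, not something this article prescribes):

```python
import numpy as np

def sigmoid(z):
    """A common choice of activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, w_hidden, w_output):
    """Each layer is the activation function applied to the
    weighted sum of the previous layer's values."""
    hidden = sigmoid(w_hidden @ x)       # input layer  -> hidden layer
    output = sigmoid(w_output @ hidden)  # hidden layer -> output layer
    return output

# Illustrative sizes; the weights start out random, as described above.
rng = np.random.default_rng(0)
x = rng.random(4)               # 4 input neurons
w_hidden = rng.random((3, 4))   # 3 hidden neurons
w_output = rng.random((2, 3))   # 2 output neurons
print(feed_forward(x, w_hidden, w_output))
```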
Back Propagation
At the end of a feed forward process, the neural network makes predictions that can be read from the output layer, and usually there are errors in those predictions.
Back propagation is used to reduce that error as much as possible so that the predictions are closer or equal to the actual values.
As the name implies, back propagation entails reducing the error by going backwards through the network from the output layer through the hidden layers and back to the input layer.
It is important to note that each step in going from one layer to the next in the feed forward process has an associated error, and each error results from the wrong weights chosen while moving across a layer.
At the end of a feed forward process, the error in prediction is basically dependent on all the errors that have been observed while moving across layers.
Thus, when back propagation is carried out in a neural network, it ensures that the error associated with each layer is reduced by adjusting the weights. Once the weights are adjusted, the feed forward process can be carried out again with the newly updated weights.
These two processes of executing the feed forward algorithm and then back propagation can be carried out several times until the error produced by the network is minimal.
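As a minimal sketch of that repeated cycle, the loop below feeds an example forward, measures the error, and back propagates it by gradient descent to adjust the weights (the single-layer shape, the squared-error loss and the learning rate are simplifying assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(4)               # one training example, 4 features
target = np.array([0.0, 1.0])   # the desired output
w = rng.random((2, 4))          # weights start out random
learning_rate = 0.5

for step in range(1000):
    # Feed forward: prediction from the current weights.
    y = sigmoid(w @ x)
    error = y - target
    # Back propagation: gradient of the squared error w.r.t. the
    # weights, used to nudge them in the direction that reduces it.
    grad = np.outer(error * y * (1 - y), x)
    w -= learning_rate * grad

print(sigmoid(w @ x))           # the prediction has moved close to [0, 1]
```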
How Does A Neural Network Learn?
Now let’s look at the process by which a neural network learns from a data-set of images containing handwritten digits.
Let’s consider the following hand written digit as shown by the figure below:
This image is a 28 x 28 pixel grey scale image with pixel intensity values ranging from 0 (white) to 1 (black), with values between 0 and 1 corresponding to different shades of grey. The image can also be represented as a 28 x 28 matrix where each value in the matrix is the same as the corresponding pixel intensity value in the image.
Also let’s assume we will be using a three-layered neural network with the first layer as our input layer, followed by one hidden layer and then an output layer, as shown in the figure below:
To use a 28 x 28 pixel image as training data for the network described above, the image needs to be converted to a one-dimensional array with a size equal to the number of neurons in the input layer. This means that each neuron in the input layer is assigned a value from the corresponding item in the array.
Thus the 28 x 28 pixel image, when converted, results in a one-dimensional array of 784 items, where the value of each item in the array is equal to the intensity value of the pixel it represents.
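A quick sketch of that flattening step, assuming the image is already loaded as a 28 x 28 array (the dark stroke drawn here is made up):

```python
import numpy as np

# A 28 x 28 grey scale image: each entry is a pixel intensity in [0, 1],
# with 0 = white and 1 = black, as described above.
image = np.zeros((28, 28))
image[10:18, 12:16] = 0.9       # a made-up dark stroke

# Flatten to a one-dimensional array of 784 values, one per input neuron.
input_layer = image.reshape(784)
print(input_layer.shape)        # (784,)
```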
For this network, if the first neuron of the output layer has an output ≈ 1 while the other neurons output ≈ 0, that indicates the network predicts that the provided handwritten digit is a 0. If the second neuron has an output ≈ 1, that indicates the network thinks the provided digit is a 1, and so on.
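In code, that read-out amounts to picking the output neuron with the highest activation (a sketch; the activation values below are made up and assume an already trained network):

```python
import numpy as np

# Illustrative activations of the 10 output neurons after feed forward.
output_layer = np.array([0.02, 0.01, 0.95, 0.03, 0.01,
                         0.02, 0.01, 0.04, 0.02, 0.01])

# The neuron with the largest activation gives the predicted digit.
predicted_digit = int(np.argmax(output_layer))
print(predicted_digit)          # 2
```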
As usual, the learning process starts by executing the feed-forward process, and it is important to note that one of the most important variables in the feed forward process is the set of weights assigned to the connections, because they determine how correctly the network makes predictions.
How Weights Can Affect Predictions In The Learning Process
Let’s consider the connections between the neurons of the input layer and the first neuron in the hidden layer for a feed forward process. And for this, let’s say that the first input neuron has a value of 0.8, which represents a dark spot on the part of the image it corresponds to.
Assuming a weight of 0.9 was assigned to this neuron, then when that weight is multiplied by 0.8, the resulting value is 0.72, which is not so far from the original value for that neuron (still a dark spot). Thus when this value is supplied to the hidden layer, the hidden layer learns that this neuron represents a dark spot on the image.
If the weight assigned to the neuron was, say, 0.1, then the resulting value would be 0.08; clearly, this means the hidden layer will assume that the neuron represents a white or lighter spot, which obviously is a wrong assumption.
Thus it can be clearly seen how choosing wrong weights can alter the predictions of the neural network.
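The arithmetic above, in a few lines (0.8 and the two candidate weights are the example numbers used in this section):

```python
pixel = 0.8                     # a dark spot on the image

good_weight, bad_weight = 0.9, 0.1
print(pixel * good_weight)      # 0.72 -- still reads as a dark spot
print(pixel * bad_weight)       # 0.08 -- now looks like a light spot
```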
A Close To Perfect Neural Network
For the sake of understanding the whole learning process, we will assume our neural network has a systematic way of choosing the weights such that if we divide the image into seven equal parts, then those parts can be distinctly identified by the seven neurons in the hidden layer respectively.
Let us consider only the connections from the input layer to the first neuron in the hidden layer as shown in the diagram above.
Suppose we chose correct weights for the first set of 112 input neurons, such that the weights do not alter the values of those neurons in a reversed manner (meaning if any of the neurons in this set represents a dark spot, then the hidden layer would definitely see it as a dark spot).
These 112 neurons correspond to the upper part of the image, as shown in the image below:
Then, by intuition, this means that the first neuron in the hidden layer will properly learn or identify this part of the image.
However, if the remaining neurons in the input layer were assigned wrong weights, then the first neuron in the hidden layer may not be able to properly identify the parts of the image they represent (meaning if any of the neurons in this set represents a white spot, the hidden layer might see it as a dark spot).
Now let us consider only the connections between the input neurons and the second neuron in the hidden layer, and assume that the 113th to 224th input neurons (i.e. the next 112 neurons, corresponding to the next 112 pixels of the original image) were given correct weights while the other neurons were given wrong weights for this connection.
Then, just like the first neuron of the hidden layer was able to identify a part of the image from the first set of 112 neurons, the second neuron of the hidden layer can identify another part of the image from this second set of 112 neurons.
If the weights are systematically chosen this way for the connections between each subsequent set of 112 input neurons and the corresponding hidden layer neuron, then we can see how the 7 different neurons of the hidden layer will be able to identify the 7 different parts (attributes) of the original image, as shown below.
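Here is a hedged sketch of that idealised weight structure, in which each hidden neuron carries strong weights only for its own block of 112 pixels (the strong/weak weight values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_inputs, block = 7, 784, 112

# Idealised weights: each hidden neuron weights its own 112-pixel block
# strongly and everything else only weakly (near zero).
w = rng.random((n_hidden, n_inputs)) * 0.01   # "wrong"/noisy weights
for i in range(n_hidden):
    w[i, i * block:(i + 1) * block] = 1.0     # "correct" weights

image = rng.random(784)                       # a flattened input image
hidden = w @ image
# Each hidden value is dominated by one strip of the image.
print(hidden.round(2))
```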
Furthermore, if we were to add new hidden layers before the output layer in our network, then these new hidden layers would be able to discover or learn more complex attributes or parts of the original image, because those attributes would be combinations of the attributes learnt from previous layers.
It is important to note that in a real scenario, we will hardly get this kind of structured variation in the weights that are applied to neurons since they are chosen randomly. That is why a second process is involved in training the network, which we refer to as back propagation.
Thus, we can say that by applying back propagation, we are simply trying to adjust the weights to get close to or to be similar to the values that are structured in the way we described above.
Conclusion
I’ll conclude this article by saying that it was meant to provide a personal understanding of the learning process of a neural network, and as such it is open to suggestions, criticism and applause as well.
If you find this post useful, don’t forget to clap, and watch out for more interesting posts from Axum Labs.