A Neural Network is a computerized system that loosely models the brain. You could think of it as a network of neurons that takes an input and produces an output.
Now imagine a neural network that takes 28 x 28-pixel images of handwritten digits and classifies which number each one is. We can see that the number below is a 9, but how can the neural network “learn” to classify these numbers?
In this image, every pixel corresponds to a single neuron. Each of the 784 neurons holds a grayscale value between 0 and 1, where 0 is black, 1 is white, and anything in between is a shade of gray. This value is also referred to as the neuron’s activation.
We will be using an input layer, an output layer, and 2 hidden layers. The reason they are called hidden layers is that they are not directly observable from the input and output layers.
This final layer represents what our system thinks the value of the handwritten digit is. In the beginning, it is consistently wrong because it has no idea how to classify numbers. As our model receives more training data, it improves its ability to recognize handwritten digits.
The heart of the neural network is that each layer affects the activations of the neurons in the next layer. We assign each of these yellow lines a weight: a numerical value that represents the strength of the connection between neurons in one layer and the next.
Now we take the activations and the weights of each neuron and compute a weighted sum, which looks something like this.
What it means is: take each weight (the strength of a connection) and multiply it by the activation (the grayscale value) of the corresponding input neuron. Do that for all 784 neurons and add the results, and you get the weighted sum for a single neuron in the next layer.
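The weighted sum above can be sketched in a few lines of NumPy. The values here are random placeholders, not real pixel data or trained weights:

```python
import numpy as np

# Illustrative values: 784 grayscale activations (0 = black, 1 = white)
# and one weight per connection into a single next-layer neuron.
rng = np.random.default_rng(0)
activations = rng.random(784)        # one value per input pixel
weights = rng.standard_normal(784)   # one weight per connection

# Weighted sum for a single neuron in the next layer:
# w1*a1 + w2*a2 + ... + w784*a784
weighted_sum = np.dot(weights, activations)
print(weighted_sum)
```

Each neuron in the next layer has its own set of 784 weights, so this computation is repeated for every one of them.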
As a result, you might get a number well outside the range of 0 to 1. That is a problem, because our activations must stay between 0 and 1.
A common function that takes our weighted-sum value and turns it into a number between 0 and 1 is the sigmoid function, also called a logistic curve. Very negative sums end up close to 0, and very positive sums end up close to 1. The sigmoid is now considered somewhat outdated; the Rectified Linear Unit (ReLU) is more common because models are easier to train with it. So the activation of a neuron in the next layer is essentially a measure of how positive the relevant weighted sum is.
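Both functions are one-liners. A minimal sketch of sigmoid and ReLU, with a few spot checks of their behavior:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1):
    # very negative -> close to 0, very positive -> close to 1
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive values through unchanged; clips negatives to 0
    return np.maximum(0.0, x)

print(sigmoid(-10), sigmoid(0), sigmoid(10))  # ~0, exactly 0.5, ~1
print(relu(-3.0), relu(2.5))                  # 0.0 2.5
```

Note that ReLU does not keep values below 1; it only enforces the lower bound of 0, which turns out to be enough in practice.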
Now, what if you don’t want just any neuron with a sum above 0 to light up, or activate? What if you want the sum to reach a specific value before the neuron becomes meaningfully active? For this, we have a bias, which acts as a threshold for the neuron. If a neuron’s sum does not clear the threshold, it won’t activate at all.
This bias is added to our weighted sum before we apply the sigmoid function, which then squashes the result into the range of 0 to 1.
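Putting the pieces together, a single neuron’s activation is the sigmoid of its weighted sum plus its bias. The bias value below is illustrative, not taken from a trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuron_activation(weights, activations, bias):
    # The bias shifts the weighted sum before squashing, so the neuron
    # only becomes meaningfully active once the sum clears the threshold.
    return sigmoid(np.dot(weights, activations) + bias)

# Illustrative numbers: with a bias of -10, the weighted sum must
# exceed 10 before this neuron's activation passes 0.5.
rng = np.random.default_rng(1)
a = rng.random(784)
w = rng.standard_normal(784)
act = neuron_activation(w, a, bias=-10.0)
print(act)
```

A large negative bias makes the neuron harder to activate; a positive bias makes it easier.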
With this hidden layer of 16 neurons, we have 784 x 16 weights plus 16 biases, and all of that covers just the connections between the first and second layers. When you count the other layers, you get a total of 13,002 weights and biases, all of which can be tweaked and tuned to make the network behave in different ways.
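The 13,002 figure can be checked directly: each pair of adjacent layers contributes (inputs x outputs) weights, and each non-input layer contributes one bias per neuron.

```python
layers = [784, 16, 16, 10]  # input, two hidden layers, output

# Weights: one per connection between each pair of adjacent layers
weight_count = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))

# Biases: one per neuron in every layer after the input
bias_count = sum(layers[1:])

total = weight_count + bias_count
print(total)  # 13002
```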
When we talk about a network learning, we mean finding the right mix of all those weights and biases so that the network classifies our input properly.
With this knowledge, we can think of a neuron as a function that takes the outputs of the previous layer and produces a number between 0 and 1.
- A neural network is a network of neurons.
- Each neuron has a value called an activation that is between 0 and 1.
- The connection between a neuron in one layer and a neuron in the next is called a weight.
- When you take the activations and weights of each neuron in one layer, you produce a weighted sum that affects the activation of a neuron in the next layer.
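The whole picture can be summarized as one forward pass: each layer’s activations are the squashed, bias-shifted weighted sums of the previous layer’s activations. A minimal sketch with random (untrained) parameters, so the output is meaningless until the weights and biases are learned:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(image, params):
    # Repeatedly apply: next activations = sigmoid(W @ a + b)
    a = image
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

# Randomly initialised 784 -> 16 -> 16 -> 10 network
rng = np.random.default_rng(42)
sizes = [784, 16, 16, 10]
params = [(rng.standard_normal((n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes, sizes[1:])]

output = forward(rng.random(784), params)
print(output.shape)          # (10,) -- one activation per digit 0-9
print(int(output.argmax()))  # the network's (currently random) guess
```

Training is then the search for the particular `params` that make the largest output activation match the digit actually drawn in the image.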