# For Dummies — The Introduction to Neural Networks we all need! (Part 2)

This article is a continuation of Part 1 of this series. If you have not read it yet, I highly recommend doing so before we dive into multi-layered neural networks here!

Just as a recap, I will quickly go through what a single-layered neural network basically does. Once a training sample is fed to the network, each output node of the single-layered neural network (also called a Perceptron) takes a weighted sum of all the inputs, passes it through an activation function (such as sigmoid or step), and comes up with an output. The weights are then corrected using the following equation,

For all inputs i,
W(i) = W(i) + a*g'(sum of all inputs)*(T - A)*P(i)
where a is the learning rate, g' is the derivative of the activation function, T is the target output from the training sample, A is the actual output produced by the network, and P(i) is the i-th input.

Note: We drop the derivative term when the activation function is a step function.
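As a minimal sketch of this update rule in Python (the sigmoid activation and all the helper names here are my own choices for illustration, not from the article):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def perceptron_update(weights, inputs, target, lr=0.1):
    """One weight update for a single output node with a sigmoid activation."""
    weighted_sum = sum(w * p for w, p in zip(weights, inputs))
    actual = sigmoid(weighted_sum)
    # W(i) = W(i) + a * g'(sum of all inputs) * (T - A) * P(i)
    return [w + lr * sigmoid_deriv(weighted_sum) * (target - actual) * p
            for w, p in zip(weights, inputs)]
```

Note how each weight moves in the direction that reduces the gap between the target T and the actual output A.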

This process is repeated, feeding the whole training set several times, until the network responds with the correct output for every sample. However, such training works only when the inputs are linearly separable. This is where multi-layered neural networks come into the picture.

### What are multi-layered neural networks?

Each input from the input layer is fed to each node in the hidden layer, and from there to each node in the output layer. We should note that there can be any number of nodes per layer, and there are usually multiple hidden layers to pass through before ultimately reaching the output layer.
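The forward pass described above can be sketched like this (a minimal sketch, assuming sigmoid activations and a list-of-lists weight layout of my own choosing):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # weights[j] holds the weights from every input into node j, so each
    # node takes a weighted sum of all inputs and squashes it with sigmoid
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)))
            for node_w in weights]

def forward(inputs, hidden_weights, output_weights):
    # input layer -> hidden layer -> output layer
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)
```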

But to train this network we need a learning algorithm that can tune not only the weights between the output layer and the hidden layer, but also the weights between the hidden layer and the input layer.

### Enter Backpropagation!

First of all, we need to understand what we lack. To tune the weights between the hidden layer and the input layer, we need to know the error at the hidden layer, but we only know the error at the output layer (we know the correct output from the training sample, and we also know the output predicted by the network).

So, the suggested method was to take the errors at the output layer and propagate them backwards to the hidden layer, in proportion to the connecting weights.

Below we will write the equations for a 2-layered network, but the same concept applies to a network with any number of layers.

We will follow the nomenclature as shown in the above figure.

For a particular neuron i in the output layer,
for all j {
Wj,i = Wj,i + a*g'(sum of all inputs)*(T - A)*P(j)
}
where P(j) is the activation of hidden neuron j feeding into neuron i.

This equation tunes the weights between the output layer and the hidden layer.

For a particular neuron j in the hidden layer, we propagate the error backwards from the output layer, thus

Err(j) = Wj,1 * E1 + Wj,2 * E2 + … summed over all the neurons in the output layer
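This propagation step can be sketched in a few lines (the weight-matrix layout here is an assumption of mine for illustration):

```python
def hidden_error(j, output_weights, output_errors):
    # output_weights[i][j] is the weight from hidden neuron j to output
    # neuron i (a layout chosen here for illustration); the error at
    # hidden neuron j is each output error scaled by its connecting weight
    return sum(output_weights[i][j] * output_errors[i]
               for i in range(len(output_errors)))
```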

Thus,

For a particular neuron j in the hidden layer,
for all k {
Wk,j = Wk,j + a*g'(sum of all inputs)*Err(j)*P(k)
}
where Err(j) is the error we propagated back to hidden neuron j, taking the place of the (T - A) term.

This equation tunes the weights between the hidden layer and the input layer.
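A sketch of this hidden-layer update (again with a sigmoid activation and function names of my own choosing; the only difference from the output-layer update is that the propagated error stands in for T - A):

```python
import math

def sigmoid_deriv(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def hidden_weight_update(weights, inputs, err_j, lr=0.1):
    """Update the weights feeding hidden neuron j, using the error err_j
    that was propagated back from the output layer."""
    weighted_sum = sum(w * p for w, p in zip(weights, inputs))
    # Wk,j = Wk,j + a * g'(sum of all inputs) * Err(j) * P(k)
    return [w + lr * sigmoid_deriv(weighted_sum) * err_j * p
            for w, p in zip(weights, inputs)]
```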

So, in a nutshell, what we are doing is:

• Present a training sample to the neural network (initialised with random weights)
• Compute the network's output by calculating the activations of each layer, and from that calculate the error
• Having calculated the error, readjust the weights (according to the above-mentioned equations) so that the error decreases
• Repeat this process over all the training samples several times, until the weights stop changing significantly
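The whole loop above can be sketched end to end as one training routine. This is a minimal sketch assuming one hidden layer, a single sigmoid output neuron, and bias weights (an addition the article leaves implicit); all names are my own:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    """Train a 2-layered network by backpropagation.
    Returns a predict function and the per-epoch total squared errors."""
    rnd = random.Random(seed)
    n_in = len(samples[0][0])
    # hidden_w[j][k]: weight from input k (last slot = bias) to hidden neuron j
    hidden_w = [[rnd.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    # out_w[j]: weight from hidden neuron j (last slot = bias) to the output
    out_w = [rnd.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(inputs):
        x = list(inputs) + [1.0]                       # append bias input
        h_in = [sum(w * v for w, v in zip(ws, x)) for ws in hidden_w]
        h = [sigmoid(s) for s in h_in] + [1.0]         # hidden activations + bias
        a = sigmoid(sum(w * v for w, v in zip(out_w, h)))
        return x, h_in, h, a

    errors = []
    for _ in range(epochs):
        total = 0.0
        for inputs, target in samples:
            x, h_in, h, a = forward(inputs)
            total += (target - a) ** 2
            # output layer: g'(sum of inputs) * (T - A)
            delta_out = a * (1.0 - a) * (target - a)
            # hidden layer: propagate the error back through the weights
            delta_h = [sigmoid(s) * (1.0 - sigmoid(s)) * out_w[j] * delta_out
                       for j, s in enumerate(h_in)]
            # tune both sets of weights
            out_w = [w + lr * delta_out * v for w, v in zip(out_w, h)]
            for j in range(n_hidden):
                hidden_w[j] = [w + lr * delta_h[j] * v
                               for w, v in zip(hidden_w[j], x)]
        errors.append(total)

    return (lambda inputs: forward(inputs)[3]), errors
```

Trained on a simple truth table (for instance the AND function), the per-epoch error falls and the network's outputs move towards the targets.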