Semih Gülüm
Deeper Deep Learning TR
Mar 16, 2021 · 7 min read


Step by step Forward and Back Propagation

This article was written as part of the research we carry out at the AdresGezgini R&D Center.

In this article, I will try to explain how forward and backward propagation, which are at the heart of deep learning, actually work. You will soon notice that in many places we perform forward propagation without even realizing it. But first, I would like to talk about a rule that is essential for both propagations and takes us back to our high school years: the chain rule.

Chain Rule

To make things easier to follow, let’s turn it into a real-world example. Assume that there are 3 people in our dataset: Ahmet, Buket and Yavuz. According to the data we have, Buket’s estimated walking speed is 2 times Ahmet’s, while it is 2/3 of Yavuz’s. Here are the graphs of these relations:

If we look at the graphs carefully, we can see that once we have Yavuz’s speed, we can reach Ahmet’s speed by going through Buket’s speed. For example, let Yavuz’s estimated speed be 6 units. First, let’s find the derivatives of the speeds with respect to one another:

Then, using the derivative we just found, let’s compute Buket’s speed from Yavuz’s given speed:

We have now reached Buket’s estimated speed. Next, let’s go from Buket’s estimated speed to Ahmet’s estimated speed:

Finding Ahmet’s speed according to Yavuz’s speed
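To make the chain concrete, we can write the calculation out. Let A, B and Y denote the estimated speeds of Ahmet, Buket and Yavuz. From the relations above, B = 2A and B = (2/3)Y, so:

dA/dB = 1/2 and dB/dY = 2/3
dA/dY = dA/dB × dB/dY = 1/2 × 2/3 = 1/3
For Y = 6: B = (2/3) × 6 = 4 and A = (1/3) × 6 = 2

Even though A was never written directly in terms of Y, multiplying the derivatives along the chain takes us from Yavuz’s speed all the way to Ahmet’s. This is exactly what the chain rule does for us.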

In a neural network, nothing about this idea changes; the only difference is that instead of people’s speeds, we chain the derivatives of the nodes with respect to one another.

Forward Propagation

Forward feed in an example neural network structure

In a neural network, the journey from the input to the output is called the forward direction. The weights entering each node are multiplied by the values feeding that node (the feature values x if the previous layer is the input layer, or the outputs of the previous layer’s nodes if it is a hidden layer), and a bias is added. This multiply-and-sum operation is the “dot product”, and it is called that because we are dealing with vectors.
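In symbols, for a node that receives inputs x_1, …, x_n with weights w_1, …, w_n and a bias b (generic names here, not the specific weights of the example below), the computation is:

net = w_1·x_1 + w_2·x_2 + … + w_n·x_n + b  (the dot product w · x, plus the bias)
out = σ(net) = 1 / (1 + e^(−net))

where σ is the activation function; in our example it will be the sigmoid.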

Now we can move on to our numerical example. Let’s go through a simple example and focus on the operations rather than the complexity of the model. There are two input neurons (nodes), two hidden neurons, and two output neurons. In addition, the hidden layer and the output layer each have a bias (b1 and b2). Let’s use the “sigmoid” activation function in both the hidden and the output layer.

  • i stands for the input layer
  • h stands for the hidden layer
  • o stands for the output layer
Example with parameter values

Let’s calculate h1 first. Looking at the network, we see that the nodes i1, i2 and b1 feed into h1. Therefore:

Finding h1
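Written out with generic weight names (here w1 and w2 stand for the weights from i1 and i2 into h1; the actual numbers are the ones shown in the figure above):

net_h1 = w1·i1 + w2·i2 + b1
out_h1 = σ(net_h1) = 1 / (1 + e^(−net_h1))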

Then let’s find h2:

Finding h2

Now we have calculated our hidden-layer values. Next up is the output layer. Let’s start with o1:

Finding o1
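The form is the same as for the hidden layer. Here w5 denotes the weight from h1 to o1 (the weight we will update later in backpropagation), and, as an assumed naming, w6 is the weight from h2 to o1 and b2 is the output-layer bias:

net_o1 = w5·out_h1 + w6·out_h2 + b2
out_o1 = σ(net_o1)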

Finally, let’s calculate o2:

Finding o2

Now that we have calculated our outputs, it is time to compute the value of the loss function. Let’s say our target values are both 0.15:

Total loss of forward feed
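A note on the loss: assuming the usual squared-error loss for this kind of walkthrough (the exact expression is the one shown in the figure above), the total loss is the sum of the per-output errors:

E_o1 = ½ · (target_o1 − out_o1)²
E_o2 = ½ · (target_o2 − out_o2)²
E_total = E_o1 + E_o2, with target_o1 = target_o2 = 0.15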

NOTE: out_o1 and out_o2 are the predictions of the model we have built, so we can actually call them ŷ_o1 and ŷ_o2.

Backward Propagation

This algorithm is called backpropagation because it propagates the error backwards, from the output toward the input, in order to reduce it. It searches for the minimum of the error function in weight space using a technique called gradient descent.

With this method, we will try to reduce the error by changing the weight and bias values. Since we will take only a single step and show the weight update, let’s pick a relatively large learning rate of 0.5 so that the change is clearly visible, and start backpropagation. Let’s choose w5 as the weight we want to update and carry out the operations on it. (All weights are updated with the same method; I chose w5 for illustration.)

Backpropagation is the core of neural network training. It is the method of adjusting the weights of a neural network based on the loss value obtained in the previous epoch. Adjusting the weights correctly reduces the error rate and improves the model’s generalization, making it more reliable.

Instead of jumping directly to the formula, let’s look at the network above again; it will guide us in applying the chain rule. Starting from the loss and moving backwards: the loss is affected by o1’s value after the sigmoid (out_o1), which is affected by o1’s value before the sigmoid (net_o1), and finally by w5, the last link of the chain:

NOTE: The equation above is the derivative of E_total with respect to the weight w5; in other words, it is the gradient we need in order to apply gradient descent to w5.
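Written out, the chain described above is:

∂E_total/∂w5 = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂w5

where out_o1 is o1 after the sigmoid and net_o1 is o1 before the sigmoid.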

Let’s handle the three derivatives above one by one. The formulas involved are the same ones we used in forward propagation; all we need to do is differentiate them and substitute the values we already have:

Since we took the derivative with respect to out_o1, the term involving o2 behaved like a constant and its derivative was 0.

Next, let’s take the derivative of out_o1 with respect to net_o1 (that is, of o1 after the sigmoid with respect to o1 before the sigmoid):

Finally, let’s take the derivative of net_o1 (o1 before the sigmoid) with respect to w5:

→ Again, the terms involving o2 were equal to zero.

NOTE: Do not forget the symmetric case: when we differentiate with respect to o2 (to update the weights feeding o2), it is the terms involving o1 that become zero.
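For reference, with the squared-error loss assumed above and the sigmoid activation, the three factors come out as follows (a sketch in symbols; the numeric values are the ones in the figures):

∂E_total/∂out_o1 = −(target_o1 − out_o1)
∂out_o1/∂net_o1 = out_o1 · (1 − out_o1)
∂net_o1/∂w5 = out_h1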

Now that we have the derivatives, we can apply the chain rule by multiplying them:

As a final step, we can now update the weight w5. To do this, we multiply the derivative we just found by the fixed learning rate we chose and subtract the result from the current value of w5:

Updated w5 weight
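In symbols, with the learning rate η = 0.5 chosen above, the update is:

w5_new = w5 − η · ∂E_total/∂w5

Every other weight (and bias) is updated in exactly the same way, using its own derivative.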
A different sample application showing how the propagations can be implemented [https://www.programmersought.com/article/45312377380/]

NOTE: Forward propagation and backward propagation are tightly linked: the values computed during the forward pass (such as out_h1 and out_o1) are exactly the ones used when computing the gradients in the backward pass.

It’s code time!!

Let’s now reproduce the hand-calculated example in code written with the help of numpy:
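Since the original snippet was embedded in the post, here is a minimal numpy sketch of the same 2-2-2 network. The input, weight and bias values below are placeholders (substitute the values from the figure above); the targets of 0.15 and the learning rate of 0.5 come from the text, and the squared-error loss is the assumption made earlier:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder parameters -- replace with the values from the figure above.
i = np.array([0.05, 0.10])             # inputs i1, i2 (assumed values)
W1 = np.array([[0.15, 0.20],           # w1, w2: weights into h1 (assumed values)
               [0.25, 0.30]])          # w3, w4: weights into h2 (assumed values)
b1 = 0.35                              # hidden-layer bias (assumed value)
W2 = np.array([[0.40, 0.45],           # w5, w6: weights into o1 (assumed values)
               [0.50, 0.55]])          # w7, w8: weights into o2 (assumed values)
b2 = 0.60                              # output-layer bias (assumed value)
target = np.array([0.15, 0.15])        # both targets are 0.15, as in the text
lr = 0.5                               # learning rate from the text

# ---- Forward propagation ----
net_h = W1 @ i + b1                    # net_h1, net_h2
out_h = sigmoid(net_h)                 # out_h1, out_h2
net_o = W2 @ out_h + b2                # net_o1, net_o2
out_o = sigmoid(net_o)                 # out_o1, out_o2
E_total = np.sum(0.5 * (target - out_o) ** 2)   # assumed squared-error loss
print("outputs:", out_o, " E_total:", E_total)

# ---- Backward propagation for w5 (the weight from h1 to o1) ----
dE_dout_o1 = -(target[0] - out_o[0])             # dE_total/dout_o1
dout_dnet_o1 = out_o[0] * (1.0 - out_o[0])       # dout_o1/dnet_o1 (sigmoid derivative)
dnet_dw5 = out_h[0]                              # dnet_o1/dw5
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5    # chain rule
W2[0, 0] -= lr * dE_dw5                          # gradient-descent update of w5
print("dE_total/dw5:", dE_dw5, " updated w5:", W2[0, 0])
```

When the actual parameter values from the figure are plugged in, the printed outputs, loss and updated w5 should match the hand calculations above.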

When we examine the output of the code, we can see that it is the same as the values we calculated above.

Finally, I am leaving here a very nice demo that was made earlier to reinforce the subject.

If you want to read our other articles, you can find them here. Stay well!


