Neural Network 05 — Gradient Descent for Neural Networks

Tharanga Nandasena
3 min read · Nov 3, 2023


Welcome to lesson 05 😃. If you have been following my series of lessons, you have already learned about many important components on your journey toward studying Deep Learning. If you have missed those lessons, or would like to revisit them, the following links will take you to the relevant lessons.

  1. Prerequisites
  2. Logistic Regression is a solid base
  3. Neural Network Representation
  4. Activation functions

If you are good to go, let’s get started! 🙌

Understanding gradient descent in a neural network is essential for understanding the concept of backpropagation.

Let’s consider a Neural Network with one hidden layer for now.

Now we are slowly moving into the programming part of the lessons. Understanding these basics will be vital for implementing a good Python program.

Formulas for computing derivatives

Backpropagation gives us the four gradients we need: dW[1], db[1], dW[2], db[2].
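For a network with one hidden layer and m training examples, assuming a sigmoid output unit with cross-entropy cost and a hidden-layer activation g (so g' is its derivative), the standard vectorized formulas are:

dZ[2] = A[2] - Y
dW[2] = (1/m) dZ[2] A[1]ᵀ
db[2] = (1/m) Σ dZ[2]   (sum over the training examples)
dZ[1] = (W[2]ᵀ dZ[2]) * g'(Z[1])   (element-wise product)
dW[1] = (1/m) dZ[1] Xᵀ
db[1] = (1/m) Σ dZ[1]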

Always good to remember:
Forward propagation calculates the output of the network.
The cost function sits in between, measuring how far that output is from the true labels.
Backpropagation computes the derivatives of the cost with respect to the parameters.
With those derivatives we have everything needed to update the parameters w and b.
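To see how these pieces fit together, here is a minimal NumPy sketch of one gradient-descent iteration for a one-hidden-layer network. It assumes a tanh hidden activation, a sigmoid output unit with binary cross-entropy cost, and that the parameters live in a dict called params with keys W1, b1, W2, b2 (the function and variable names are just illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent_step(X, Y, params, learning_rate=0.01):
    W1, b1 = params["W1"], params["b1"]
    W2, b2 = params["W2"], params["b2"]
    m = X.shape[1]                       # number of training examples

    # Forward propagation: compute the network output A2
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)                     # hidden-layer activations
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)                     # predicted probabilities

    # Cost function (binary cross-entropy), kept here for monitoring progress
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

    # Backpropagation: compute dW2, db2, dW1, db1
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(Z1) = 1 - A1**2
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    # Gradient-descent update of w and b
    params["W1"] -= learning_rate * dW1
    params["b1"] -= learning_rate * db1
    params["W2"] -= learning_rate * dW2
    params["b2"] -= learning_rate * db2
    return cost
```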

Random initialization

In the earlier lessons where I described the forward prop and backward prop algorithms, you might have noticed that I initialized the parameters to 0. For a neural network, however, the best approach is to initialize the weights randomly. Let me explain why.

In a neural network, initializing the weights randomly is very important for gradient descent.

What happens if we initialize the weights to zero?

Then every hidden unit computes the same value and produces the same output, and because each unit also receives the same gradient during backpropagation, the units stay identical no matter how long we train.
Therefore, there is no point in having more than one hidden unit.
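A tiny illustration of the problem (the sizes here are made up just for the example): with W1 set to zeros, all three hidden units produce exactly the same activation for every example, and since the backward pass is symmetric as well, they keep receiving identical updates.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(2, 4)   # 2 input features, 4 training examples

W1 = np.zeros((3, 2))       # 3 hidden units, every weight initialized to zero
b1 = np.zeros((3, 1))

A1 = np.tanh(W1 @ X + b1)
print(A1)                   # every hidden unit outputs the same value for every example
```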

In a neural network we want different hidden units to compute different functions. This is achieved by initializing the weights randomly, which breaks the symmetry.

In Python, we can initialize w and b randomly as follows.
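A minimal NumPy sketch of the idea, assuming a single hidden layer with n_x input features, n_h hidden units, and n_y output units:

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # Small random weights break the symmetry between hidden units;
    # the biases can safely start at zero.
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
```

For example, params = initialize_parameters(n_x=2, n_h=3, n_y=1) gives a dict you can pass straight to the gradient-descent step sketched earlier.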

The 0.01 factor keeps W[1] and W[2] small. When we use a sigmoid/tanh activation function, z needs to be small so the activations stay on the steep part of the curve, where the gradients are large and learning is faster.

Well, this is quite a short lesson, but it covers a very important topic in Deep Learning. Backpropagation is quite complex to implement programmatically. Fortunately, many Deep Learning libraries such as TensorFlow and Keras can perform backpropagation automatically; we only need to implement the forward propagation. Lucky!!! 😁

OK then. This is the end of lesson 05. See you in the next lesson.

Good Luck!!! Keep Learning!!! 🎯
