Backpropagation

Gary (Chang, Chih-Chun) · Published in Deep Learning · 2 min read · May 1, 2018
https://www.youtube.com/watch?v=Ilg3gGewQ5U

We use gradient descent to find the parameter set that minimizes the cost, so we need to compute the gradients of the cost function with respect to the weights and biases. Backpropagation is an efficient way to calculate these gradients.
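
For reference (the update rule itself is not spelled out in the post, but it is the standard gradient-descent step), each parameter is nudged against its gradient with a learning rate η:

$$ w \leftarrow w - \eta\,\frac{\partial C}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial C}{\partial b} $$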

How does it work?

The concept of backpropagation comes from the chain rule: by calculating the separate partial derivatives and then multiplying them together, we obtain the overall gradient. Therefore, the gradient of the cost function can be written as:
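
In standard notation (a reconstruction, since the exact symbols in the original figure may differ), with z denoting the weighted input of a neuron, the chain rule decomposes the gradient into two factors:

$$ \frac{\partial C}{\partial w} = \frac{\partial z}{\partial w}\,\frac{\partial C}{\partial z} $$

The first factor is computed during the forward pass, and the second during the backward pass.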

First, we analyze the first term, 𝜕z/𝜕w, by considering which layer the weight belongs to:
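
As a short sketch of this step: a neuron's weighted input is z = w₁a₁ + w₂a₂ + ⋯ + b, where the aᵢ are its inputs, so differentiating with respect to a single weight gives

$$ \frac{\partial z}{\partial w_i} = a_i $$

For the first layer, aᵢ is simply the raw input xᵢ; for deeper layers, it is the previous layer's output.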

So we know that the first term, 𝜕z/𝜕w, is the output of the previous layer (or the raw input, for the first layer). How about the second term?

From the derivation above, the second term, 𝜕C/𝜕z, can be obtained if we know the next layer's 𝜕C/𝜕z. So if we can calculate it for the last layer (the output layer), we can work backwards all the way to the first layer. That is the key idea of “backpropagation”. Here is the process (the activation function is the sigmoid function, and 𝜕C/𝜕y depends on the definition of the cost function):
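
In standard notation (a reconstruction; the original figures may use different symbols), with a = σ(z) the neuron's activation and z_k the next layer's weighted inputs reached through weights w_k, the backward recursion is

$$ \frac{\partial C}{\partial z} = \sigma'(z)\sum_k w_k\,\frac{\partial C}{\partial z_k} $$

At the output layer, where y = σ(z) is the network output, the recursion starts with

$$ \frac{\partial C}{\partial z} = \sigma'(z)\,\frac{\partial C}{\partial y} $$

so once the output layer's 𝜕C/𝜕z is known, every earlier layer's value follows by applying the first equation backwards.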

Based on these equations, we obtain a backward network.
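
To make the whole procedure concrete, here is a minimal sketch in Python (it is not the code from the referenced tutorial; the layer sizes, the squared-error cost, and all variable names are illustrative assumptions):

```python
# Minimal backpropagation sketch: a 3-4-2 fully-connected network with
# sigmoid activations and squared-error cost (illustrative assumptions).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)

# Parameters: 3 inputs -> 4 hidden -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = rng.normal(size=3)      # input
t = np.array([0.0, 1.0])    # target

# Forward pass: cache each layer's weighted input z and activation a
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y  = sigmoid(z2)

# Cost: C = 1/2 * ||y - t||^2, so dC/dy = y - t
dC_dy = y - t

# Backward pass: dC/dz at the output layer, then propagate to the hidden layer
dC_dz2 = sigmoid_prime(z2) * dC_dy            # output layer
dC_dz1 = sigmoid_prime(z1) * (W2.T @ dC_dz2)  # hidden layer (uses next layer's dC/dz)

# Gradients: dC/dw = (output of previous layer) * dC/dz
dC_dW2 = np.outer(dC_dz2, a1); dC_db2 = dC_dz2
dC_dW1 = np.outer(dC_dz1, x);  dC_db1 = dC_dz1

# One gradient-descent step
eta = 0.1
W2 -= eta * dC_dW2; b2 -= eta * dC_db2
W1 -= eta * dC_dW1; b1 -= eta * dC_db1
```

One pass of this sketch computes 𝜕C/𝜕w for every weight with a single forward pass and a single backward pass, which is what the backward network describes.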

References:

http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html

https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/

If you like this article and find it useful, please support it with 👏.
