If you say there's no error function defined, then how did you derive this formula:

钱小谦


This is essentially what backpropagation is (at least as I understand it — I may be wrong, feel free to offer your version of the formula).

To get the parameter update, you take the parameter's partial derivative and multiply it by what you deem to be a measure of the total error and by the learning rate.

Then you subtract the result from the parameter as a gradient descent step.
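The update described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical one-parameter model `y = w * x` with squared error `E = (y - t)^2`; the names `gradient_step`, `lr`, and the toy data are my own choices, not from the original comment:

```python
def gradient_step(w, x, t, lr=0.1):
    """One gradient descent step: w <- w - lr * dE/dw."""
    y = w * x            # forward pass
    error = y - t        # the "measure of total error" (here dE/dy, dropping the factor of 2)
    grad = error * x     # partial derivative of the error with respect to w
    return w - lr * grad # subtract, scaled by the learning rate

# Toy run: fit y = w * 2 to the target 4, so w should approach 2.
w = 0.0
for _ in range(50):
    w = gradient_step(w, x=2.0, t=4.0)
print(round(w, 3))
```

With more parameters the same multiply-by-gradient-and-subtract step is applied to each one, which is where the chain rule (backpropagation proper) comes in.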