Derivation of the Binary Cross-Entropy Classification Loss Function
Derive the log loss function used in machine learning tasks
This article demonstrates how to derive the cross-entropy log loss function used in machine learning binary classification problems.
The loss function is minimised using gradient descent, and network weights are updated through backpropagation.
The cross-entropy loss function is a composite function. Therefore, this article also demonstrates how to use the chain rule to find the partial derivatives of a composite function.
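Before working through the derivation, the chain rule can be made concrete with a small numerical sketch. This example assumes a sigmoid activation a = 1/(1 + e^(-z)) with z = wx + b and the binary cross-entropy loss (the functions introduced in the equations that follow); the specific values of w, b, x and y are illustrative. It compares the chain-rule gradient dL/dw = (a − y)·x against an independent finite-difference estimate:

```python
import math

def sigmoid(z):
    # Logistic activation: maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def bce(a, y):
    # Binary cross-entropy loss for prediction a and label y
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

# Single training example (illustrative values, not from the article)
w, b, x, y = 0.5, 0.1, 2.0, 1.0

a = sigmoid(w * x + b)

# Chain rule: dL/da * da/dz * dz/dw simplifies to (a - y) * x
analytic = (a - y) * x

# Central finite difference as an independent numerical check
eps = 1e-6
numeric = (bce(sigmoid((w + eps) * x + b), y)
           - bce(sigmoid((w - eps) * x + b), y)) / (2 * eps)

print(abs(analytic - numeric) < 1e-8)  # → True
```

Agreement between the analytic and numerical gradients is a standard sanity check when deriving backpropagation formulas by hand.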
A composite function is formed by applying one function to the output of another, as shown in Equation 1. Here, two functions, f and g, make up y.
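The equation itself is not rendered in this text; assuming the standard two-function composition the passage describes, Equation 1 would read:

```latex
y = f(g(x))
```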
Equation 2 is another composite function, the loss L(a, y). It has two variables, a and y.
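Equation 2 is likewise not rendered here; the binary cross-entropy loss for a prediction a and label y is conventionally written as:

```latex
L(a, y) = -\bigl(y \ln a + (1 - y)\ln(1 - a)\bigr)
```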
Within the loss function, y is the true label and is treated as a constant, while a is itself a function of z, as shown by Equation 3.
Note that e is not a variable; it is Euler’s number, a transcendental constant approximately equal to 2.71828.
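Given the mention of Euler's number, Equation 3 is presumably the sigmoid activation:

```latex
a = \sigma(z) = \frac{1}{1 + e^{-z}}
```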
Furthermore, z is a function of w, x and b as defined by Equation 4.
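Equation 4 is not rendered in this text; for a single neuron, the pre-activation is conventionally the affine combination of the input x with weight w and bias b:

```latex
z = wx + b
```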