Derivation of the Binary Cross-Entropy Classification Loss Function

Derive the log loss function used in machine learning tasks

Andrew Joseph Davies
5 min read · Nov 13, 2021

This article demonstrates how to derive the cross-entropy log loss function used in machine learning binary classification problems.

The loss function is minimised using gradient descent, and network weights are updated through backpropagation.

The cross-entropy loss function is a composite function. Therefore, this article also demonstrates how to use the chain rule to find the partial derivatives of a composite function.

Photo by Marius Masalar on Unsplash

A composite function is formed by applying one function to the output of another, as shown in Equation 1. Two functions, f and g, compose to give y.
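The original Equation 1 is an image that does not survive here; in the simplest two-function case it takes the standard form:

```latex
y = f(g(x))
```

Here g is applied first, and f is applied to its output.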

Equation 2 is another composite, L(a, y). There are two variables, a and y.
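The image for Equation 2 is likewise missing; given the article's topic, L(a, y) refers to the standard binary cross-entropy loss:

```latex
L(a, y) = -\bigl(y \ln a + (1 - y) \ln(1 - a)\bigr)
```

Here y is the true label (0 or 1) and a is the predicted probability.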

y is a constant (the true class label), while a is itself a function of another variable, z, as shown by Equation 3.

Note that e is not a variable; it is Euler’s number, a transcendental constant approximately equal to 2.71828.

Equation 3 — a is a function of z (Image By Author)
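The Equation 3 image is not reproduced; given the mention of Euler's number, a is the usual sigmoid (logistic) activation of z:

```latex
a = \sigma(z) = \frac{1}{1 + e^{-z}}
```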

Furthermore, z is a function of w, x and b as defined by Equation 4.
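The chain z → a → L can be traced numerically. A minimal sketch of the forward pass for a single example, using a scalar linear model z = wx + b with sigmoid activation and binary cross-entropy loss (the specific values of w, x, b, and y are illustrative, not from the article):

```python
import math

def forward_loss(w, x, b, y):
    """Forward pass for one example: linear score, sigmoid
    activation, and binary cross-entropy loss."""
    z = w * x + b                     # Equation 4: z is a function of w, x, b
    a = 1.0 / (1.0 + math.exp(-z))   # Equation 3: sigmoid activation
    # Equation 2: L(a, y) = -(y*ln(a) + (1-y)*ln(1-a))
    loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return z, a, loss

z, a, loss = forward_loss(w=0.5, x=2.0, b=-1.0, y=1)
# z = 0.0, a = 0.5, loss = ln(2) ≈ 0.6931
```

With z = 0 the sigmoid outputs 0.5, and for a true label of 1 the loss is ln 2, reflecting a maximally uncertain prediction.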
