Using Mean Squared Error (MSE) Loss in Logistic Regression?

Pranav Kushare
Nov 13, 2022

I have been revisiting some algorithms and concepts that were on the verge of vanishing from my memory :) While revising logistic regression, I realized I had never focused on some of the minute details, such as: why don't we use MSE loss while training a logistic regression model? After googling across different sources, here's what I understood ==>

Before diving into the answer, let's revise some concepts.

As you might know, LogLoss (also known as Cross-Entropy loss) is normally used to train a logistic regression classification model. The formula is as below:
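
LogLoss = −(1/N) · Σᵢ [ yᵢ·log(pᵢ) + (1 − yᵢ)·log(1 − pᵢ) ]

where the sum runs over the N training samples.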

Here, y is the actual value (the label), and p is the probability predicted by the model.

This log loss is a convex function, which means it has a single minimum (any local minimum is also the global minimum). As with many other ML algorithms, gradient descent is used for optimization, i.e. for finding the coefficient values at which the cost function is lowest. If you remember, gradient descent is only guaranteed to find the optimal values when the function it iterates over has a single minimum; with multiple local minima, it can get stuck in a suboptimal one.
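
To make this concrete, here's a minimal sketch (assuming NumPy and some made-up toy data, not any particular library's training loop) of gradient descent minimizing the log loss of a logistic regression model:

```python
import numpy as np

# Hypothetical toy data, just for illustration: 100 points, 2 features,
# label is 1 when the feature sum is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p):
    eps = 1e-12  # keep log() away from 0
    return -np.mean(y_true * np.log(p + eps) + (1 - y_true) * np.log(1 - p + eps))

# Gradient descent on the log loss
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1

for step in range(1000):
    p = sigmoid(X @ w + b)           # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)  # gradient of log loss w.r.t. weights
    grad_b = np.mean(p - y)          # gradient w.r.t. bias
    w -= lr * grad_w
    b -= lr * grad_b

print("final log loss:", log_loss(y, sigmoid(X @ w + b)))
```

Because the loss surface is convex, this loop keeps sliding downhill toward the one minimum no matter where the weights start.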

Let's see this by plotting the log loss function. Breaking complicated things into smaller parts always makes them easier and quicker to understand, so let's split the log loss function into parts. Depending on the value of yᵢ it looks like this ==>

When yᵢ = 1, the loss reduces to −log(pᵢ).

When yᵢ = 0, the loss reduces to −log(1 − pᵢ).
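
Here's a small sketch (assuming NumPy and Matplotlib) that plots both pieces, so you can see that each one is a smooth, bowl-shaped convex curve over the predicted probability:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 500)   # predicted probabilities

plt.plot(p, -np.log(p), label="y = 1:  -log(p)")
plt.plot(p, -np.log(1 - p), label="y = 0:  -log(1 - p)")
plt.xlabel("predicted probability p")
plt.ylabel("log loss")
plt.title("The two pieces of the log loss")
plt.legend()
plt.show()
```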