Learn to correctly interpret the coefficients of Logistic Regression and in the process naturally derive its cost function — the Log Loss!
Models like Logistic Regression often win over their more complex counterparts when explainability and interpretability are crucial to the solution. Unfortunately, Logistic Regression coefficients are not as easy to interpret as the familiar Linear Regression coefficients.
Imagine choosing Logistic Regression solely for its explainability, yet presenting wrong interpretations to the business stakeholders. Ouch, definitely not a pleasant scenario!
In this blog, I describe how we can derive the interpretation of logistic regression coefficients naturally, so that there is no need to memorize ugly terminology!
Interpreting Model Coefficients
1. Let’s start with what is known to us, the linear regression equation:
y = θ0 + θ1X1 + θ2X2 + θ3X3 + … + θnXn
However, with Logistic Regression our aim is to predict a class probability (rather than a real-valued continuous y as in linear regression). Hence, we need a way to restrict the output to the interval (0, 1) instead of the original range (-∞, +∞).
2. A very nice function that maps any value in (-∞, +∞) to (0, 1) is the Sigmoid function (𝜎). Let’s make use of that.
Applying the sigmoid to the linear combination θ*X squashes it into (0, 1), and we treat this transformed output as our predicted probability:
p = 𝜎(θ*X)
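To make this concrete, here is a minimal sketch of the sigmoid in plain Python (my own illustration, not from any particular library):

```python
import math

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1
print(sigmoid(-10))  # ≈ 4.54e-05
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ≈ 0.9999546
```

Note that the output never actually reaches 0 or 1, which is exactly what we want for a probability produced from an unbounded linear score.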
So far, so good. We now have a linear model that can predict a class probability given a set of features X and their weights θ. This is what Logistic Regression does, but with a few more changes.
Wondering what they are? Read ahead to find out.
3. Explainability of ML models is crucial for businesses. Recall how we interpret linear regression coefficients: “How much does the output (dependent) variable y change for 1 unit change in the predictor (independent) variable x, given all the other predictor variables are held constant?” ….. Easy, right?! 😄
Now we are interested in interpreting the coefficients of logistic regression. Unfortunately, our coefficients are currently wrapped inside the sigmoid function, making it difficult to frame our interpretation: “How much does the output (dependent) variable y change for 1 unit sigmoid change in the predictor (independent) variable x, given all the other predictor variables are held constant?” ….. Sounds weird, right?! (What is 1 unit sigmoid change 😫)
We would definitely like to simplify this. And this is where the logit function comes to our rescue! ☮️
Logit and sigmoid are inverses of each other.
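We can verify this inverse relationship numerically with a quick round-trip (a small sketch of my own, assuming the standard definitions of both functions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Log-odds: the inverse of the sigmoid on (0, 1)
    return math.log(p / (1.0 - p))

# Round-tripping recovers the original value (up to float precision)
print(logit(sigmoid(2.5)))  # ≈ 2.5
```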
4. Applying the logit function on both sides: logit(p) = logit(𝜎(θ*X))
By definition, logit(x) = log(odds) = log(x/(1-x))
Canceling logit and sigmoid with each other gives us:
log(p/(1-p)) = θ*X
Bingo! We have freed our coefficients from the sigmoid trap.
Now we can interpret our coefficients (in the same manner as linear regression) as: “How much do the log odds of belonging to a class change for 1 unit change in the predictor (independent) variable x, given all the other predictor variables are held constant?”….. Yayyy! 😄
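The interpretation is easy to check with numbers. Below is a sketch with hypothetical coefficients of my own choosing (the feature names and values are illustrative, not from the text): a 1-unit change in a feature adds its coefficient to the log-odds, i.e. multiplies the odds by e raised to that coefficient.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale
theta = {"intercept": -1.0, "age": 0.05, "smoker": 0.7}

def predict_proba(age, smoker):
    z = theta["intercept"] + theta["age"] * age + theta["smoker"] * smoker
    return 1.0 / (1.0 + math.exp(-z))

# Flip `smoker` from 0 to 1 while holding `age` constant
p0 = predict_proba(age=40, smoker=0)
p1 = predict_proba(age=40, smoker=1)
odds0 = p0 / (1 - p0)
odds1 = p1 / (1 - p1)

# The odds are multiplied by exactly exp(0.7) ≈ 2.01
print(odds1 / odds0)
print(math.exp(0.7))
```

So exponentiating a coefficient gives the odds ratio for a 1-unit change in that predictor, which is often the most stakeholder-friendly way to present it.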
5. Performing the above activity has also led us to derive our pretty cost function for logistic regression — The Log Loss (or Cross-Entropy Loss)!
log(p/(1-p)) = log(p) - log(1-p), where p = 𝜎(θ*X)
Notice the log(p) and log(1-p) terms: these are exactly the pieces that make up the binary log loss, which penalizes the model by -log(p) when the true label is 1 and by -log(1-p) when it is 0:
Log Loss = -[y * log(p) + (1 - y) * log(1 - p)]
In its general (multi-class) form, this is the cross-entropy:
Cross Entropy = - Σ yi * log(pi)
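For the binary case, cross-entropy reduces to -[y*log(p) + (1-y)*log(1-p)] averaged over samples. Here is a minimal sketch of my own (the sample labels and probabilities are made up for illustration):

```python
import math

def log_loss(y_true, p_pred):
    # Binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over samples
    terms = [y * math.log(p) + (1 - y) * math.log(1 - p)
             for y, p in zip(y_true, p_pred)]
    return -sum(terms) / len(terms)

# Confident correct predictions cost little; the 0.3 on a true label of 1 costs the most
print(log_loss([1, 0, 1, 1], [0.9, 0.1, 0.8, 0.3]))  # ≈ 0.409
```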
Thank you for investing your time in reading this article!