Understanding Logistic Regression

Arun Addagatla
Published in Nerd For Tech
7 min read · Apr 23, 2021

So far, we have looked at estimating the conditional expectations of continuous variables, as in regression. However, there are many situations where we are interested in input-output relationships, as in regression, but the output variable is discrete rather than continuous.

In particular, there are many situations where we have binary outcomes (there are only two possible outcomes to a certain situation). In addition to the binary outcome, we have some input variables, which may or may not be continuous.

How could we model and analyze such data? We could try to come up with a rule which guesses the binary output from the input variables. This is called classification and is an important topic in statistics and machine learning.

Classification predicts a discrete target label Y. Classification is the problem of assigning new observations to the class to which they most likely belong, based on a classification model built from labeled training data.

The accuracy of your classifications will depend on the effectiveness of the algorithm you choose, how you apply it, and how much useful training data you have.

Logistic Regression

Logistic regression is a method of classification in which the model outputs the probability of a categorical target variable Y belonging to a certain class.

In other words, Logistic regression is a method used to predict a dependent variable, given a set of independent variables, such that the dependent variable is categorical.

  • Dependent variable (Y): the binary response variable, taking values such as 0 or 1, Yes or No.
  • Independent variable (X): the predictor variable used to predict the response variable.

Basically, the outcome in logistic regression is categorical. When the outcome has two possible values, we want a model that predicts either a 0/1 label or a probability score between 0 and 1.

Though logistic regression is most often used for binary classification, where there are two classes, keep in mind that classification can be performed with any number of categories (e.g. when assigning handwritten digits a label between 0 and 9); the multi-class extension is known as multinomial logistic regression. In this article we focus on the binary case.

A good example of a classification problem in which we can apply logistic regression is determining whether a loan application is fraudulent.

Ultimately, the lender wants to know whether they should give the borrower a loan or not, and they have some tolerance for the risk that the application is in fact fraudulent.

In this case, the goal of logistic regression is to calculate the probability (between 0% and 100%) that the application is fraudulent. With these probabilities, we can set a threshold above which we are willing to lend to the borrower, and below which we deny the loan application or flag it for further review.

Why not linear regression for predicting probability?

For categorical variables, it is inappropriate to use linear regression because the response values are not measured on a ratio scale and the error terms are not normally distributed.

In addition, the linear regression model can generate predicted values as any real number ranging from negative to positive infinity, whereas a categorical variable can only take on a limited number of discrete values within a specified range.

In short, if you trained a linear regression model on a bunch of examples where Y = 0 or 1, you might end up predicting probabilities that are less than 0 or greater than 1, which doesn’t make sense. This is because linear regression assumes a continuous, unbounded response variable.

So we can’t make use of linear regression. Instead, we’ll use a logistic regression model designed to assign a probability between 0 and 1 indicating that Y belongs to a certain class.
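To see the problem concretely, here is a minimal sketch (with made-up toy data) that fits ordinary least squares to 0/1 labels and shows the predictions escaping the [0, 1] range:

```python
import numpy as np

# Toy binary labels for a single feature: small x maps to class 0, large x to class 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Fit ordinary least squares: y ~ b0 + b1 * x
b1, b0 = np.polyfit(x, y, 1)

# Predictions at the extremes fall outside [0, 1],
# so they cannot be interpreted as probabilities
print(b0 + b1 * 0.0)   # below 0
print(b0 + b1 * 10.0)  # above 1
```

The fitted line keeps rising (or falling) forever, which is exactly why a squashing function is needed.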

Maths behind Logistic regression

The math in this section is interesting but might be on the more technical side.

Logistic regression (the logit model) is a modification of linear regression that guarantees an output between 0 and 1 by applying the sigmoid function, which, when graphed, looks like the characteristic S-shaped curve.

The sigmoid function, which squashes values into the range (0, 1), is given as:

σ(z) = 1 / (1 + e^(−z))
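As a quick sketch, the squashing behaviour of the sigmoid is easy to check directly:

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5, the midpoint of the S-curve
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```

No matter how large or small the input, the output never leaves (0, 1).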

Before moving any further let us understand what are odds and how they differ from probability.

The Odds are the ratio of something happening to something not happening. Probability is the ratio of something happening to everything that could happen.

Consider an example: if the odds of our team winning a game are 3 to 5 (the ratio of games won to games lost), then the probability of our team winning is 3 in 8 (the ratio of games won to the total number of games played).

To calculate odds from probability we have:

odds = p / (1 − p)

where p is the probability of an event occurring (in our case, the probability of winning).

Probability has an upper bound: the probability of an event cannot exceed one. Transforming the probability to odds removes the upper bound, since odds can grow to infinity. But what about the lower bound? Odds are still bounded below by zero, and taking the logarithm of the odds removes that lower bound as well.

Another reason to take logarithms is that odds are asymmetric: the odds of your team winning range from 0 to 1 when it loses more often than it wins, but from 1 to infinity when it wins more often than it loses. Taking the log of the odds solves the problem by making everything symmetric around 0: odds of 1 to 4 and 4 to 1 become −log(4) and +log(4).
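The asymmetry of odds, and the symmetry of log-odds, can be sketched in a few lines:

```python
import math

def odds(p):
    """Convert a probability p into odds in favor."""
    return p / (1 - p)

# Winning 3 of 8 games: probability 3/8, odds 3 to 5
print(odds(3 / 8))  # ~0.6, i.e. 3 to 5

# Odds are asymmetric around even chances...
print(odds(0.1))  # ~0.111 (1 to 9)
print(odds(0.9))  # ~9.0   (9 to 1)

# ...but log-odds are symmetric around 0
print(math.log(odds(0.1)))  # ~ -2.197
print(math.log(odds(0.9)))  # ~ +2.197
```

Complementary probabilities map to log-odds of equal size and opposite sign, which is exactly the symmetry the text describes.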

The log of the odds, log(p / (1 − p)), is called the logit function and forms the basis for logistic regression.

If you are not familiar with linear regression, check out the following article Linear Regression in a Nutshell

Recall the original form of our simple linear regression model, which we’ll now call g(x) since we’re going to use it within a compound function:

g(x) = β0 + β1x

Where,

  • β0 is the y-intercept
  • β1 is the slope of the line
  • x is the value of the input variable
  • g(x) is the value of the prediction

Now, to solve this issue of getting model outputs less than 0 or greater than 1, we’re going to define a new function F(g(x)) that transforms g(x) by squashing the output of linear regression to a value in the [0,1] range.

To do so we plug g(x) into the sigmoid function above, resulting in a function that outputs a probability between 0 and 1:

F(g(x)) = 1 / (1 + e^(−(β0 + β1x)))

In other words, we’re calculating the probability that the training example belongs to a certain class: P(Y=1).

Derivation of the above equation

The linear equation is given as:

y = β0 + β1x

Now, instead of y, we predict the log of the odds of winning:

log(p / (1 − p)) = β0 + β1x

Exponentiating both sides:

p / (1 − p) = e^(β0 + β1x)

Now let:

z = e^(β0 + β1x)

Then:

p = z(1 − p)
p(1 + z) = z
p = z / (1 + z) = e^(β0 + β1x) / (1 + e^(β0 + β1x)) = 1 / (1 + e^(−(β0 + β1x)))
In the logit model, β1 now represents the rate of change in the log-odds as X changes. In other words, it is the “slope of the log-odds”, not the “slope of the probability”.

Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a logistic regression model: it chooses the coefficients under which the observed labels are most probable.
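As a hedged sketch of the idea (toy data invented here, not from the article), MLE for logistic regression can be carried out by gradient descent on the negative log-likelihood, whose gradient takes the simple form (p − y) per example:

```python
import numpy as np

# Toy 1-D data: small x maps to class 0, large x to class 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Maximize the likelihood via gradient descent on the negative log-likelihood
b0, b1 = 0.0, 0.0
lr = 0.05
for _ in range(10000):
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))   # predicted P(Y=1)
    b0 -= lr * np.sum(p - y)               # dNLL/d(b0)
    b1 -= lr * np.sum((p - y) * x)         # dNLL/d(b1)

# The fitted curve puts P(Y=1) near 0 for small x and near 1 for large x
p = 1 / (1 + np.exp(-(b0 + b1 * x)))
print(p.round(3))
```

In practice a library routine (e.g. an off-the-shelf solver) does this fitting for you; the loop above only illustrates what is being optimized.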

Understanding the output of a logistic regression model

As mentioned earlier, a binary outcome takes only the values 0 or 1. So why does logistic regression produce a smooth curve from 0 to 1 rather than a flat line at 0 and 1? Because logistic regression calculates a probability, not the label directly.

Consider an example where the predicted probability of an event occurring is 0.6: does this output belong to class 0 or class 1?

In this situation, a threshold value is set. Suppose the threshold is 0.5. Then any value in the range 0 to 0.5 is classified as 0, and any value from 0.5 to 1 is classified as 1. This yields a binary result even though the model’s output is continuous.

In short, to predict the Y label — spam/not spam, cancer/not cancer, fraud/not fraud, etc. — you have to set a probability cutoff or threshold.

Linear regression vs Logistic regression

Conclusion

Logistic regression predicts the outcome of a categorical dependent variable by producing a probability, which is then thresholded into a binary result. And because it predicts probabilities rather than just classes, we can fit it using maximum likelihood.

Thanks for reading this article! Leave a comment below if you have any questions. Be sure to follow @ArunAddagatla, to get notified regarding the latest Data Science and Deep Learning articles.

You can connect with me on LinkedIn, Github, Kaggle, or by visiting Medium.com.
