Let’s Take a Deep Dive into Logistic Regression

Nuzulul Khairu Nissa
Published in Geek Culture · Apr 4, 2021

Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. It’s an extension of the linear regression model for classification problems. Unlike linear regression which outputs continuous number values, logistic regression transforms its output using the logistic sigmoid function to return a probability value which can then be mapped to two or more discrete classes.

In this article, we’re going to learn more about:

  1. The Coefficients of Logistic Regression
  2. The Maximum Likelihood of Logistic Regression

The Comparison between Logistic Regression and Linear Regression

Given data on time spent studying and exam scores, linear regression and logistic regression can predict different things:

  • Linear Regression could help us predict the student’s test score on a scale of 0–100. Linear regression predictions are continuous (numbers in a range). In linear regression, we fit the line using “least squares”: we find the line that minimizes the sum of the squared residuals.
  • Logistic Regression could help us predict whether the student passed or failed. Logistic regression predictions are discrete, but we can also view the probability scores underlying the model’s classifications (see the sketch after this list).
  • Logistic regression predicts whether something is True or False, instead of predicting something continuous like size.
  • Also, instead of fitting a line to the data, logistic regression fits an “S”-shaped “logistic function”. The curve goes from 0 to 1, which means it tells us the probability that an object is obese or not, based on its weight.
  • Although logistic regression tells us the probability that an object is obese or not, it’s usually used for classification. For example, if the predicted probability that an object is obese is > 50%, we classify it as “obese”; otherwise, we classify it as “not obese”.
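To make the contrast concrete, here is a minimal sketch using scikit-learn on made-up study-time data (the hours, scores, and pass threshold below are invented for illustration, not taken from the article):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: hours studied -> exam score (0-100) and pass/fail label
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
scores = np.array([35, 45, 50, 58, 64, 72, 80, 88])
passed = (scores >= 60).astype(int)    # 1 = pass, 0 = fail

# Linear regression: continuous prediction, fit by least squares
lin = LinearRegression().fit(hours, scores)
print(lin.predict([[4.5]]))            # a predicted score somewhere around 60

# Logistic regression: discrete class plus an underlying probability
clf = LogisticRegression().fit(hours, passed)
print(clf.predict([[4.5]]))            # predicted class: 1 (pass) or 0 (fail)
print(clf.predict_proba([[4.5]]))      # probabilities of [fail, pass]
```

Linear regression returns a number on the score scale, while logistic regression returns a class label together with the probability behind it.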

The Steps of Logistic Regression

1. Logistic Regression: Coefficients (Continuous Variable)

In Part 1, we’ll start by talking about logistic regression when we use a continuous variable (like weight) to predict obesity. The y-axis in logistic regression is confined to probability values between 0 and 1, so it is transformed from the “probability of obesity” to the “log(odds of obesity)”; just like the y-axis in linear regression, the transformed axis can go from -infinity to +infinity.

Let’s transform this y-axis from a “probability of obesity” scale to a “log(odds of obesity)” scale, below:
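The transformation in question is the standard logit function (written out here because the original article shows it as an image):

$$\log(\text{odds of obesity}) = \log\!\left(\frac{p}{1-p}\right)$$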

p, in this case, is the probability of an object being obese and corresponds to a value on the old y-axis between 0 and 1.

If we plug p = 0.88 into the logit function and do the math, we get approximately 2 on the new y-axis.
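A quick check of that arithmetic, assuming NumPy (this snippet is an illustration, not part of the original article):

```python
import numpy as np

p = 0.88                          # probability of being obese
log_odds = np.log(p / (1 - p))    # the logit transform
print(round(log_odds, 2))         # 1.99, i.e. roughly 2 on the new y-axis
```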

The new y-axis transforms the squiggly line into a straight line.

The important thing to know is that even though the graph with the squiggly line is what we associate with logistic regression, the coefficients are presented in terms of the log(odds) graph.

The first coefficient, the estimated intercept = -3.48, is the y-axis intercept when weight = 0; it means that when weight = 0, the log(odds of obesity) is -3.48. The standard error for the estimated intercept is 2.364.

The Z value = -1.471 is the estimated intercept divided by the standard error. In other words, it’s the number of standard deviations the estimated intercept is away from 0 on a standard normal curve. Since the estimate is less than two standard deviations away from zero, we know it is not statistically significant.

The second coefficient is the slope = 1.83. It means that for every one unit of weight gained, the log(odds of obesity) increases by 1.825 ≈ 1.83. The standard error for the slope is 1.088.

The Z value = 1.678 is the number of standard deviations the estimate is from 0 on a standard normal curve. Since the estimate is less than 2 standard deviations from 0, it is not statistically significant (no surprise with such a small sample size).

And this is confirmed by the large p-value.
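For readers who want to reproduce this kind of coefficient table, here is a hedged sketch using statsmodels; the weight/obesity data are simulated for illustration, so the fitted numbers will not match the -3.48 and 1.83 quoted above:

```python
import numpy as np
import statsmodels.api as sm

# Simulated weight/obesity data (for illustration only)
rng = np.random.default_rng(0)
weight = rng.uniform(1, 5, size=30)
true_log_odds = -3.5 + 1.8 * weight
obese = (rng.random(30) < 1 / (1 + np.exp(-true_log_odds))).astype(int)

X = sm.add_constant(weight)              # intercept column + weight
model = sm.Logit(obese, X).fit(disp=0)
print(model.summary())                   # coef, std err, z (= coef / std err), P>|z|
```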

2. Logistic Regression: Coefficients (Discrete Variable)

Now let’s talk about logistic regression coefficients in the context of testing if a discrete variable like “whether or not an object has a mutated gene” is related to obesity.

This type of logistic regression is very similar to how a t-test is done using linear models.

The first thing we do is transform the y-axis from the probability of being obese to the log(odds of obesity). Now we fit two lines to the data. For the first line, we take the “Normal Gene” data and use it to calculate the log(odds of obesity) for an object with the normal gene.

Thus the first (orange) line represents the log(odds of obesity) for objects with the normal gene. Let’s call this the log(odds gene normal).

We then calculate the log(odds of obesity) for objects with the mutated gene. Thus the second (green) line represents the log(odds of obesity) for an object with the mutated gene. Let’s call this the log(odds gene mutated).

These two lines come together to form the coefficients in this equation:
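Reconstructed from the surrounding description (the original shows it as an image), the equation has the same design as a t-test fit with a linear model:

$$\log(\text{odds of obesity}) = \log(\text{odds}_{\text{gene normal}}) + \left[\log(\text{odds}_{\text{gene mutated}}) - \log(\text{odds}_{\text{gene normal}})\right] \times x_{\text{mutated}}$$

where x_mutated is 0 for an object with the normal gene and 1 for an object with the mutated gene.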

And since subtracting one log from another can be rewritten as the log of a division, the bracketed term is the log(odds ratio).

It tells us, on a log scale, how much having the mutated gene increases (or decreases) the odds of an object being obese.

Substituting the data into this equation gives us the following coefficients:

The first coefficient, the estimated intercept = -1.50, is the log(odds gene normal), and the geneMutant term = 2.35 is the log(odds ratio), which tells you, on a log scale, how much having the mutated gene increases or decreases the odds of being obese.

The standard error is 0.7817 for the estimated intercept and 1.0427 for the geneMutant term.

The Z value = -1.924 (for the estimated intercept) tells us that the estimated intercept, -1.5, is less than 2 standard deviations from 0 and is thus not significantly different from 0; this is confirmed by a p-value greater than 0.05.

The Z value = 2.255 (for the geneMutant term) is greater than 2, so the log(odds ratio) that describes how having the mutated gene increases the odds of being obese appears to be statistically significant; this is confirmed by a p-value less than 0.05.
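A small sketch of where such numbers come from, assuming NumPy; the counts below are illustrative only (they are not the article’s raw data, merely chosen so the log(odds) line up with the coefficients quoted above):

```python
import numpy as np

obese_normal, not_obese_normal = 2, 9    # hypothetical normal-gene group
obese_mutant, not_obese_mutant = 7, 3    # hypothetical mutated-gene group

log_odds_normal = np.log(obese_normal / not_obese_normal)
log_odds_mutant = np.log(obese_mutant / not_obese_mutant)

print(round(log_odds_normal, 2))                     # -1.5  : the intercept, log(odds gene normal)
print(round(log_odds_mutant - log_odds_normal, 2))   #  2.35 : the geneMutant term, log(odds ratio)
```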

3. Logistic Regression: Fitting a Line with Maximum Likelihood

Our goal is to draw the “best fitting” squiggle for this data. As we know, in logistic regression we transform the y-axis from the probability of obesity to the log(odds of obesity), as in Parts 1 and 2 above.

The only problem is that the transformation pushes the raw data out to positive and negative infinity. This means the residuals (the distances from the data points to the line) are also infinite, so we can’t use least squares to find the best-fitting line.

Instead, we use maximum likelihood.

The first thing we do is project the original data points onto the candidate line. Then we transform the candidate log(odds) into candidate probabilities using this fancy-looking formula (shown in the derivation below).

For those keeping score at home, here’s how to convert the equation that takes probability as input and outputs log(odds) into an equation that takes log(odds) as input and outputs probability.
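Written out (the original shows this derivation as an image), we start from the logit and solve for p:

$$\log\!\left(\frac{p}{1-p}\right) = \log(\text{odds}) \;\;\Rightarrow\;\; \frac{p}{1-p} = e^{\log(\text{odds})} \;\;\Rightarrow\;\; p = \frac{e^{\log(\text{odds})}}{1 + e^{\log(\text{odds})}}$$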

Now let’s see this fancy equation in action. For example, take the point with log(odds) = -2.1 (from the right side of the plot) and substitute -2.1 into the equation.

and that gives us a y-coordinate on the squiggle.

and we do the same thing for all the points.
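As a sketch, here is the same back-transformation in NumPy; -2.1 is the example point from above, while the other candidate log(odds) values are invented for illustration:

```python
import numpy as np

def log_odds_to_prob(log_odds):
    # p = e^log(odds) / (1 + e^log(odds))
    return np.exp(log_odds) / (1 + np.exp(log_odds))

print(round(log_odds_to_prob(-2.1), 2))      # ~0.11, the y-coordinate on the squiggle

candidate_log_odds = np.array([-2.1, -1.3, 0.4, 1.8, 2.6])   # made-up projections
print(log_odds_to_prob(candidate_log_odds))                  # probabilities for all points
```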

Now we use each point’s observed status (obese or not obese) to calculate its likelihood given the shape of the squiggly line.

Although it is possible to calculate the likelihood as the product of the individual likelihoods, statisticians prefer to calculate the log of the likelihood instead (because the squiggle that maximizes the likelihood is the same one that maximizes the log of the likelihood).

and this means that the log-likelihood of the original line is -3.77.

Now we rotate the line and calculate its log-likelihood again:

  • rotate the line;
  • project the data onto it and transform the log(odds) to probabilities;
  • then calculate the log-likelihood of the new line.

And we just keep rotating the log(odds) line, projecting the data onto it, transforming it to probabilities, and calculating the log-likelihood.
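A minimal sketch of the log-likelihood computed at each step, assuming NumPy; the labels and probabilities below are made up:

```python
import numpy as np

def log_likelihood(y, p):
    # y: 1 = obese, 0 = not obese; p: predicted probability of obesity
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([0, 0, 1, 1, 1])                  # hypothetical observed statuses
p = np.array([0.11, 0.35, 0.60, 0.82, 0.95])   # probabilities from one candidate line
print(log_likelihood(y, p))                    # larger (closer to zero) is better
```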

NOTE: The algorithm that finds the line with the maximum likelihood is pretty smart: each time it rotates the line, it does so in a way that increases the log-likelihood. Thus, the algorithm can find the optimal fit after only a few rotations.

Ultimately, we get the line that maximizes the likelihood, and that’s the one chosen as the best fit.
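The article doesn’t spell out the optimizer, but one standard way to find that line is to minimize the negative log-likelihood over the intercept and slope, for example with SciPy (the weight/obesity data below are made up):

```python
import numpy as np
from scipy.optimize import minimize

weight = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])   # made-up weights
obese  = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])   # made-up statuses

def neg_log_likelihood(params):
    intercept, slope = params
    log_odds = intercept + slope * weight     # project each point onto the candidate line
    p = 1 / (1 + np.exp(-log_odds))           # transform log(odds) to probabilities
    return -np.sum(obese * np.log(p) + (1 - obese) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)     # intercept and slope of the maximum-likelihood line
```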

References:

If you want to learn more about R² and p-values for Logistic Regression, you can watch this video!
