An Unseen and In-Depth Understanding of Logistic Regression

Mayur Gargade · Published in VisionNLP · 8 min read · Jan 22, 2023

Hi, I’m Mayur Gargade, working as a Data Scientist at VisionNLP (https://medium.com/visionnlp).

To understand the logistic regression algorithm, you will need some background in the linear regression model, the first model most people use to perform machine learning tasks. Please read my previous blog on linear regression.

As we discussed in the previous blog, linear regression is mainly used to analyze data where the response variable is numeric; we also discussed what kinds of problems we can solve with that algorithm. Now imagine you have data where the response variable is categorical. Will linear regression work? Why not? What other models can help in this kind of situation?

The answer is that classification models solve problems where the response variable is categorical. Let's understand the first and simplest algorithm in the classification category.

What is Logistic Regression, and when can we use this model?

Logistic regression is a machine learning algorithm used for classification problems. It predicts the output of a categorical dependent variable, so the outcome must be categorical. The simplest case is binary classification. This is like a question we can answer with either “yes” or “no.” We have only two classes: a positive class and a negative class. Usually, the positive class points to the presence of some entity, while the negative class points to its absence.
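To make this concrete, here is a minimal binary-classification sketch using scikit-learn's LogisticRegression; the data is a synthetic toy set, not an example from this article.

```python
# Minimal binary classification with logistic regression (toy data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))        # class labels (0 = negative, 1 = positive)
print(clf.predict_proba(X[:5]))  # estimated probability for each class
```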

Why Logistic Regression instead of Linear Regression?

  • Probability ranges from 0 to 1, while the output of linear regression varies from -∞ to +∞.
  • We solve the problem by using odds instead of probability: if an event has a probability of p, then it has odds p/(1-p).
  • While p varies from 0 to 1, p/(1-p) varies from 0 to ∞, and log(p/(1-p)) varies from -∞ to +∞, which is exactly the range a linear model produces (see the short sketch after this list).
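A quick numeric sketch of that mapping, on a few illustrative probability values:

```python
import numpy as np

# How probabilities map to odds and log-odds.
p = np.array([0.1, 0.5, 0.9, 0.99])

odds = p / (1 - p)       # ranges over (0, inf)
log_odds = np.log(odds)  # ranges over (-inf, +inf)

for pi, oi, li in zip(p, odds, log_odds):
    print(f"p={pi:.2f}  odds={oi:7.2f}  log-odds={li:6.2f}")
```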

Let's understand this with an example:

Solving it with linear regression

Here we can see that if our outputs are 0 and 1 and we plot them, the graph will always look like the image above, so a linear regression model won't be able to find a good best-fit line. We have already studied various lines (and curves) in our maths classes. What do you think about the above data points: what kind of curve would fit here? An S-curve? Let's see.

An S-shaped curve looks like the best fit for this data. In order to use this type of curve in a mathematical formula, we need to understand the logit function.

A frequently asked interview question is: can we use linear regression when the response variable is in binary format?
Ans: Technically we can, with a few modifications and a classification rule on top of the predictions. Have a look at the following image to understand it.
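As a rough illustration of that idea (a hedged sketch on made-up data, not the image's exact example): fit ordinary linear regression on a 0/1 target and apply a 0.5 threshold as the classification rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])   # binary response

lin = LinearRegression().fit(X, y)
raw = lin.predict(X)               # unbounded values; can fall outside [0, 1]
labels = (raw >= 0.5).astype(int)  # classification rule bolted on top

print(raw)
print(labels)
```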

Logistic Regression Formula
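In standard notation, the logistic regression formula pictured here is:

$$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$$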

Types of Logistic Regression Tasks:

Binomial — In binomial logistic regression, the dependent variable has only two possible categories, such as 0 or 1.
Example: Will the student pass or fail the exam?

Multinomial — In multinomial logistic regression, the dependent variable has three or more unordered categories.
Example: Given a patient's medical history, which drug would you suggest to treat their diabetes?

Ordinal — In ordinal logistic regression, the dependent variable also has three or more categories, but the categories are ordered, such as “low”, “medium”, or “high”.
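For what it's worth, scikit-learn's LogisticRegression covers the binomial and multinomial cases out of the box (ordinal logistic regression needs a dedicated implementation, e.g. statsmodels' OrderedModel). A minimal multinomial sketch on the classic iris data:

```python
# Multinomial case: the iris target has 3 unordered classes.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:2]))  # one probability per class, rows sum to 1
```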

Logit Function:

All that means is that when Y is categorical, we use the logit of Y as the response in our regression equation instead of Y itself:
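Written out, a standard form of that equation is:

$$\operatorname{logit}(P) = \ln\!\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$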

The logit function is the natural log of the odds that Y equals one of the categories. For mathematical simplicity, we assume Y has only two categories and code them as 0 and 1. This is entirely arbitrary; we could have used any numbers, but these make the math work out nicely, so let's stick with them. P is defined as the probability that Y = 1. So, for example, the Xs could be specific risk factors, like age, high blood pressure, and cholesterol level, and P would be the probability that a patient develops heart disease.

Sigmoid Function:

The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps the output of logistic regression to a value that is always between 0 and 1, never beyond that limit. To squeeze values into the (0, 1) range, we need a curve shaped like an “S”, and this S-shaped curve is called the sigmoid function. A threshold on its output is then used as the classification rule.
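A minimal sketch of the sigmoid and the threshold rule:

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
p = sigmoid(z)
labels = (p >= 0.5).astype(int)  # 0.5 threshold as the classification rule

print(p)
print(labels)
```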

Cost function and Parameters of Logistic Regression:

The cost function is important because it gives us the errors of our predictions and, subsequently, is what our learning algorithm needs. Concretely, we want to minimize the errors of our predictions, i.e., to minimize the cost function.
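For reference, the cost function conventionally used for binary logistic regression is the log-loss (binary cross-entropy); with predicted probability p_i for observation i, it is:

$$J(\beta) = -\frac{1}{m}\sum_{i=1}^{m}\big[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\big], \qquad p_i = \frac{1}{1 + e^{-x_i^\top \beta}}$$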

  • Unlike linear regression, the coefficients of logistic regression are estimated using MLE (maximum likelihood estimation).
  • Because the likelihood equations have no closed-form solution, an iterative process must be used instead.
  • This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until the improvement is minute, at which point the process is said to converge.
  • In some instances, the model may not reach convergence.
  • Non-convergence indicates that the coefficients are not meaningful because the iterative process was unable to find an appropriate solution.
  • A failure to converge may occur for many reasons: a large ratio of predictors to cases, multicollinearity, sparseness, or complete separation (see the sketch after this list).
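As a hedged illustration of that last point, here is a tiny synthetic example where one predictor separates the classes perfectly; depending on your statsmodels version, the fit either raises a perfect-separation error or finishes without converging.

```python
import numpy as np
import statsmodels.api as sm

x = np.arange(10, dtype=float)
y = (x >= 5).astype(int)  # one predictor splits the classes perfectly

X = sm.add_constant(x)
try:
    res = sm.Logit(y, X).fit(maxiter=100)
    # If fitting does run, the slope blows up and convergence fails.
    print(res.mle_retvals.get("converged"), res.params)
except Exception as exc:
    # Some statsmodels versions raise a perfect-separation error instead.
    print(type(exc).__name__, exc)
```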

Why don't we use the same cost function as linear regression in logistic regression?

To understand the answer to this question, have a look at the following image. In short, what the image explains is this: to find the best parameters with an iterative method, we use gradient descent, and gradient descent only reliably reaches the so-called global minimum when the cost function is convex. If we plug the sigmoid into the squared-error cost from linear regression, the resulting cost function is non-convex: it has multiple local minima, and most of the time gradient descent gets stuck in one of them instead of reaching the global minimum. The log-loss, by contrast, is convex.
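A minimal from-scratch sketch of logistic regression trained with gradient descent on the convex log-loss; the data here is synthetic, purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data generated from known weights so we can check the fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = (sigmoid(X @ true_w) > rng.uniform(size=100)).astype(float)

w = np.zeros(2)
lr = 0.1
for step in range(500):
    p = sigmoid(X @ w)             # predicted probabilities
    grad = X.T @ (p - y) / len(y)  # gradient of the mean log-loss
    w -= lr * grad                 # descend toward the global minimum

print("estimated weights:", w)     # should approach true_w
```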

The individual impact of Independent Variables:

In linear regression models, you used the P-value to check whether an independent variable has a significant impact on the dependent variable. The beta coefficients in linear regression follow a T-distribution, so you did a T-test to see the impact of each variable. Here in logistic regression, the (Wald) test statistic for each beta coefficient follows a Chi-square distribution. So, the probability value (P-value) of the Chi-square test tells you about the impact of each independent variable in a logistic regression model.

A Chi-square test in logistic regression tests the following hypotheses:
H0: The independent variable has no significant impact on the dependent variable.
H1: The independent variable has a significant impact on the dependent variable.

You look at the P-value of the Chi-square test to decide whether to keep the variable. If the P-value is less than 5 percent, you reject the null hypothesis that the variable has no significant impact on the dependent variable; the variable has some significant impact, so you keep it in the model. If the P-value is greater than 5 percent, there is not enough evidence to reject the null hypothesis, so you conclude that the variable has no significant impact and drop it from the model. Dropping such insignificant variables from the model should have little influence on model accuracy.

You can look at the Wald Chi-square value when comparing two independent variables to decide which has the greater impact. The higher the Wald Chi-square value, the lower the P-value. For example, if you are comparing the impact of two independent variables such as income and number of dependents on the response variable, the variable with the higher Wald Chi-square value has the greater impact on the dependent (response) variable.
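A hedged sketch of checking individual variable significance with statsmodels; the data below is synthetic (made-up income and dependents columns), purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic predictors and a binary response driven by them.
rng = np.random.default_rng(1)
income = rng.normal(50, 10, 200)
dependents = rng.integers(0, 5, 200).astype(float)
logit_p = 0.08 * income - 0.3 * dependents - 3.5
y = (1 / (1 + np.exp(-logit_p)) > rng.uniform(size=200)).astype(int)

X = sm.add_constant(np.column_stack([income, dependents]))
result = sm.Logit(y, X).fit(disp=0)

print(result.summary())                        # z-statistics and P-values
wald_chi2 = (result.params / result.bse) ** 2  # Wald chi-square per coefficient
print(wald_chi2)
```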

Summary of Logistic Regression

  • Applicability: Look at the data and the dependent variable. Is it categorical? Yes/No, 0/1, Win/Loss, and so on are the types of response variable outcomes where you can apply logistic regression.
  • Chi-square value: Look at the overall Chi-square value to decide whether the model is significant. If the Chi-square test fails, stop right there; the overall model itself is not significant. Chi-square does not tell you anything about the precise accuracy of the model.
  • VIF: The multicollinearity issue needs to be solved in the same fashion as in linear regression models: identify the troublesome variables and resolve the issue by dropping them.
  • Overall accuracy/concordance: Determine the accuracy of a model by looking at its concordance and C values. The higher the value, the better the model. If the concordance and C values are not satisfactory, you may think of collecting more data or adding better, more impactful independent variables to improve overall model performance.
  • Individual impact/Wald Chi-square value: Look at the individual impact of each variable via its Wald Chi-square value, and drop the insignificant variables.
