Why we need Logistic Regression?

Published in

Analytics Vidhya

4 min read · Apr 17, 2021

Why do we need logistic regression?
An interviewer is almost sure to ask this: if we can use linear regression, why do we need logistic regression? Linear regression draws a best-fit line through the data points, so when the data contains outliers the line gets pulled toward them and predicts the wrong output values. In that case linear regression fails.

So there are two reasons why linear regression should not be used for binary classification:

When the data has a lot of outliers, the best-fit line can deviate completely.
The output is often greater than 1 or less than 0, which makes no sense as a probability. To solve this problem we have to use logistic regression.
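Both problems can be seen on a tiny example. The sketch below fits an ordinary least-squares line to made-up data (hours studied vs. pass/fail, purely hypothetical numbers) and then adds one outlier. The helper name fit_line is only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + c with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    c = mean_y - w * mean_x
    return w, c

# Hypothetical toy data: hours studied -> failed (0) or passed (1)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, c = fit_line(hours, passed)
pred_low = w * 0 + c         # prediction for 0 hours, falls below 0
pred_high = w * 12 + c       # prediction for 12 hours, goes above 1
boundary = (0.5 - c) / w     # where the line crosses 0.5

# One extreme but correctly labelled point is enough to drag the line
w_out, c_out = fit_line(hours + [40], passed + [1])
boundary_out = (0.5 - c_out) / w_out
pred_5_hours = w_out * 5 + c_out  # a student who passed now scores below 0.5

print(pred_low, pred_high, boundary, boundary_out, pred_5_hours)
```

On this toy data the line predicts values outside the 0 to 1 range, and the single outlier moves the 0.5 crossing point far enough that a student who passed with 5 hours is now on the wrong side.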

In regression we predict continuous values. But what if we want to predict categorical values like True or False, right or wrong, Yes or No? In that case our linear regression model doesn't work, so to solve this type of problem we have to use logistic regression. In logistic regression we work with probability (the chances of our output variable taking a given value).

Logistic regression is a supervised learning classification algorithm used to predict the probability of the target variable. With logistic regression we can solve binary classification (True or False) and multiclass classification (class 1, class 2, class 3 and so on).

How does it work?

Our goal is to predict yes or no from some independent variables, so there can be only two cases, 0 and 1: if we get 0 it means no (failure), and if we get 1 it means yes (success). In this algorithm we set a bar, and that bar helps us sort the predictions into yes or no.

As shown in the graph, we have set a threshold value of 0.5: if the value is greater than 0.5 it's a success, and if it is less than 0.5 it's a failure. So in logistic regression our value is always between 0 and 1. You must be thinking: what if a data point's value sits precisely at the threshold of 0.5? In that case the data point is unclassifiable, but that is a very rare case.
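The thresholding step can be sketched in a few lines. The function name classify is hypothetical, and sending a value of exactly 0.5 to class 1 is just one common convention assumed here to break the tie.

```python
def classify(probability, threshold=0.5):
    """Turn a predicted probability into a 0/1 label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a value exactly at the threshold is counted as 1 (success)
    return 1 if probability >= threshold else 0

labels = [classify(p) for p in [0.92, 0.5, 0.13]]
print(labels)
```

The threshold does not have to stay at 0.5. Raising or lowering it trades one kind of mistake for the other, so it can be tuned to the problem.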

Another point to note: if a data point lies above the line it is considered positive (+ve), and if it lies below the line it is considered negative (-ve).

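The main aim of logistic regression is to find the weights that maximize the cost function. (Many texts state the same goal the other way round, as minimizing a loss such as the log loss.)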

The cost function is built on Y = wx + c (the equation of a line). If you want to learn more about this equation, please read my earlier article: https://medium.com/@Monikarajput./facts-behind-linear-regression-42b100fa2cd3

The value of this cost function depends on the weight “w”. We keep updating w, and the w that gives the maximum value is the one used to create the best-fit line.

In the next step we use the sigmoid function. So what is a sigmoid function?

The sigmoid function transforms the whole weighted sum into a value between 0 and 1, and this is how it greatly reduces the effect of outliers. However large the values are, and however many outliers you have, it squashes them into the range 0 to 1.
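For reference, the sigmoid is 1 / (1 + e^(-z)). Below is a minimal sketch using only Python's math module. Splitting on the sign of z is an implementation choice made here so that math.exp never overflows on very large inputs.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow
    e = math.exp(z)
    return e / (1.0 + e)

# Even extreme inputs (think of outliers) end up between 0 and 1
outputs = [sigmoid(z) for z in [-1000, -5, 0, 5, 1000]]
print(outputs)
```

An input of 0 maps to exactly 0.5, which is why 0.5 is the natural threshold. In floating point, extreme inputs round to 0.0 or 1.0 even though the mathematical value never quite reaches them.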

So the main idea is to compute the product y*wx, apply this activation function, and keep updating the weight “w” until you get the best-fit line that can classify the points. The output of that line is then passed through the sigmoid function.
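Here is a rough sketch of that loop, under some assumptions that are mine rather than part of the description above: the labels are coded as -1 and +1 so that y*(wx + c) is positive when a point is on the correct side, the quantity being maximized is the average of log(sigmoid(y*(wx + c))) (the standard log-likelihood), and the update rule is plain gradient ascent with a fixed learning rate. The toy data and the names train, lr and steps are hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the mean of log(sigmoid(y * (w*x + c)))."""
    w, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_c = 0.0
        for x, y in zip(xs, ys):
            margin = y * (w * x + c)           # positive when correctly classified
            g = (1.0 - sigmoid(margin)) * y    # derivative of log(sigmoid(margin))
            grad_w += g * x
            grad_c += g
        # Move the weights uphill, i.e. in the direction that raises the objective
        w += lr * grad_w / n
        c += lr * grad_c / n
    return w, c

# Hypothetical toy data: hours studied, label -1 = fail, +1 = pass
hours = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

w, c = train(hours, labels)
predictions = [1 if sigmoid(w * x + c) > 0.5 else -1 for x in hours]
print(w, c, predictions)
```

Because this toy data can be separated perfectly, the weights keep growing slowly the longer you train. Real implementations usually add regularization or a stopping rule.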

Two Types of Logistic regression

Binary Logistic Regression
Multiclass Logistic Regression

Binary Logistic Regression

The simplest form of logistic regression is binary (or binomial) logistic regression, in which the target or dependent variable can take only 2 possible values, 1 or 0.

Multiclass Logistic regression

Suppose we have three classes: type A, type B and type C. Here logistic regression splits the multiclass classification problem into multiple binary classification problems (A vs. not A, B vs. not B, C vs. not C) and fits a standard logistic regression model on each sub-problem. This technique is called one-vs-rest.

As shown in the figure, we have three classes, so applying this approach gives us 3 models. When we feed in our test data, every model outputs a probability, and we take the class whose model gives the highest probability as our output.
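The prediction step of one-vs-rest can be sketched like this. The three sets of weights below are made-up numbers standing in for models that have already been trained, and the name predict_ovr is hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical, already-fitted binary models: class -> (weights, intercept)
models = {
    "A": ([2.0, -1.0], 0.0),   # A vs. rest
    "B": ([-1.0, 2.0], 0.0),   # B vs. rest
    "C": ([-1.0, -1.0], 0.5),  # C vs. rest
}

def predict_ovr(x, models):
    """Score x with every binary model and keep the most confident class."""
    probs = {}
    for label, (weights, intercept) in models.items():
        z = sum(wi * xi for wi, xi in zip(weights, x)) + intercept
        probs[label] = sigmoid(z)
    best = max(probs, key=probs.get)
    return best, probs

label, probs = predict_ovr([3.0, 0.5], models)
print(label, probs)
```

Note that the three probabilities come from three separate models, so they do not have to add up to 1. We only compare them to pick the winner.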