Logistic Regression

ROHITH RAMESH · Published in Analytics Vidhya · Sep 30, 2019 · 5 min read

Despite having "Regression" in its name, Logistic Regression is used for classification. It is one of the most basic and popular algorithms: a form of regression analysis that uses a mixture of continuous and discrete predictors to predict a discrete (categorical) outcome.

What is a Classification Problem?

We identify a problem as a classification problem when the independent variables are continuous in nature and the dependent variable is categorical, i.e. falls into classes such as a positive class and a negative class. The answers to such problems are categorical, e.g. Yes or No.

Sometimes we come across more than two classes, and it is still a classification problem. These are known as multi-class classification problems.

Why not use Linear Regression?

Suppose we have data of tumor size versus malignancy. Since this is a classification problem, all the plotted values lie at 0 or 1 on the y axis. If we fit the best-found regression line and assume a threshold of 0.5, the line does a reasonable job.

We can pick the point on the x axis such that all values to its left are classified as the negative class and all values to its right as the positive class.

But what if there is an outlier in the data? Things get messy, even with the same 0.5 threshold.

If we fit the best-found regression line, it is no longer enough to pick a point that separates the classes: some positive-class examples end up on the negative side. The fitted decision boundary gets dragged toward the outlier, whereas it should have stayed where it cleanly divided malignant from benign tumors. So a single outlier disturbs the whole linear regression prediction, and that is where logistic regression comes into the picture.
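
To make this concrete, here is a minimal sketch (with made-up one-dimensional data standing in for the article's figure) of how a single extreme point drags the least-squares line's 0.5 crossing:

```python
import numpy as np

# Hypothetical data: tumor size (x) vs. malignancy label (y in {0, 1}).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

def boundary_at_half(x, y):
    """Fit a least-squares line and return where it crosses y = 0.5."""
    m, c = np.polyfit(x, y, deg=1)
    return (0.5 - c) / m

print(boundary_at_half(x, y))  # ~4.5, cleanly between the two classes

# Add one extreme (but correctly labeled) positive example and refit.
x_out = np.append(x, 40.0)
y_out = np.append(y, 1.0)
print(boundary_at_half(x_out, y_out))  # shifts right past x = 5, so a
                                       # positive example is now misclassified
```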

Problem:

1. When we use a linear regression model, the predicted values are unbounded: they range from -∞ to ∞.

2. But probability values are restricted to lie between 0 and 1.

Solution: apply the logit transformation.

What is the Logit Transformation and why do we need it?

The logit is a transformation that lets us model probabilities with a linear function: it maps the bounded probability scale onto the whole real line, so a straight line on the logit scale corresponds to a logistic curve on the probability scale. Logistic regression fits a logistic curve to data where the dependent variable can only take the values 0 and 1.

It takes the log of the odds. The odds (sometimes loosely called the odds ratio) is the ratio of the probability of success to the probability of failure:

Odds = P/(1-P)

Example: if the probability of success P = 0.75, then

probability of failure = 1 - 0.75 = 0.25

Odds = 0.75/0.25 = 3

What values can P/(1-P) take?

P/(1-P) can take values from 0 to ∞, because the probability P can only take values from 0 to 1. So:

if P = 0, P/(1-P) = 0/1 = 0

if P = 1, P/(1-P) = 1/0 → ∞

Instead of modeling the probability of Y directly, we model the odds of success, P/(1-P), which can take any value from 0 to ∞.

But in a linear regression model, the predicted values can fall anywhere between -∞ and ∞. So we take log(P/(1-P)), the log-odds:

As P → 0, log(P/(1-P)) → -∞; as P → 1, log(P/(1-P)) → +∞. The log-odds therefore spans the full range -∞ to ∞, matching the range of a linear model's predictions.

Calculation: the model sets the log-odds equal to a linear function of the input,

logit(P) = log(P/(1-P)) = β₀ + β₁x

Solving for P recovers the logistic curve used below.
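
As a quick numeric check of these ranges (a minimal sketch evaluating the formulas above, including the worked P = 0.75 example):

```python
import numpy as np

# Evaluate odds and log-odds for a few probabilities.
for p in [0.001, 0.25, 0.5, 0.75, 0.999]:
    odds = p / (1 - p)         # ranges over (0, inf)
    log_odds = np.log(odds)    # ranges over (-inf, inf), 0 at p = 0.5
    print(f"P={p:5.3f}  odds={odds:8.3f}  log-odds={log_odds:7.3f}")
```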

Logistic Regression Algorithm

As discussed earlier, to deal with outliers, Logistic Regression uses the sigmoid function. The logistic function is a sigmoid: it takes any real-valued input and squashes it into a value between zero and one. It is defined as

σ(t) = 1 / (1 + e^(-t))

If we plot it, the graph is an S-shaped curve.

Let's take t to be a linear function of x in a univariate model:

t = β₀ + β₁x

So the logistic equation becomes

P(x) = 1 / (1 + e^(-(β₀ + β₁x)))
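
Here is a minimal sketch of the logistic function in code; the coefficient values b0 and b1 are made up purely for illustration:

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real t into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# With t = b0 + b1*x, the model is P(y=1 | x) = sigmoid(b0 + b1*x).
b0, b1 = -9.0, 2.0                 # hypothetical coefficients
x = np.linspace(0, 9, 10)
print(np.round(sigmoid(b0 + b1 * x), 3))
# Near 0 for small x, near 1 for large x: the S-shaped curve described above.
```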

Now, when the logistic regression model comes across an outlier, it can handle it: the sigmoid saturates, so points far from the boundary have only a bounded influence on the fit. The curve may still shift left or right along the x axis depending on where the outliers sit.
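
Revisiting the hypothetical outlier data from the linear-regression sketch above, a logistic fit (here via scikit-learn, with its default L2 regularization) keeps the decision boundary between the two classes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same made-up data as before, including the extreme positive example at x = 40.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 40.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(x, y)
b0, b1 = model.intercept_[0], model.coef_[0, 0]

# The decision boundary is where b0 + b1*x = 0, i.e. P(y=1 | x) = 0.5.
print(-b0 / b1)  # lands between the classes (near 4.5 here): the sigmoid
                 # saturates, so the far-away outlier has little pull
```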

Estimation for Logistic Regression Model:

The logistic equation is estimated using a technique known as Maximum Likelihood Estimation (MLE).

When the underlying distribution of the error terms is normal, the MLE estimates coincide with the OLS (Ordinary Least Squares) estimates; OLS, like many other estimation methods, is a special case of MLE. For a simple linear model:

Y = mx + c

MLE intuition:

The coefficient values (m and c) are chosen to maximize the likelihood that the process described by the model produced the data that were actually observed. In other words:

“What value of the unknown parameters makes the data we see least surprising?”
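
As an illustrative sketch (not the exact routine any particular library uses), MLE for logistic regression can be phrased as minimizing the negative log-likelihood of the Bernoulli model. The data here are made up, and deliberately overlap so the optimum is finite:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up, slightly overlapping 1-D data so the MLE is finite.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

def neg_log_likelihood(params):
    """Negative log-likelihood of P(y=1 | x) = sigmoid(c + m*x)."""
    c, m = params
    p = 1.0 / (1.0 + np.exp(-(c + m * x)))
    eps = 1e-12                       # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)  # the (c, m) that make the observed labels least surprising
```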

Advantages:

It is a widely used technique because it is efficient, does not demand heavy computational resources, is highly interpretable, does not require input features to be scaled, needs little hyperparameter tuning, is easy to regularize, and outputs well-calibrated predicted probabilities.

Like linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable, as well as attributes that are strongly correlated with each other. Feature engineering therefore plays an important role in the performance of Logistic Regression. Another advantage is that it is very easy to implement and efficient to train.

Disadvantages:

  • Identifying the relevant independent variables can be difficult
  • The model can overfit, especially with many features
  • Limited to linear relationships (in the log-odds) between variables
  • Sensitive to extreme outliers in the features
  • Requires a fairly large sample size for stable estimates

Python Implementation:
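
A minimal end-to-end sketch with scikit-learn, using its built-in breast-cancer dataset in keeping with the tumor example; the dataset choice and hyperparameters here are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Binary classification data: tumor measurements vs. malignant/benign label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Scaling is not required for correctness, but it helps the solver converge.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```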

Thank you for reading.
