Logistic Regression | The Linear Regression Intuition | Part 1

Rohan Kumawat · Published in Geek Culture · 4 min read · Jun 17, 2021
In the Linear Regression blog, I talked about entering the world of Machine Learning. After understanding Linear Regression, a regression algorithm, the natural next step is Logistic Regression. In this blog, we'll see what happens when we apply the Linear Regression intuition to classification problems. If you haven’t read the Linear Regression blog yet, here’s the link:

Introduction

Classification aims to determine which category an observation belongs to by modelling the relationship between the dependent variable and the independent variables. Here the dependent variable is categorical, while the independent variables can be numerical or categorical.

Logistic Regression is a classification algorithm. The idea of Logistic Regression is to find a relationship between the features and the probability of a particular outcome. It is used when our target variable is categorical, for example: classifying between a cat and a dog, whether a student fails or passes, whether a person has diabetes or not, etc.

Logistic Regression also comes in two types:

Types of Logistic Regression

One question should arise in everyone’s mind: if it’s a classification algorithm, why is it called Logistic Regression and not Logistic Classification?

Why Logistic Regression?

Let’s understand this with the help of an example:

Example 1

This graph plots Weight on the X-axis, while the Y-axis indicates whether a person is obese or not obese. In layman’s terms, we classify a person as obese or not obese based on the person’s weight: someone weighing more than 80 kg is obese, and someone weighing less than 80 kg is not. We want a model that can directly give us the result (obese/not obese).

Can we solve this problem with Linear Regression?

In Linear Regression, we try to find a straight line such that the sum of the distances between the line and the points is minimal. We can then add a condition: if the predicted ‘y’ value is greater than or equal to a specific threshold, we classify the person as obese. In this sense, we can solve this binary classification problem with Linear Regression.
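This idea can be sketched in a few lines of code. The weights and the 80 kg cut-off below are made-up illustrative values, and the 0.5 threshold on the line's output is an assumption, not a rule from the blog:

```python
import numpy as np

# Hypothetical weights (kg); label 1 = obese (>= 80 kg), 0 = not obese
weights = np.array([55, 62, 70, 75, 85, 90, 95, 110], dtype=float)
labels = (weights >= 80).astype(float)

# Fit a straight line y = m*x + b by least squares
m, b = np.polyfit(weights, labels, 1)

def classify(w):
    # Threshold the line's output at 0.5 to get a discrete class
    return 1 if m * w + b >= 0.5 else 0

print(classify(70), classify(95))  # 0 1 on this data
```

On this clean, well-separated data the trick works: the line crosses 0.5 near 80 kg, so thresholding recovers the right classes.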

Example 2 (Condition)

If yes, then what’s the problem with Linear Regression?

Now suppose we get an outlier; the best-fit line shifts to accommodate it. According to this new best-fit line, a person is classified as obese only if they weigh more than 100 kg. This is not the result we want!
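We can see this sensitivity directly by adding a single extreme point to the same made-up data and watching the decision cut-off (the weight at which the line crosses the assumed 0.5 threshold) drift upward:

```python
import numpy as np

def boundary(weights, labels):
    # Fit y = m*x + b and return the weight at which the line crosses 0.5
    m, b = np.polyfit(weights, labels, 1)
    return (0.5 - b) / m

weights = np.array([55, 62, 70, 75, 85, 90, 95, 110], dtype=float)
labels = (weights >= 80).astype(float)
b1 = boundary(weights, labels)   # roughly 80 kg on this data

# Add one extreme outlier: a 200 kg obese person
w2 = np.append(weights, 200.0)
l2 = np.append(labels, 1.0)
b2 = boundary(w2, l2)            # the cut-off drifts upward

print(round(b1, 1), round(b2, 1))
```

One outlier, which the model should treat as an unremarkable member of the obese class, moves the cut-off for everyone else.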

Example 3 (Flaw)

Linear Regression computes everything using distances, which introduces a high error rate for classification. Worse, the predicted value can rise above one or fall below zero; what should we make of such a value in a two-class problem?
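A quick sketch on the same made-up data shows the line's output escaping the [0, 1] range for extreme inputs:

```python
import numpy as np

weights = np.array([55, 62, 70, 75, 85, 90, 95, 110], dtype=float)
labels = (weights >= 80).astype(float)
m, b = np.polyfit(weights, labels, 1)

# The line's output escapes [0, 1] for extreme inputs
p_low = m * 40 + b    # a very light person: prediction below 0
p_high = m * 150 + b  # a very heavy person: prediction above 1

print(round(p_low, 2), round(p_high, 2))
```

Neither value can be read as a probability or as a class, which is exactly the situation the question above describes.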

Summary

We don’t use Linear Regression for classification because it deals with continuous values, whereas classification problems require discrete classes. A linear model does not output probabilities; it treats the classes as numbers (0 and 1) and fits the best hyperplane (for a single feature, a line) that minimizes the distances between the points and the hyperplane. It also gives you values below zero and above one. Since the predicted outcome is not a probability but a linear interpolation between points, there is no meaningful threshold for distinguishing one class from the other. Linear models also do not extend to classification problems with multiple classes: you would have to start labelling the next class with 2, then 3, and so on. The classes might not have any meaningful order, but the linear model would force a weird structure on the relationship between the features and your class predictions. The higher the value of a feature with a positive weight, the more it contributes to predicting a class with a higher number, even if classes that happen to get similar numbers are not actually closer to each other than to the other classes.
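The multiclass problem in the summary can be sketched with a tiny synthetic example. The feature values and the class codes 0/1/2 below are arbitrary choices for illustration; the point is that the codes carry no real order, yet the line treats them as if they do:

```python
import numpy as np

# Three classes encoded as 0, 1, 2 (say cat, dog, bird); the order is arbitrary
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 2, 1, 0, 2, 1], dtype=float)

# A linear fit forces an ordering onto these unordered codes
m, b = np.polyfit(x, y, 1)
preds = m * x + b

print(np.round(preds, 2))  # fractional "classes" that match no real label
```

The predictions land between the class codes rather than on them, because the model is interpolating numbers, not choosing categories.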

This blog asked whether we can use Linear Regression for classification, and we found that while we can technically apply the algorithm this way, it has a high error rate. So we don’t use Linear Regression for classification problems. In part 2 of the Logistic Regression blog, we’ll understand the intuition behind Logistic Regression and why it is better suited than Linear Regression for building a classification model.
