Classification and Regression: Are They the Same?

Jayesh
The Startup
Aug 6, 2020


You might have come across the terms Regression and Classification and assumed that they mean one and the same thing. They do not.

Machine Learning is broadly divided into supervised and unsupervised learning. Supervised Learning is further divided into Regression and Classification. Regression predicts a continuous value for a new case after learning from a training set. Classification assigns a new case to one of a fixed set of categories, again after learning from a training set; the simplest version is binary, such as a Y/N or True/False case. A classifier typically estimates the probability of each category and predicts the outcome from those probabilities.

For example, predicting the height, weight or salary of a person falls in the category of Regression, and many regression models can be used to predict these attributes. Classification, on the other hand, asks which category a case belongs to, often whether or not an action will take place. Will a person buy a particular car or house, given their salary and age? Will a coin toss land heads? Will investors invest in a specific share, given their past interests? Each of these has a Yes or No answer, so each is a classification problem.
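To make the distinction concrete, here is a minimal Python sketch. The function names, the coefficients and the threshold age are all made up for illustration and are not fitted to any data; the only point is the type of value each function returns.

```python
def predict_salary(age, b0=10.0, b1=1.5):
    """Regression: returns a continuous number (hypothetical salary in thousands)."""
    return b0 + b1 * age


def predict_buys_car(age, threshold_age=35):
    """Classification: returns one label from a fixed set ("Yes" or "No")."""
    return "Yes" if age >= threshold_age else "No"


salary = predict_salary(30)      # a number on a continuous scale
decision = predict_buys_car(40)  # one of two categories
print(salary, decision)
```

The regression function can return any real number, while the classification function can only ever return one of its two labels.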

In this article I will touch upon Logistic Regression and how it is used to solve a classification problem.

Logistic Regression

You must be familiar with Linear Regression, given by the following formula:

y = b0 + b1*x

Let the y-axis show whether a person buys a car (1) or not (0), and let the x-axis show the person's age. Suppose that below a certain threshold age people never buy the car, and above a certain age they always do. A straight line fitted to such data drops below y = 0 for young ages and rises above y = 1 for older ages, which makes no sense for a quantity that should stay between 0 and 1. Hence there is a need to remove the parts of the line that cross below the x-axis and above the line y = 1, making the curve horizontal in these regions. This is where Logistic Regression steps in.
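The problem with the straight line can be checked numerically. The sketch below fits y = b0 + b1*x by ordinary least squares to a small, invented set of ages and 0/1 purchase labels (the data and the helper name fit_line are assumptions for illustration), then evaluates the line at a young and an old age.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Invented data: nobody under 40 buys the car, everybody from 40 up does.
ages = [20, 25, 30, 35, 40, 45, 50, 55]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_line(ages, bought)

prediction_young = b0 + b1 * 18  # falls below 0
prediction_old = b0 + b1 * 65    # rises above 1
print(f"age 18: {prediction_young:.2f}, age 65: {prediction_old:.2f}")
```

Both predictions land outside the range 0 to 1, so they cannot be read as probabilities of buying the car.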

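Logistic Regression with a linear model is given by the formula:

ln(P / (1 - P)) = b0 + b1*x

where P is the probability of the case considered (here, that the person buys the car). Solving for P gives P = 1 / (1 + e^-(b0 + b1*x)), which always lies between 0 and 1.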
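As a quick check on this formula, the sketch below implements the left-hand side (the log-odds, often called the logit) and its inverse (the sigmoid) in Python. The coefficient values b0 = -14 and b1 = 0.4 are invented so that the probability crosses 0.5 at age 35; they are not taken from any fitted model.

```python
import math


def logit(p):
    """Log-odds ln(P / (1 - P)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """P such that ln(P / (1 - P)) = b0 + b1*x."""
    return sigmoid(b0 + b1 * x)


# Hypothetical coefficients chosen for illustration only.
b0, b1 = -14.0, 0.4
for age in (20, 35, 50):
    print(age, round(predict_probability(age, b0, b1), 3))
```

Unlike the straight line, the output stays between 0 and 1 at every age and flattens out at both ends.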

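Hence the curve now approaches a probability of 1 well above the threshold value and a probability of 0 well below it. This leaves us with the mid-region, where the probability lies strictly between 0 and 1. There, a prediction is made by comparing the probability with a cutoff (commonly 0.5), and the resulting correct and incorrect predictions are summarized in a confusion matrix.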
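One common way to handle the uncertain middle region is to pick a cutoff probability and then tabulate the predictions against the actual outcomes in a confusion matrix. The sketch below assumes a cutoff of 0.5 and uses six invented probabilities and outcomes; the function names are illustrative.

```python
def classify(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 predictions using a cutoff (0.5 is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_matrix(actual, predicted):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Invented example: predicted probabilities of buying, and what actually happened.
probabilities = [0.05, 0.30, 0.45, 0.55, 0.70, 0.95]
actual = [0, 0, 1, 0, 1, 1]

predicted = classify(probabilities)
matrix = confusion_matrix(actual, predicted)
print(matrix)  # [[true negatives, false positives], [false negatives, true positives]]
```

The two mistakes in this toy example both come from probabilities close to 0.5, which is exactly where the model is least sure.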

You might have come across targeted advertising on social media which often leaves you wondering if social media has been stalking you everywhere or not! This targeted advertising is also done through various Machine Learning algorithms.

For example, a car company needs to find out whether or not people will buy an expensive luxury car, given their age and estimated salary. Let us take the case of logistic regression with a linear decision boundary. The collected data is first divided into a training set and a test set.

In the training set, the red dots represent people who did not buy the car and the green dots represent people who did. The logistic regression algorithm learns from this data and divides the age-salary plane with a straight line into two regions: red (will not buy) and green (will buy). The fitted boundary is then applied to the test set.

We can observe that the logistic regression classification model correctly predicts, for most of the test points, whether a person will buy the luxury car, given their age and estimated salary.
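The whole workflow can be sketched in plain Python. This is not the code or data behind the article's figures: the 200 data points are synthetic, the rule that generates the labels is invented, and the model is trained with simple batch gradient descent rather than with any particular library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows, means, stds):
    """Rescale each column to zero mean and unit spread so gradient descent behaves."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the average log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """1 (will buy) if the estimated probability reaches the cutoff, else 0."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold else 0
            for xi in X]


# Synthetic data: [age in years, estimated salary in thousands]; the labelling rule is invented.
rng = random.Random(0)
rows = [[rng.uniform(18, 60), rng.uniform(15, 150)] for _ in range(200)]
labels = [1 if (age - 40) / 10 + (salary - 80) / 30 > 0 else 0 for age, salary in rows]

# Split into a training set (first 150 rows) and a test set (last 50 rows).
train_rows, test_rows = rows[:150], rows[150:]
train_y, test_y = labels[:150], labels[150:]

# Scaling statistics come from the training set only.
columns = list(zip(*train_rows))
means = [sum(c) / len(c) for c in columns]
stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(columns, means)]

w, b = train_logistic(standardize(train_rows, means, stds), train_y)
test_pred = predict(standardize(test_rows, means, stds), w, b)
accuracy = sum(p == t for p, t in zip(test_pred, test_y)) / len(test_y)
print(f"test accuracy: {accuracy:.2f}")
```

The learned weights define a straight line in the age-salary plane, and the accuracy is measured only on the 50 points the model never saw during training.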

The shortcomings of this model, such as its straight-line boundary, can be addressed by other classification models. This is the basic intuition behind Regression and Classification.

Written By:

Jayesh Kumar

3rd Year, ECE

MIT Manipal, India

Originally published at https://www.linkedin.com on February 27, 2019
