source:Dreamstime.com

Data Science: Logistic Regression

Anjani Kumar
Published in Analytics Vidhya
6 min read · May 17, 2020

Introduction:

Logistic regression is a supervised machine learning algorithm used for binary classification. It can also be extended to multi-class classification, for example with the one-vs-all (one-vs-rest) strategy. It is one of the best-known classification algorithms. Before going further, let's define what “regression” means in logistic regression.

Regression is a statistical approach for finding the relationship between variables. In machine learning, it is used to predict the outcome of an event based on the relationships between variables learned from the data set. Linear regression is one type of regression used in machine learning.

Intuition:

Logistic regression, also called logit regression or the logit model, is a supervised classification algorithm used to predict the probability of a target variable. In its basic (binary) form, the target or dependent variable is dichotomous, which means there are only two possible classes. Mathematically, a logistic regression model predicts P(Y=1) as a function of X.

Why is logistic regression called “regression” even though it is used for classification?

It is a regression because it models the relationship between variables. It is logistic because it passes the result through the logistic (sigmoid) function to obtain a probability. Hence the full name.

Note: Logistic regression predicts a continuous quantity (a probability) in the same way a regression model does, which is where the name comes from. Its output is then used to assign samples to classes, so it is treated as a classification algorithm.

Why can't we solve the problem using linear regression?

1. When the data set contains many outliers, the best-fit line is pulled towards them and deviates from the bulk of the data. Predictions made with this line will be wrong, so we should not use linear regression for a classification problem.

Equation of a line:

y = mx + c

It can also be written as:

y = wx + b

where:

y = dependent variable, x = independent variable, m (or w) = slope, and c (or b) = intercept.

2. The output of linear regression can be greater than 1 or less than 0. Logistic regression models a probability, so its hypothesis must produce values between 0 and 1. Linear regression cannot guarantee this, so we cannot use it here.
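Both problems can be seen on a small example. The sketch below is only an illustration: the data points, the helper names `fit_line` and `boundary`, and the outlier at x = 50 are all made up for this purpose. It fits an ordinary least-squares line to 0/1 labels with NumPy and looks at the predictions and at the point where the line crosses 0.5.

```python
import numpy as np

# Hypothetical toy data: one feature, binary labels (0 or 1).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])


def fit_line(x, y):
    """Least-squares fit of y = w*x + b; returns (w, b)."""
    w, b = np.polyfit(x, y, deg=1)
    return w, b


def boundary(w, b, threshold=0.5):
    """x value at which the fitted line crosses the threshold."""
    return (threshold - b) / w


# Problem 2: predictions can fall outside the range 0 to 1.
w, b = fit_line(x, y)
preds = w * np.array([-5.0, 20.0]) + b
print("predictions far from the data:", preds)

# Problem 1: a single outlier drags the line and moves the boundary.
x_out = np.append(x, 50.0)
y_out = np.append(y, 1.0)
w_out, b_out = fit_line(x_out, y_out)
print("boundary without outlier:", boundary(w, b))
print("boundary with outlier:   ", boundary(w_out, b_out))
```

On this toy data the line predicts a negative value at x = -5 and a value above 1 at x = 20. Adding the single outlier moves the 0.5 crossing from 3.5 to beyond 4, so the point at x = 4, which is labeled 1, would now be assigned to class 0.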

We are going to cover logistic regression for the two-class problem.

Working of Logistic Regression:

Logistic regression can be applied to a two-class classification problem in which the classes are linearly separable (the green class and the red class in the picture).

The positive class is denoted as y_i = +1.

The negative class is denoted as y_i = -1.

If the intercept is 0, then y can be written as:

y = w^T x

Here w^T x is proportional to the signed distance between a data point and the plane (it equals that distance when w has unit length).

To learn more about how the weights (also called coefficients, the w mentioned above) are chosen and updated, please go through my gradient descent article:

https://medium.com/datadriveninvestor/an-overview-of-gradient-descent-algorithms-e373443afa7f

If a data point lies above the plane, w^T x is positive. If it lies below the plane, w^T x is negative.

Case 1: Classifying a data point above the plane (the green point in the picture)

Here y_i = +1 and w^T x_i > 0, so

y_i * w^T x_i > 0

So the record is correctly classified.

Case 2: Classifying a data point below the plane (the red point in the picture)

Here y_i = -1 and w^T x_i < 0, so

y_i * w^T x_i > 0

(-1 multiplied by any negative value gives a positive value)

So the record is correctly classified.

Case 3: Classifying a data point that belongs to the negative class (red) but lies on the positive (green) side of the plane

Here y_i = -1 and w^T x_i > 0, so

y_i * w^T x_i < 0

(-1 multiplied by any positive value gives a negative value)

So the record is not correctly classified.
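The three cases can be checked numerically. In the sketch below, the weight vector `w`, the three sample points, and the helper name `margin` are hypothetical values chosen only to reproduce the sign pattern of each case. The intercept is taken as 0, as assumed earlier.

```python
import numpy as np

# Hypothetical weight vector of a plane through the origin (intercept 0).
w = np.array([1.0, 1.0])


def margin(y, w, x):
    """Label (+1 or -1) multiplied by the score w^T x."""
    return y * float(np.dot(w, x))


# (label, point) pairs, one per case; the values are illustrative only.
cases = {
    "case 1": (+1, np.array([2.0, 1.0])),    # positive point above the plane
    "case 2": (-1, np.array([-1.0, -2.0])),  # negative point below the plane
    "case 3": (-1, np.array([1.5, 0.5])),    # negative point above the plane
}

margins = {name: margin(y, w, x) for name, (y, x) in cases.items()}

for name, m in margins.items():
    status = "correctly classified" if m > 0 else "misclassified"
    print(name, m, status)
```

A positive product means the label and the side of the plane agree. A negative product means the point sits on the wrong side.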

What we should take away from these cases is that the sum of y_i * w^T x_i over all data points should be as large as possible (the cost function should be maximized) to obtain the best-fit line that linearly separates the two classes.

The value we get from this cost function is continuous and can range from -∞ to +∞.

Since we need to predict a class, we pass this value through the sigmoid function, sigmoid(z) = 1 / (1 + e^(-z)), which maps it to a value between 0 and 1 (the range of the sigmoid function is 0 to 1).

image:KDnuggets

This reduces the effect of outliers, because every value in the range -∞ to +∞ is squashed into the range 0 to 1.
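As a small illustration of this squashing, here is a sketch of the sigmoid function in NumPy. The input scores are arbitrary example values, including two extreme ones that stand in for outliers.

```python
import numpy as np


def sigmoid(z):
    """Map any real-valued score to a value between 0 and 1."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


# Arbitrary example scores; -100 and 100 play the role of outliers.
scores = np.array([-100.0, -2.0, 0.0, 2.0, 100.0])
probs = sigmoid(scores)
print(probs)
```

However large or small the score is, the output stays between 0 and 1, and a score of exactly 0 maps to 0.5.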

In a two-class problem, all values below a chosen threshold (say 0.5) are classified as class 0, and all values above the threshold are classified as class 1.
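Putting the pieces together, the following is a minimal sketch of a one-feature logistic regression, not a production implementation. It assumes the weights are fitted with plain gradient descent on the log loss, which is a common choice but is not spelled out in this article. It also uses 0/1 labels, and the toy data, learning rate, iteration count, and the name `predict_class` are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical toy data: one centered feature, labels 0 and 1.
x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = 0.0, 0.0        # start from zero weights
learning_rate = 0.1    # assumed step size

for _ in range(1000):
    p = sigmoid(w * x + b)           # predicted P(Y=1) for every point
    grad_w = np.mean((p - y) * x)    # gradient of the log loss w.r.t. w
    grad_b = np.mean(p - y)          # gradient of the log loss w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


def predict_class(x, w, b, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else class 0."""
    return (sigmoid(w * x + b) >= threshold).astype(int)


classes = predict_class(x, w, b)
print("probabilities:", sigmoid(w * x + b))
print("classes:      ", classes)
```

The threshold is a parameter on purpose: 0.5 is only a default, and a different value can be chosen when one kind of error is more costly than the other.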

Advantages of Logistic Regression:

  1. Logistic regression performs well when the data set is linearly separable.
  2. It is widely used because it is efficient, does not need many computational resources, is highly interpretable, does not strictly require input features to be scaled, needs little tuning, is easy to regularize, and outputs well-calibrated predicted probabilities.
  3. Like linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable as well as attributes that are very similar (correlated) to each other. Feature engineering therefore plays an important role in the performance of both logistic and linear regression.

Disadvantages:

  1. Logistic regression attempts to predict outcomes based on a set of independent variables, but if researchers include the wrong independent variables, the model will have little to no predictive value.
  2. We can’t solve non-linear problems with logistic regression, since its decision surface is linear.
  3. Logistic regression requires that each data point be independent of all other data points. If observations are related to one another, the model will tend to overweight the significance of those observations.
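The limitation of a linear decision surface can be illustrated with the classic XOR pattern. The sketch below does not train a model. As a simplifying assumption, it tries every linear boundary on a small grid of candidate weights (the grid values are arbitrary) and records the best accuracy any of them reaches.

```python
import itertools

import numpy as np

# XOR data: the label is 1 when exactly one of the two inputs is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

grid = np.linspace(-2.0, 2.0, 9)  # candidate values for w1, w2 and b

best_acc = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    scores = X @ np.array([w1, w2]) + b
    pred = (scores > 0).astype(int)  # same as sigmoid(score) > 0.5
    acc = float(np.mean(pred == y))
    best_acc = max(best_acc, acc)

print("best accuracy of any linear boundary on the grid:", best_acc)
```

No line can classify more than three of the four XOR points correctly, so the best accuracy found is 0.75. Handling such data with logistic regression needs extra features (for example the product of the two inputs) or a non-linear model.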

Conclusion: Logistic regression is a very widely used machine learning and predictive modeling technique that works well when used appropriately.

I hope you liked this article. Please click the Claps icon (up to 50 times) to motivate me to write more.

Want to connect?

LinkedIn: https://www.linkedin.com/in/anjani-kumar-9b969a39/

If you like my posts here on Medium and would like me to continue doing this work, consider supporting me on Patreon.

I am a Lead Data Scientist with an interest in writing blogs on Data Science, AI/ML, and LLMs.