Logistic Regression

Rishi Kumar
Published in Nerd For Tech
3 min read · Jun 5, 2021
Figure 1: Classification Algorithm

Linear regression predicts a continuous variable. Example: predicting the percentage mark scored by a student.

Logistic regression predicts a discrete variable. Example: predicting whether a student passes or fails.

Figure 2: Classification vs Regression

Logistic regression is used when the dependent variable (target) is categorical.

For example:

→ Predicting whether an email is spam (0) or ham (1).

→ Predicting whether an online transaction is fraudulent (0) or not (1).

→ Diagnosing a disease: no (0) or yes (1).

Consider a scenario where we need to classify whether an email is spam or not. If we use linear regression for this problem, we have to set a threshold on its continuous output to turn that output into a class. The same applies to any binary problem, such as tumor diagnosis: if the actual class is malignant, the predicted continuous value is 0.4, and the threshold is 0.5, the data point is classified as not malignant, which can have serious consequences in practice.

From this example, it can be inferred that linear regression is not suitable for classification problems. The output of linear regression is unbounded, and this is where logistic regression comes into the picture: its output always lies strictly between 0 and 1.
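
To make the unboundedness concrete, here is a minimal sketch that fits an ordinary least-squares line to 0/1 labels. The data (`hours`, `passed`) and the helper names `fit_line` and `predict_linear` are hypothetical and chosen only for illustration; they do not come from the linked notebook.

```python
# Hypothetical toy data (not from the article):
# hours studied and whether the student passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, passed)


def predict_linear(x):
    """Raw linear-regression output; nothing keeps it inside [0, 1]."""
    return slope * x + intercept


for x in (0.0, 4.5, 12.0):
    print(f"hours={x:>4}: linear output = {predict_linear(x):.3f}")
```

With this toy data, the line gives a value below 0 at 0 hours and a value above 1 at 12 hours, so its output cannot be read as a probability.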

You may wonder why the word "regression" appears in logistic regression when it is used for classification problems. Linear regression is the basis for logistic regression: the linear regression formula is passed through the sigmoid function, and a threshold on the result is then used for the classification task.

Deriving the sigmoid function:

Figure 3: Logistic Regression Derivation part 1
Figure 4: Logistic Regression Derivation part 2
Figure 5: Logistic Regression Derivation part 3
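
The derivation figures are not reproduced in this text. As a stand-in, here is the standard textbook derivation, which may differ in notation from the figures. The symbols p (probability of class 1), z (the linear regression output), and β0, β1 (intercept and slope) are assumed here for illustration. The idea is to model the log of the odds as a linear function and then solve for p:

```latex
\begin{aligned}
\text{odds} &= \frac{p}{1-p}, \qquad 0 < p < 1 \\
\ln\!\left(\frac{p}{1-p}\right) &= \beta_0 + \beta_1 x = z \\
\frac{p}{1-p} &= e^{z} \\
p &= e^{z}\,(1-p) \\
p\,(1 + e^{z}) &= e^{z} \\
p &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}
\end{aligned}
```

The last line is the sigmoid function: as z goes to negative infinity p approaches 0, as z goes to positive infinity p approaches 1, and z = 0 gives p = 0.5.
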
Figure 6: Linear Regression vs Logistic Regression.
  • If we use linear regression’s best-fit line for classification problems, the predictions can fall outside the range 0 to 1 and lead to incorrect classifications. So we apply the sigmoid function, followed by a threshold, to predict discrete values.
  • The sigmoid output is compared with a threshold value. For example, with a threshold of 0.5: if the final value is greater than 0.5, it is classified as 1; if it is lower than 0.5, it is classified as 0.
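
The thresholding rule in the bullets above can be sketched in a few lines of Python. The function names `sigmoid` and `classify` are hypothetical. The text does not say what happens when the value is exactly 0.5, so assigning that case to class 0 is an assumption of this sketch.

```python
import math


def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Two algebraically equivalent branches, chosen so that math.exp
    # is never called with a large positive argument (avoids OverflowError).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(z, threshold=0.5):
    """Return 1 if sigmoid(z) is greater than the threshold, otherwise 0.

    Assumption: a value exactly equal to the threshold is assigned to class 0.
    """
    return 1 if sigmoid(z) > threshold else 0


# z stands for the linear part of the model, e.g. intercept + slope * x.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  class={classify(z)}")
```

Because the sigmoid equals 0.5 exactly at z = 0, a threshold of 0.5 on the probability is the same as checking whether the linear output z is positive.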

For a solved example of logistic regression, refer to this GitHub link:

https://github.com/Rishikumar04/Data-Science-Training/blob/main/Classification%20Problems/02-Logistic%20Regression%20Project.ipynb

Rishi Kumar
Nerd For Tech

I'm a passionate and disciplined Data Science enthusiast working with Logitech as Data Scientist