Logistic Regression

Oluwakemi Ogebule
3 min read · Oct 14, 2022


Logistic regression is a supervised machine learning model used when the dependent variable (the target) is categorical. In its simplest form it predicts whether something is true or false, and more generally it helps to discriminate or categorize objects into classes (a). It fits an S-shaped logistic function whose output runs from 0 to 1 and is read as the probability of the "true" class: values near 0 point to false and values near 1 point to true. For example, it can predict whether a customer is satisfied (1) or not (0).

Sigmoid Curve (S-shaped) of Logistic Function by Preethi ©.
Figure from https://dataaspirant.com/how-logistic-regression-model-works/

The figure above demonstrates how logistic regression works. It takes independent variables (features) as input and applies a logistic regression classifier, built on the logistic (sigmoid) function for two classes or the softmax function for more than two, to predict the target class (the dependent variable), which in this case is either happy or sad (b).

The logistic (sigmoid) function

y = σ(x) = 1 / (1 + e^(-x))

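Here y is the model's output, the predicted probability of the positive class, and σ(x) is the sigmoid function of x. In logistic regression, x is a weighted sum of the independent variables (predictors or features) plus a bias term, rather than a single raw feature. The equation above is the logistic function; the logit function is its inverse and maps a probability back to the log-odds.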
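As a minimal sketch of the equation above, the snippet below computes the sigmoid of a linear score in plain Python. The weights, bias and feature values are hypothetical numbers chosen only for illustration; a real model would learn the weights and bias from data.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: squashes any real number into the range 0 to 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical weights, bias and feature values for a two-feature model.
weights = [0.8, -0.4]
bias = 0.1
features = [2.0, 1.0]

# Linear score (weighted sum of the features plus the bias), then the sigmoid.
z = sum(w * x for w, x in zip(weights, features)) + bias
probability = sigmoid(z)
print(f"score = {z:.2f}, probability = {probability:.3f}")
```

A score of 0 maps to a probability of exactly 0.5, large positive scores approach 1, and large negative scores approach 0, which is what gives the curve its S shape.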

Types of logistic regression

Binary Logistic Regression: is used for classification problems with only two possible outcomes, such as yes or no, 0 or 1. The logistic function produces a probability between 0 and 1, and the model converts it to a final output of 0 or 1 by comparing it with a threshold, commonly 0.5 (c).
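The step from probability to class label can be sketched as follows. The helper name predict_label and the default threshold of 0.5 are assumptions made for illustration; the threshold is a choice, and some applications move it up or down depending on which kind of mistake is more costly.

```python
def predict_label(probability: float, threshold: float = 0.5) -> int:
    """Turn a predicted probability into a 0/1 class label."""
    return 1 if probability >= threshold else 0

# Hypothetical probabilities produced by a binary logistic regression model.
probabilities = [0.12, 0.50, 0.93]
labels = [predict_label(p) for p in probabilities]
print(labels)  # [0, 1, 1]

# A stricter threshold makes the model more reluctant to predict 1.
print(predict_label(0.6, threshold=0.7))  # 0
```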

Softmax or Multinomial Logistic Regression: applies to problems with more than two, but still a finite number of, possible outcomes. For example, it can predict whether food cost will increase by 25%, 50%, 75% or 100%, but it cannot predict an exact value (c).
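To make the multi-class case concrete, here is a small sketch of the softmax function, which turns one score per class into probabilities that sum to 1. The class names reuse the food cost example, and the scores are made-up values standing in for what a trained model would output.

```python
import math

def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    # Subtracting the maximum score avoids overflow and does not change the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["25%", "50%", "75%", "100%"]
scores = [1.0, 2.5, 0.3, -1.0]  # hypothetical per-class scores

probabilities = softmax(scores)
predicted = classes[probabilities.index(max(probabilities))]
print(predicted)  # 50%
```

The predicted class is simply the one with the highest probability, so the model picks one of the listed categories rather than estimating a number in between.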

Ordinal Logistic Regression: also called the ordered logit model, handles outcomes with more than two categories that have a natural order, which makes it well suited to rankings. An example is rating customer service as excellent, good, fair or poor (c).

Use cases of logistic regression

Logistic regression is best suited to classification problems, where the goal is to predict which class an observation belongs to. It is used for, but not limited to, disease prediction, churn prediction and fraud detection.

Error and Error Handling in Logistic Regression

Cost Function: The model's error is measured by a cost function. It estimates how far the predicted probabilities are from the true labels, averaged over all training instances, and for logistic regression it is the log loss (also called binary cross-entropy). The goal of training is to minimize this cost.
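For binary labels, log loss is the average of -[y log(p) + (1 - y) log(1 - p)] over the training instances, where y is the true label and p is the predicted probability of class 1. The sketch below computes it in plain Python; the labels and probabilities are invented for illustration, and the small eps clipping is an assumption added to avoid taking log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss over all instances."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]  # confident and correct
bad_predictions = [0.1, 0.9, 0.2, 0.8]   # confident and wrong

print(log_loss(y_true, good_predictions))
print(log_loss(y_true, bad_predictions))
```

The second value is much larger than the first, which is the point of the cost function: confident wrong predictions are penalized heavily.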

Gradient Descent: is an optimization algorithm that iteratively tweaks parameters in order to minimize the cost function. See image below.

Gradient Descent by Preethi ©.
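As a rough sketch of how gradient descent minimizes log loss, the loop below trains a one-feature logistic regression from scratch. The toy data, learning rate and number of epochs are all assumptions picked for illustration; in practice a library implementation would normally be used instead.

```python
import math

def sigmoid(z):
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit a weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            # For log loss, the gradient per instance is (prediction - label) times the input.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        # Step against the average gradient to reduce the cost.
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Hypothetical toy data: one feature, label 1 for the larger values.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
print(f"P(y=1 | x=1.0) = {sigmoid(w * 1.0 + b):.3f}")
print(f"P(y=1 | x=3.5) = {sigmoid(w * 3.5 + b):.3f}")
```

Each pass nudges w and b a little in the direction that lowers the cost, so the predicted probability for small x drifts toward 0 and for large x toward 1.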

Bias-Variance Tradeoff

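The bias-variance tradeoff describes how reducing a model's variance (its sensitivity to the particular training set) usually means accepting more bias (error from overly simple assumptions), and vice versa. Logistic regression is generally a high-bias, low-variance model.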

Advantages and Limitations of Logistic Regression

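It is less prone to overfitting than more flexible models, although it can still overfit when there are many features relative to the number of observations, and it can be extended to multiclass problems. On the other hand, it produces a linear decision boundary, so it struggles when the classes cannot be separated by a straight line or plane in the given features.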

Conclusion

This blog gives you a brief introduction to logistic regression, the types of logistic regression, sample use cases of the model, its error handling, its advantages and its limitations.

References:

a https://www.ibm.com/topics/logistic-regression

b https://dataaspirant.com/how-logistic-regression-model-works/

c https://www.techtarget.com/searchcustomerexperience/definition/churn-rate

d https://medium.com/@tpreethi19/what-is-logistic-regression-4251709634bb
