Machine Learning Series: Regression-4 (Logistic Regression)

Arun
Published in Geek Culture
Aug 16, 2021 · 3 min read

Previously, we learned all about basic linear regression and polynomial regression. Now, we are going to concentrate on logistic regression.

Logistic regression is used to model the probability of a certain class or event. For example: is an email spam or not? Is the animal happy or sad? Is the object hard or soft?

Once again, we are going to learn this method by working through an example, and we will be going back to the pumpkin data set.

Logistic Regression

Logistic regression models the probability of a discrete outcome given one or more input variables. Despite its name, it is a classification model rather than a regression model. It is a simple and efficient method for binary, linear classification problems: it is easy to implement and performs very well when the classes are linearly separable.
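To make "modeling the probability" a bit more concrete, here is a minimal sketch of the idea for a single input variable: a linear score is passed through the sigmoid function to get a probability, and the probability is turned into a class with a threshold. The weight, bias and threshold values are assumptions picked only for illustration; they do not come from the pumpkin data.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability between 0 and 1."""
    # Two branches so that math.exp never overflows for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x: float, weight: float, bias: float) -> float:
    """Probability of the positive class for one input value."""
    return sigmoid(weight * x + bias)

def predict_class(x: float, weight: float, bias: float,
                  threshold: float = 0.5) -> int:
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0

# Hypothetical parameters, chosen only for illustration
w, b = 2.0, -1.0
for x in (-1.0, 0.5, 2.0):
    print(x, round(predict_proba(x, w, b), 3), predict_class(x, w, b))
```

Training a logistic regression model amounts to finding the weight and bias values that make these probabilities match the labels as closely as possible.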


Types of logistic regression

The different types of logistic regression are as follows:

Binary Logistic Regression: The target variable (the dependent variable) has only two possible values.

Multinomial Logistic Regression: The target variable can take three or more values, but the values do not have any definite order or preference.

Ordinal Logistic Regression: The target variable has three or more possible values, and these values have an order or preference.
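As a rough illustration of how the binary and multinomial cases relate, the sketch below uses the softmax function, which turns one score per class into probabilities that sum to 1. With only two classes it reduces to the sigmoid used in binary logistic regression. The class labels and scores are made up for the example, and the ordinal case is not covered here.

```python
import math

def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Multinomial case: three unordered classes (hypothetical labels and scores)
classes = ["variety_a", "variety_b", "variety_c"]
scores = [0.2, 1.5, -0.3]
probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])

# Binary case: softmax over the scores [z, 0] gives the same
# probability as sigmoid(z)
z = 0.8
print(round(softmax([z, 0.0])[0], 6), round(sigmoid(z), 6))
```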

In the previous examples we saw that the higher the correlation between the variables, the better the prediction. That was true for linear regression. Logistic regression is different: it does not need the variables to be strongly correlated, so it can still work on data with fairly weak correlations. Also note that logistic regression generally needs cleaner data, and more of it, than linear regression to perform well.

Pumpkin Data set for logistic regression

If we look at the pumpkin data set carefully, we will see that the ‘color’ of the pumpkins is binary data: it is either ‘orange’ or ‘white’. Given some variables, we are going to predict whether a given pumpkin is orange or white. This type of problem is well suited to logistic regression.
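To show what fitting such a model involves, here is a minimal sketch that trains a binary logistic regression with plain gradient descent. It is not the exact model built for this article: the six rows are made-up toy data, the two feature columns are hypothetical stand-ins for scaled pumpkin attributes, and the learning rate and number of epochs are assumptions. In practice you would normally reach for a library implementation instead.

```python
import math

def sigmoid(z):
    # Two branches so that math.exp never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(xi, w, b):
    """Linear score w . x + b for one row."""
    return sum(wj * xj for wj, xj in zip(w, xi)) + b

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi, w, b)) - yi  # predicted minus actual
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    return [1 if sigmoid(score(xi, w, b)) >= threshold else 0 for xi in X]

# Made-up toy data, NOT the real pumpkin data set.
# Columns: two hypothetical scaled features; label 1 = orange, 0 = white.
X = [[0.2, 0.1], [0.4, 0.3], [0.5, 0.2], [0.9, 0.8], [1.1, 0.7], [1.3, 1.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("predictions:", predict(X, w, b))
```

The toy classes here are cleanly separable, so the fitted model reproduces the training labels. Real data is rarely that tidy, which is why we evaluate on a separate test set.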

Conclusion

We built our model, predicted the color of the pumpkins, and evaluated the predictions on the test set. We observed a decent number of true negatives, higher than the number of false positives and false negatives. We also plotted an ROC curve, which shows the true positive rate on the y-axis and the false positive rate on the x-axis. We then calculated our model’s AUC score, which was around 0.697. That is decent for our model: 0.5 would correspond to random guessing and 1.0 to a perfect classifier.
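For readers who want to see how these evaluation numbers are obtained, here is a minimal sketch that counts the confusion-matrix cells and computes the AUC directly from its definition: the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. The labels and predicted probabilities are made up for illustration, so the result is not the 0.697 reported above.

```python
def confusion_counts(y_true, y_pred):
    """Return (true negatives, false positives, false negatives, true positives)."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up test labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("AUC:", roc_auc(y_true, scores))
```

Note that the confusion matrix depends on the 0.5 threshold, while the AUC looks only at how the scores rank the examples.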
