Understanding Logistic Regression in Machine Learning
Hi folks, welcome back to the blog. Thank you so much for your love and support; I hope you are all doing well. Today we will explore Logistic Regression and how it is used in Machine Learning.

This blog will cover what logistic regression is and why we use it (with an example), the steps involved, understanding the variables, likelihood, probability vs. odds, the log odds ratio, and finally the confusion matrix, all explained in a simple way. So, with all your enthusiasm, let us gear up and get started.

What is logistic regression? It is a statistical classification model that deals with a categorical dependent variable (for example, gender or spam/not spam). The dependent variable can be binary (yes or no, true or false) or multinomial. The model can take both continuous and discrete input data.

Why do we use Logistic regression?
It is used as a tool for statistical analysis of discrete (categorical) outcomes, because it gives its output as a probability, which helps in classifying the given data.
Example of spam email classifier:

Let us look at the approach in building the spam email classifier
Steps:
1. Understand the variables.
2. Plot the labelled data.
3. Draw the regression curve.
4. Find the best-fitted curve using the Maximum Likelihood Estimator (MLE).
Moving ahead, let us understand the variables. Our independent variable is the count of spam words in a mail. We build this by listing commonly used spam words such as Buy, Get paid, Guarantee, Winner, Unlimited, etc. If a mail contains such words, we should suspect it is spam. So we make a bag of such spam words.
Our dependent variable is the binary label spam / not spam; the model outputs the probability of a mail being spam.

We can now set a rule: out of, say, 10 words in a mail, if 5 words are present in the bag of spam words, we label the mail as spam, i.e. 1. There can be cases where a mail contains words from the spam bag but is not actually spam; to handle such cases, we need to build a classifier.
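As a rough sketch of this labelling rule (the word list, sample mail, and threshold here are illustrative assumptions, not a real spam corpus), the counting could look like:

```python
# Illustrative bag of spam words (an assumption for this example).
SPAM_WORDS = {"buy", "get paid", "guarantee", "winner", "unlimited"}

def count_spam_words(mail_text):
    """Count how many entries from the spam bag appear in the mail."""
    text = mail_text.lower()
    return sum(1 for word in SPAM_WORDS if word in text)

def naive_label(mail_text, threshold=5):
    """Label a mail as spam (1) if it contains at least `threshold` spam words."""
    return 1 if count_spam_words(mail_text) >= threshold else 0

mail = "Winner! Buy now and get paid unlimited cash, guarantee!"
print(count_spam_words(mail))  # 5
print(naive_label(mail))       # 1
```

This hard threshold is exactly the naive rule the classifier is meant to improve upon.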
Let us create a pre-labelled data set, given below, just to understand the plotting manually.

The point is how do we find the best fit for this? So we need to find the regression curve which would be the best fit and that would be our logistic regression curve.
How do we do this? It is done in 3 steps:
1. Convert the y-axis from the probability scale (between 0 and 1) to the scale of log odds.
2. Draw a candidate regression line through the transformed data.
3. Use the sigmoid function to convert the log odds back to the probability of a mail being spam.

We then plot those probabilities to get the regression curve, and from this plot we find the log likelihood value of each mail (its individual likelihood). Lastly, we find the log likelihood of the whole regression curve.

What does the Log (odds) mean?
But before that, let us understand the difference between probability and odds with an example. Suppose a guy went fishing 5 times in a week; he caught a fish 2 times and failed 3 times.
What, then, are the probability and the odds of getting a fish for dinner? We calculate this as shown below.
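Working the numbers out: probability counts successes against all attempts, while odds counts successes against failures.

```python
caught, failed = 2, 3
trials = caught + failed

probability = caught / trials   # successes / all attempts = 2/5
odds = caught / failed          # successes / failures     = 2/3

print(probability)        # 0.4
print(round(odds, 3))     # 0.667
```

So the probability of a fish dinner is 0.4, but the odds are 2 to 3.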

Log(odds) is also called the logit function, but note that odds are not the same as the odds ratio. With this, let us look at the log odds ratio. Suppose the odds of catching a fish on a sunny day are 2/3 and the odds of catching one on a rainy day are 3/2.
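With those example odds, the odds ratio and its log work out as follows:

```python
import math

odds_sunny = 2 / 3   # odds of catching a fish on a sunny day
odds_rainy = 3 / 2   # odds of catching a fish on a rainy day

odds_ratio = odds_sunny / odds_rainy      # (2/3) / (3/2) = 4/9
log_odds_ratio = math.log(odds_ratio)

print(round(odds_ratio, 4))      # 0.4444
print(round(log_odds_ratio, 4))  # -0.8109
```

The negative log odds ratio tells us catching a fish is less likely on a sunny day than on a rainy one in this example.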



Moving ahead, what is the sigmoid function? It is the standard logistic function and resembles an S-shaped curve. The sigmoid curve has finite limits of:
‘0’ as x approaches (–∞)
‘1’ as x approaches (+∞)
The best part of sigmoid function is that it takes any real valued number and maps into values between 0 and 1, which in turn is helpful while solving classification problems.
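A minimal sketch of the sigmoid function and its squashing behaviour:

```python
import math

def sigmoid(x):
    """Standard logistic function: maps any real x into (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(sigmoid(0))     # 0.5, the midpoint
print(sigmoid(10))    # very close to 1
print(sigmoid(-10))   # very close to 0
```

However large or small the input, the output always stays between 0 and 1, which is exactly what we need for a probability.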
Now after understanding sigmoid function let us find out the best MLE.
For this, the model takes the standard logistic form:
p = 1 / (1 + e^-(b0 + b1·x))
where b0 and b1 are the intercept and slope of the candidate line in log-odds space.
Now, how can we check that this is the best-fitted curve? This is where MLE comes into the picture. We calculate the likelihood of each individual mail, i.e.

After getting the likelihood of each individual mail, we multiply the individual likelihoods together to get the likelihood of the entire curve. For a spam mail, the individual likelihood is its predicted probability p; for a non-spam mail, it is (1 − p). For our example:
Likelihood of the entire curve = (1 − 0.01) × (1 − 0.01) × (1 − 0.03) × (1 − 0.06) × (1 − 0.05) × 0.97 × 0.99 × 0.99
We then take the log of the likelihood, which turns the product into a sum:
log likelihood = log(1 − 0.01) + log(1 − 0.01) + log(1 − 0.03) + log(1 − 0.06) + log(1 − 0.05) + log(0.97) + log(0.99) + log(0.99)
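The probabilities above (taken from the example, with the first five mails non-spam and the last three spam) give the following log likelihood, using the rule that a spam mail contributes log(p) and a non-spam mail contributes log(1 − p):

```python
import math

# Predicted probability of being spam for each mail, paired with its
# true label (0 = not spam, 1 = spam). Values follow the example above.
mails = [
    (0.01, 0), (0.01, 0), (0.03, 0), (0.06, 0), (0.05, 0),
    (0.97, 1), (0.99, 1), (0.99, 1),
]

# Individual likelihood: p if the mail is spam, (1 - p) if it is not.
log_likelihood = sum(
    math.log(p if label == 1 else 1 - p) for p, label in mails
)
print(round(log_likelihood, 4))  # -0.2143
```

A log likelihood close to 0 means the curve explains the data well; more negative values mean a worse fit.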
Now we rotate the line and repeat the above process, recalculating the log likelihood each time.
For example, suppose we have 2 lines, A and B: A with a log likelihood of −0.084 and B with a log likelihood of −0.207. Comparing them, line A has a better likelihood value than line B. We keep rotating the line until we get the maximum log likelihood, and the line that achieves it gives the best-fitted regression curve. I hope you have understood how logistic regression works. Finally, let us understand the confusion matrix in short.
What is Confusion matrix?
It shows the ways in which the classification model gets confused when it makes predictions on your data. It is a summary of prediction results on a classification problem, broken down by actual and predicted classes.
In short, it summarizes the counts of correct and incorrect predictions.

Interpretation of Confusion Matrix:

True Positive: alarm goes off when there is a fire.
False Positive: alarm goes off but there is no fire.
False Negative: no alarm when there is a fire.
True Negative: no alarm and no fire.
I am attaching a sample demonstration of the confusion matrix using Python in a gist.
Here the result comes out to be:
[[4 2] [1 3]]
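Since the gist itself is not shown here, a minimal pure-Python sketch that reproduces this matrix (the labels are hypothetical, chosen only to match the result) could be:

```python
# Hypothetical true and predicted labels that yield the matrix above.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

# matrix[actual][predicted]: rows are actual classes, columns predictions.
matrix = [[0, 0], [0, 0]]
for actual, predicted in zip(y_true, y_pred):
    matrix[actual][predicted] += 1

print(matrix)  # [[4, 2], [1, 3]]

accuracy = (matrix[0][0] + matrix[1][1]) / len(y_true)
print(accuracy)  # 0.7
```

The diagonal entries are the correct predictions; everything off the diagonal is a mistake.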
So we can say that,
4 = class 0 predicted as 0, 4 times.
2 = class 0 predicted as 1, 2 times.
1 = class 1 predicted as 0, 1 time.
3 = class 1 predicted as 1, 3 times.
So the number of correct predictions is 4 + 3 = 7,
i.e. (0 predicted as 0) + (1 predicted as 1),
and the number of incorrect predictions is 2 + 1 = 3,
i.e. (0 predicted as 1) + (1 predicted as 0).
From this we can conclude that the accuracy is 7/10 = 70%, as the model predicted correctly 7 times and incorrectly 3 times.
I hope the above has given you a good glance at the topic, and on this note I would like to sign off for today. If you would like me to cover any topic related to data science, machine learning, etc., please leave your comments in the comment section on my blogs so that I can make a note and write it up for everyone's learning.
Do follow me on Medium & LinkedIn to get updates on all my blogs. If you really liked the above, do comment below, because learning has no limits.
Stay Happy, Stay Fit, Stay Humble…!

