The math behind Logistic Regression

Khushwant Rai · Analytics Vidhya · Jun 14, 2020

In my last four blogs, I talked about Linear Regression, the cost function, gradient descent, and some ways to assess the performance of linear models. Now, in this blog, we will start learning about classification models, and the first of them is Logistic Regression.

What is Logistic Regression?

Logistic regression is a statistical model typically used to model a binary dependent variable with the help of the logistic function. Another name for the logistic function is the sigmoid function, which is given by:
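\sigma(z) = \frac{1}{1 + e^{-z}}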

This function lets the logistic regression model squeeze any real value from (−∞, ∞) into (0, 1). Logistic regression is mainly used for binary classification tasks; however, it can be extended to multiclass classification.
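For instance, σ(−5) ≈ 0.007, σ(0) = 0.5, and σ(5) ≈ 0.993: large negative inputs land near 0 and large positive inputs land near 1.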

Why are we calling a classification model Logistic ‘Regression’?

The reason is that, just like Linear Regression, logistic regression starts from a linear equation. However, this equation models the log-odds, which are then passed through a sigmoid function that squeezes the output of the linear equation to a probability between 0 and 1. We can then choose a decision boundary and use this probability to perform the classification task. For example, suppose we are predicting whether it is going to rain tomorrow based on the given dataset. If, after applying the logistic model, the probability comes out to be 90%, we can say that rain tomorrow is highly likely. On the other hand, if the probability comes out to be 10%, we may say that it is not going to rain tomorrow. This is how we transform probabilities into binary predictions.

Maths behind Logistic Regression

We could start by assuming p(x) to be a linear function of x. The problem is that p is a probability, which must lie between 0 and 1, whereas a linear function is unbounded. As a first step we might take log p(x) to be linear in x, but a logarithm is bounded in only one direction, so instead we apply the logit transformation and consider log p(x)/(1 − p(x)). Next, we make this function linear in x, with intercept α₀ and coefficient vector α:
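\log \frac{p(x)}{1 - p(x)} = \alpha_0 + \alpha \cdot x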

After solving for p(x):
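p(x) = \frac{e^{\alpha_0 + \alpha \cdot x}}{1 + e^{\alpha_0 + \alpha \cdot x}} = \frac{1}{1 + e^{-(\alpha_0 + \alpha \cdot x)}}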

To make logistic regression a linear classifier, we can choose a threshold, e.g. 0.5. The misclassification rate is then minimized if we predict y = 1 when p ≥ 0.5 and y = 0 when p < 0.5. Here, 1 and 0 are the two classes.
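As a quick sketch of this decision rule in Python (the function and variable names here are illustrative, not from the original post):

import numpy as np

def predict_class(probs, threshold=0.5):
    # Predict y = 1 when p >= threshold, else y = 0
    return (np.asarray(probs) >= threshold).astype(int)

# e.g. the rain example above: 90% -> class 1 (rain), 10% -> class 0 (no rain)
print(predict_class([0.9, 0.1]))  # [1 0]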

Since logistic regression predicts probabilities, we can fit it using the likelihood. For each training point x_i with observed class y_i, the model assigns probability p(x_i) if y_i = 1 and 1 − p(x_i) if y_i = 0. The likelihood of the whole training set can then be written as:
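L(\alpha_0, \alpha) = \prod_{i=1}^{n} p(x_i)^{y_i} \left(1 - p(x_i)\right)^{1 - y_i}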

The multiplication can be transformed into a sum by taking the log:
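\ell(\alpha_0, \alpha) = \sum_{i=1}^{n} \Big[ y_i \log p(x_i) + (1 - y_i) \log\big(1 - p(x_i)\big) \Big]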

Further, after substituting the value of p(x) from above:
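\ell(\alpha_0, \alpha) = \sum_{i=1}^{n} \Big[ y_i (\alpha_0 + \alpha \cdot x_i) - \log\big(1 + e^{\alpha_0 + \alpha \cdot x_i}\big) \Big]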

The next step is to maximize the above log-likelihood. Since we are maximizing rather than minimizing a cost, logistic regression uses gradient ascent (the opposite of gradient descent).

Maximum Likelihood Estimation (MLE)

Maximum likelihood estimation is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that the observed data become as probable as possible under the fitted model. We can find the MLE by differentiating the above equation with respect to each parameter and setting the result to zero. For example, the derivative with respect to one component of the parameter vector α, say α_j, is given by:
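\frac{\partial \ell}{\partial \alpha_j} = \sum_{i=1}^{n} \big( y_i - p(x_i) \big)\, x_{ij}

Setting this derivative to zero yields no closed-form solution, so the maximum is found numerically with the gradient ascent mentioned above. Here is a minimal sketch in Python/NumPy, assuming 0/1 labels; the function names, learning rate, and iteration count are illustrative choices, not from the original post:

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    # Gradient ascent on the average log-likelihood.
    # X: (n, d) feature matrix; y: (n,) array of 0/1 labels.
    n, d = X.shape
    alpha0 = 0.0           # intercept
    alpha = np.zeros(d)    # coefficient vector alpha
    for _ in range(n_iters):
        p = sigmoid(alpha0 + X @ alpha)    # p(x_i) for every training point
        error = y - p                      # the (y_i - p(x_i)) factor derived above
        alpha0 += lr * error.mean()        # ascent step for the intercept
        alpha += lr * (X.T @ error) / n    # ascent step for each alpha_j
    return alpha0, alpha

# Usage sketch on toy data: points with larger x should get class 1
# X = np.array([[0.0], [1.0], [2.0], [3.0]]); y = np.array([0, 0, 1, 1])
# alpha0, alpha = fit_logistic(X, y)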

Hopefully, this post has helped you build a basic understanding of the math behind logistic regression.

