Logistic Regression: Understand the math behind the algorithm

Anah Veronica · Published in Analytics Vidhya · 4 min read · Apr 20, 2020


Logistic regression is a supervised binary classification algorithm. You're probably wondering why it's called logistic regression, then. The name comes from the fact that it uses the machinery of regression to classify. If that doesn't make sense to you yet, don't worry; we're going to break it all down. In logistic regression, the target value we want to predict is either zero or one: an observation either belongs to a class, in which case the value is 1, or it doesn't, in which case the value is 0, so the data distribution is binomial. The goal of the algorithm is to model the probability of an event happening based on the independent variables (features) that we provide it. It then classifies each input based on the probability that it belongs to a category. I am going to take you, step by step, to the logistic regression function.

To understand Logistic regression in greater depth, we should understand the concept of odds and odds ratio.

Let's start with a quick recap of probability. The probability of an event is given by:

p = (number of favorable outcomes) / (total number of outcomes)

The odds of an event is closely related to probability, but it’s expressed differently. The “odds” of an event is given by:

odds = p / (1 − p)

Odds give us the ratio of how likely an event is to occur to how unlikely it is. Here's a good example I found on the internet that explains the concept beautifully: if a racehorse runs 100 races, winning 25 and losing the other 75, the probability of winning is 25/100 = 0.25, or 25%, but the odds of the horse winning are 25/75 = 0.333, or 1 win to 3 losses.
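As a quick sanity check, the racehorse numbers above can be worked out in a few lines of Python:

```python
# Probability vs. odds for the racehorse example
wins, losses = 25, 75
races = wins + losses

probability = wins / races              # 25/100 = 0.25
odds = probability / (1 - probability)  # 0.25/0.75 ≈ 0.333, i.e. 1 win to 3 losses

print(f"probability = {probability}, odds = {odds:.3f}")
```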

And the odds ratio is exactly what its name says: the ratio of the odds of two events.

Okay, but how does the odds ratio relate to logistic regression?

The odds ratio of an independent variable in logistic regression represents how the odds change per unit increase in that variable when all the other variables are held constant.

Imagine we were studying the relationship between how often a person smokes and how likely they are to get cancer. Let's say the odds ratio of smoking is 1.2; this means that a person's odds of getting cancer are multiplied by 1.2 for every one-unit increase in smoking frequency.

We know that the goal of logistic regression is to model the probability of an event happening based on the input variables we provide it. But a linear combination of input variables can take any real value, while a probability is confined between 0 and 1. How do we bridge the two? By working with the natural log (log to the base e) of the odds, also known as the logit function:

logit(p) = ln(p / (1 − p))

The logit maps a probability in (0, 1) to the whole real line, so we can set it equal to the linear combination of our inputs.
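The logit is short enough to implement directly. This sketch shows that it sends p = 0.5 to zero and treats probabilities above and below 0.5 symmetrically:

```python
import math

def logit(p):
    """Log of the odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

print(logit(0.5))   # 0.0 (even odds)
print(logit(0.75))  # ln(3), positive
print(logit(0.25))  # -ln(3), symmetric around p = 0.5
```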

Almost there! There's just one more thing we need to do. The logit expresses the linear combination as a function of the probability, but we want the probability itself on the y-axis: as standard practice, the independent variables go on the x-axis and the dependent variable goes on the y-axis. We achieve that by taking the inverse of the logit function. Voilà! We have our logistic regression formula:

p = 1 / (1 + e^(−α))

Where alpha in our case is the linear combination of our independent variables.

This function is also known as the sigmoid function. A sigmoid function is a mathematical function having a characteristic “S”-shaped curve or sigmoid curve.
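A minimal implementation of the sigmoid makes its S-shape easy to verify: large negative inputs are squashed toward 0, large positive inputs toward 1, and 0 maps exactly to 0.5.

```python
import math

def sigmoid(alpha):
    """Inverse logit: squashes any real number into the interval (0, 1)."""
    return 1 / (1 + math.exp(-alpha))

# The characteristic S-curve behavior
print(sigmoid(-6))  # close to 0
print(sigmoid(0))   # exactly 0.5
print(sigmoid(6))   # close to 1
```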

Note: Logistic regression uses maximum likelihood estimation (MLE) to estimate the parameters of the linear combination.

Threshold:

The sigmoid function maps its inputs to values between 0 and 1. To classify, we need to assign each input a value of either 0 or 1: either it belongs to the category or it doesn't. To do this we use a threshold value. A ROC curve can help you choose a threshold suited to your project. By default, the predict function uses 0.5 as the probability threshold, so if the value is greater than 0.5, the target is assigned 1, otherwise 0.
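The thresholding step can be sketched as a one-line helper (the name `classify` and the example probabilities are mine, just to illustrate how changing the threshold changes the label):

```python
def classify(probability, threshold=0.5):
    """Turn a sigmoid output into a hard 0/1 class label."""
    return 1 if probability > threshold else 0

print(classify(0.73))                  # 1: above the default 0.5 threshold
print(classify(0.31))                  # 0: below it
print(classify(0.31, threshold=0.25))  # 1: a lower threshold flips the label
```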

And that's it. Now you know the math behind how logistic regression works. I hope that was useful!

Resource: I used Brandon Foltz's YouTube playlist on logistic regression to learn about this algorithm. I recommend his YouTube channel to anyone who wants a deeper understanding of statistics and probability and how they're used in machine learning.
