Logistic Regression

Shubhangi Hora
4 min read · Oct 3, 2018

In the previous article, I discussed the Linear Regression algorithm and how it deals with regression problems by assuming there exists a linear relationship between the input variables and the target output variable.

Logistic Regression is another Supervised Learning algorithm, and it works in a similar manner to Linear Regression, except that Logistic Regression is a classification algorithm! So don’t let the name confuse you.

Logistic Regression is used when the target output variable is categorical, and most often when it is binary, i.e. has only two classes. For example, you’d use Logistic Regression to label an email as ‘spam’ or ‘not spam’, or to predict whether a patient has lung cancer or not. Hence, this type of Logistic Regression is called Binary Logistic Regression.

Another important thing to note is that the output value of Logistic Regression is actually a probability that an input data point belongs to a particular class.

How does it work?

The point of Logistic Regression is to produce a binary output — 0 or 1. The algorithm aims to separate the data points into two classes with a line, an idea it borrows from Linear Regression. To do this, it passes the Linear Regression equation (y = a + bx) through the logistic function to obtain a probability. The logistic function is also known as the sigmoid function, and it produces an S-shaped curve with values between 0 and 1.

The logistic (sigmoid) function:

p = 1 / (1 + e^(-y))

Where

  • p is the probability,
  • and y is the output from the Linear Regression equation.

This function is what squashes the Linear Regression output into a value between 0 and 1. Let us assume there is only one input feature, so the function becomes:

p = 1 / (1 + e^(-(a + bx)))

Where

  • p is the probability,
  • a is the y-intercept,
  • b is the modeling coefficient,
  • and x is the input feature.
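This squashing can be sketched in a few lines of Python (the function names sigmoid and predict_probability are illustrative, not from a library):

```python
import math

def sigmoid(y):
    """Squash a Linear Regression output y into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))

def predict_probability(a, b, x):
    """Probability for a single input feature x, given intercept a and coefficient b."""
    return sigmoid(a + b * x)

# Large positive inputs approach 1, large negative inputs approach 0,
# and an input of 0 sits exactly at 0.5.
print(predict_probability(a=0.0, b=1.0, x=5.0))
print(predict_probability(a=0.0, b=1.0, x=-5.0))
```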

As was the case in Linear Regression, there is a cost function for Logistic Regression too, for which a global minimum must be found by the gradient descent algorithm. However, this cost function is logarithmic, since the predicted values need to lie between 0 and 1 only:

J(a, b) = -(1/m) Σ [yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ)]

Where m is the number of data points, yᵢ is the actual label (0 or 1) of the i-th data point, and pᵢ is its predicted probability.
For an explanation of the math please click here.
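As a quick sketch, this cost (often called log loss or cross-entropy) can be computed directly; the log_loss helper below is illustrative:

```python
import math

def log_loss(probs, labels):
    """Average logarithmic cost: -(1/m) * sum(y*log(p) + (1-y)*log(1-p))."""
    m = len(labels)
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m

# A confident, correct prediction costs little;
# a confident, wrong one costs a lot.
print(log_loss([0.9], [1]))
print(log_loss([0.1], [1]))
```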

The gradient descent algorithm is used here just like it is used in Linear Regression. So once again the same cycle is repeated —

  1. Starting values are assigned to a and b.
  2. The cost function J(a, b) is calculated and plotted.
  3. The gradient of the cost function is calculated.
  4. Minor changes are made to the values of a and b in the direction that reduces the cost.
  5. Steps 2, 3 and 4 are repeated until the gradient stops changing, which means the function has converged and the minimum has been reached.
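The cycle above can be sketched for a single input feature as follows (a fixed number of steps stands in for a convergence check; fit_logistic, the learning rate and the toy data are all illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=1000):
    """Gradient descent on the logarithmic cost for one feature."""
    a, b = 0.0, 0.0                      # step 1: starting values for a and b
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):         # step 3: gradient of the cost
            error = sigmoid(a + b * x) - y
            grad_a += error
            grad_b += error * x
        m = len(xs)
        a -= lr * grad_a / m             # step 4: minor changes to a and b
        b -= lr * grad_b / m
    return a, b

# Toy data: class 1 for larger x values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(sigmoid(a + b * 5.0))  # high probability for x = 5
print(sigmoid(a + b * 0.0))  # low probability for x = 0
```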

Once the final coefficients have been found, they are used in the logistic function to predict the output y. This output value is the probability that the row of data belongs to a particular class, and it lies between 0 and 1.

So how does the algorithm determine which class to assign?

The algorithm uses a decision boundary, which is a threshold value, to decide which class to assign to a particular data point. For example, if it is to determine whether an email is ‘spam’ or ‘not spam’, and the decision boundary is set to y >= 0.5 for the ‘spam’ label, then a data point (email) with a y value of 0.8 will be classified as ‘spam’, while a data point with a y value of 0.2 will be classified as ‘not spam’.
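The thresholding step is tiny in code (the classify helper and the 0.5 default are illustrative):

```python
def classify(probability, threshold=0.5):
    """Apply the decision boundary: label 'spam' when probability >= threshold."""
    return "spam" if probability >= threshold else "not spam"

print(classify(0.8))  # spam
print(classify(0.2))  # not spam
```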

Binary Logistic Regression is the most commonly used form of Logistic Regression. However, there are two other types:

  1. Multinomial Logistic Regression — when there are three or more categorical unordered outputs; for example, predicting the flavor of ice cream (chocolate, vanilla, strawberry).
  2. Ordinal Logistic Regression — when there are three or more categorical ordered outputs; for example, predicting a movie rating from 1 to 5.

To read more about Logistic Regression, click here.

To see how Logistic Regression is coded in Python using Scikit-Learn, click here!

Next, I’ll be discussing the K Nearest Neighbors algorithm, so stay tuned!

Shubhangi Hora

A python developer working on AI and ML, with a background in Computer Science and Psychology. Interested in healthcare AI, specifically mental health!