Logistic Regression: Machine Learning in Python

Divyansh Chaudhary
Published in The Startup
Jan 18, 2021 · 4 min read

Deciding whether or not something will happen is a dilemma we face every day. We are confronted with Yes-or-No questions all the time, and researchers in the field of Machine Learning are no different.

Image Source: datacamp.com

In Machine Learning, this question of the probability of an event happening is answered using Logistic Regression. Although it is called “Regression”, Logistic Regression is an algorithm built to solve Classification problems.

Math behind Logistic Regression

In statistics, the logit function, or log-odds, is the logarithm of the odds p/(1-p), where p is a probability. The logit function maps probability values from (0, 1) to (-∞, +∞).

Image Source: Krishnavedala

In Logistic Regression, however, we make use of its inverse: the sigmoidal “logistic” function. A Sigmoid function is an activation function with a characteristic “S” shape, widely used for Classification problems in Machine Learning and AI.
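As a quick sketch of the relationship between the two (a minimal NumPy example; the function names here are ours, not from any library), the logit and the sigmoid undo each other:

import numpy as np

def sigmoid(z):
    # Maps any real number to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # Maps a probability in (0, 1) back to (-inf, +inf)
    return np.log(p / (1.0 - p))

p = 0.8
print(logit(p))           # ~1.386, the log-odds of 0.8
print(sigmoid(logit(p)))  # 0.8 — sigmoid inverts the logit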

Types of Logistic Regression

  1. Binary Logistic Regression — 2 classification groups.
  2. Multinomial Logistic Regression — more than 2 classification groups.

Note: Multinomial Logistic Regression is out of scope for this blog.

Hypothesis Function

The hypothesis function of Logistic Regression is only slightly different from the Linear Regression hypothesis.

Linear Regression Equation:

z = θᵀx = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

Sigmoid Function:

σ(z) = 1 / (1 + e^(−z))

Applying the Sigmoid Function to the Linear Regression Equation:

hθ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx))
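To make the hypothesis concrete, here is a minimal NumPy sketch (variable and function names are illustrative) that applies the Sigmoid to the linear combination θᵀx:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    # X: (m, n) feature matrix, theta: (n,) parameter vector
    # Returns P(y = 1 | x) for each of the m examples
    return sigmoid(X @ theta)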

Decision Boundary

Image Source: towardsdatascience.com

A Decision Boundary is the surface in the problem space that separates the classes a classifier predicts; on the boundary itself, the label is ambiguous. In Logistic Regression, a properly fitted decision boundary lets us predict which class a new feature set belongs to. The intuition behind using a Sigmoid function is that its output lies in the range 0 to 1, approaching those bounds only asymptotically. With a decision boundary placed on the Sigmoid output, we are able to differentiate between the classes.

For example, if we have two classes, Class A and Class B, we have to decide on a threshold value above which we classify a new feature set as Class A. This threshold is set on the Sigmoid output. As shown in the image above, the decision boundary is set at a threshold of 0.5: when the predicted probability is above this threshold, the input is classified as Class A, and as Class B for values below 0.5.
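In code, this thresholding step is a single comparison. A small sketch, assuming an array probs of predicted probabilities:

import numpy as np

probs = np.array([0.91, 0.12, 0.55, 0.49])
predictions = (probs >= 0.5).astype(int)  # 1 -> Class A, 0 -> Class B
print(predictions)  # [1 0 1 0]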

Cost Function

In the case of Logistic Regression, instead of using the Mean Squared Error as we did with Linear Regression, we use a cost function called the Cross-Entropy function. Also known as the Log Loss function, the cost is determined by:

cost(hθ(x), y) = −log(hθ(x))        if y = 1
cost(hθ(x), y) = −log(1 − hθ(x))    if y = 0

The two cases above can be compressed into a single function:

J(θ) = −(1/m) Σ [ y log(hθ(x)) + (1 − y) log(1 − hθ(x)) ]   (summed over all m training examples)

The benefit of using the Cross-Entropy function as the cost function is that each of its two branches is monotonic (always increasing or decreasing) and the overall cost is convex in θ, making it easy to compute the gradient and for gradient descent to reach the global minimum.

Image Source: Andrew Ng — Coursera

Vectorized Function:

J(θ) = −(1/m) [ yᵀ log(h) + (1 − y)ᵀ log(1 − h) ],  where h = σ(Xθ)
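A minimal NumPy sketch of the vectorized cost and its gradient (function and variable names are illustrative, not a fixed API):

import numpy as np

def cost_and_gradient(theta, X, y):
    # X: (m, n) features, y: (m,) labels in {0, 1}
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predicted probabilities
    # Vectorized cross-entropy / log loss
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    # Gradient of the cost with respect to theta
    grad = X.T @ (h - y) / m
    return cost, grad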

Implementation from Scratch

Note: Iris Dataset is used for this example.
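As a rough sketch of what such an implementation might look like (assuming scikit-learn only to load the data, and keeping just the first two Iris classes since multinomial regression is out of scope here):

import numpy as np
from sklearn.datasets import load_iris

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Load Iris and keep only the first two classes for binary classification
iris = load_iris()
mask = iris.target < 2
X = iris.data[mask]
y = iris.target[mask].astype(float)

# Add an intercept column of ones
X = np.hstack([np.ones((X.shape[0], 1)), X])

# Gradient descent on the cross-entropy cost
theta = np.zeros(X.shape[1])
lr, epochs = 0.1, 1000  # illustrative hyperparameters
for _ in range(epochs):
    h = sigmoid(X @ theta)
    theta -= lr * X.T @ (h - y) / len(y)

# Classify with the 0.5 decision boundary
predictions = (sigmoid(X @ theta) >= 0.5).astype(int)
print("Training accuracy:", (predictions == y).mean())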

This article provides an insight into the math and reasoning behind Logistic Regression and builds an understanding of how it works behind the scenes. It is nonetheless recommended to read other sources to gain a deeper understanding of the math and related functions.

Thanks for reading.
Don’t forget to click on 👏!
