Build a Logistic Regression From Scratch with Python

hqtquynhtram
3 min read · Oct 26, 2021


This article illustrates how to build a logistic regression model from scratch.

Today, a Data Scientist's work is supported extremely well by ready-made packages. However, understanding how a model works under the hood, and being able to modify it, is sometimes required of a proficient Data Scientist, for two reasons:

  1. If you don't understand how it was built, you can't customize it.
  2. The better you understand how the code works, the better you can explain it to other team members as well as stakeholders.

Recall the 3 primary cores of the Logistic Regression model:

  1. Algorithm: the sigmoid function combined with a linear function z

Sigmoid Function

  2. Loss function: Cross Entropy

Loss Function

  3. Optimizer: Gradient Descent

Gradient Descent
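Written out under the standard formulation (with w the weight vector, b the bias, N the number of samples, and α the learning rate), the three pieces above are:

```latex
z = w^\top x + b, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
```

```latex
L(w, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \,\right]
```

```latex
w \leftarrow w - \alpha \, \nabla_w L, \qquad \nabla_w L = \frac{1}{N} X^\top (\hat{y} - y)
```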

Dataset

Let's use the Iris dataset, which contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter two are not linearly separable from each other.

For the sake of simplicity, I will select only:

  • The first 2 features: sepal length and sepal width
  • The 2 classes that are linearly separable, to set up a binary classification problem for Logistic Regression
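That selection can be sketched as follows, assuming scikit-learn is available to load the dataset (the variable names are illustrative):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, :2]   # first 2 features: sepal length, sepal width
y = iris.target
mask = y < 2           # keep classes 0 and 1, the linearly separable pair
X, y = X[mask], y[mask]
print(X.shape)         # (100, 2)
```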

Hypothesis

We hypothesize that we can find a hyperplane that linearly separates the 2 classes.

Intuitively, this looks feasible from the scatter plot below.

Scatter plot by 2 classes 0 and 1 with only 2 features

Coding the 3 primary cores

  1. Sigmoid combined with a linear function as the model's algorithm
Sigmoid function
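A minimal numpy sketch of the sigmoid:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into (0, 1) so it can be read as a probability.
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))  # 0.5
```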

  2. Cross Entropy as the loss function

Cross Entropy Function
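The loss can be sketched like this; the `eps` clipping is a common numerical-stability trick (an assumption on my part, not necessarily in the author's code):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-15):
    # Clip predictions away from 0 and 1 so the logs stay finite.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```

For a fully uncertain prediction of 0.5 on every sample, the loss is log 2 ≈ 0.693, regardless of the labels.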

  3. Gradient Descent as the optimizer

To find the optimal weights, we run gradient descent on the model's weights to minimize the loss function.

After getting the optimal weights W, we make predictions by computing the sigmoid output and comparing it with a chosen threshold.
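The optimizer and prediction steps can be sketched as below (a numpy-only sketch; appending a bias column of ones so the intercept is learned as just another weight is my choice, and the helper names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, lr=0.1, n_iters=1000):
    # Append a bias column of ones so the intercept is just another weight.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(Xb @ w)
        grad = Xb.T @ (y_hat - y) / len(y)  # gradient of the cross-entropy loss
        w -= lr * grad
    return w

def predict(X, w, threshold=0.5):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (sigmoid(Xb @ w) >= threshold).astype(int)
```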

Now combine everything into a Logistic Regression class.
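One way the combined class could look (a sketch under the same assumptions: gradient descent on cross-entropy with an appended bias column; the class and method names are illustrative, not the author's):

```python
import numpy as np

class LogisticRegression:
    def __init__(self, lr=0.1, n_iters=1000, threshold=0.5):
        self.lr = lr                # learning rate for gradient descent
        self.n_iters = n_iters      # number of gradient-descent steps
        self.threshold = threshold  # decision threshold on the sigmoid output
        self.w = None

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    @staticmethod
    def _add_bias(X):
        # Bias column of ones so the intercept is learned as a weight.
        return np.hstack([X, np.ones((len(X), 1))])

    def fit(self, X, y):
        Xb = self._add_bias(X)
        self.w = np.zeros(Xb.shape[1])
        for _ in range(self.n_iters):
            y_hat = self._sigmoid(Xb @ self.w)
            grad = Xb.T @ (y_hat - y) / len(y)  # cross-entropy gradient
            self.w -= self.lr * grad
        return self

    def predict_proba(self, X):
        return self._sigmoid(self._add_bias(X) @ self.w)

    def predict(self, X):
        return (self.predict_proba(X) >= self.threshold).astype(int)

    def accuracy(self, X, y):
        return (self.predict(X) == y).mean()
```

Usage would then be as simple as `model = LogisticRegression().fit(X, y)` followed by `model.accuracy(X, y)`.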

Let’s try to fit the data with the Logistic Regression model above.

1st result from the logistic regression model with default hyperparameters

Given the default hyperparameters, the accuracy of the 1st model is about 99.3%. Not bad. Let’s visualize the decision boundary!

Decision boundary of 1st Logistic Regression Model

However, I'm still not satisfied with the 1st model's result. What if we change the hyperparameters to a higher learning rate and more iterations?

Accuracy is 1. Perfect! It seems the 1st model was underfitting. Let's visualize the decision boundary one more time.

Decision boundary of 2nd Logistic Regression Model

We finished building a logistic regression model from scratch! I believe this is not difficult, and you can definitely try it on your own at home. Check out my detailed code at the Github link.

Happy learning!


Passionate about answering questions with data and building AI products. Feel free to contact me via linkedin.com/in/tramdata to share interests in data and product.