Logistic Regression with StandardScaler from Scratch

Dharmaraj
4 min read · May 17, 2022

Introduction

Hi everyone. Today we are going to walk through logistic regression from scratch. Logistic regression holds an important place among machine learning techniques because it is used to solve classification problems, where the dependent variable (target) is categorical. In binary logistic regression, the dependent variable is coded as 1 (yes, happy, true, normal, success, etc.) or 0 (no, sad, false, abnormal, failure, etc.). In this blog, we are going to use logistic regression to predict whether a user can buy a car or not.

Theory

As we saw, logistic regression is used to solve classification problems. Instead of fitting a straight line or hyperplane directly to the target, the logistic regression model uses the logistic function to squeeze the output of a linear equation between 0 and 1. The logistic function is defined as σ(z) = 1 / (1 + e^(-z)), where z is the output of the linear equation.
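To make the squeezing behaviour concrete, here is a minimal sketch of the logistic function σ(z) = 1 / (1 + e^(-z)) in plain Python. The names `sigmoid` and `predict_proba`, and the weights and bias, are made up for illustration; they are not values learned from the dataset used in this blog.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical weights and bias for a model with two scaled features
weights = [1.5, 0.8]
bias = -0.2

def predict_proba(features):
    """Probability of class 1: the sigmoid of the linear combination."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

print(sigmoid(0))                  # 0.5, the usual decision threshold
print(predict_proba([1.0, -0.5]))  # a probability between 0 and 1
```

A prediction of class 1 is then typically made when the probability is at least 0.5, which corresponds to z being at least 0.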

Implementation

Importing the libraries

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import accuracy_score
import seaborn as sns
from sklearn.metrics import confusion_matrix

Importing the dataset

dataset = pd.read_csv('data.csv')
# Features: columns 2 and 3 (age and salary); target: column 4
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 10)

Feature Scaling

StandardScaler performs standardization: it rescales each feature to zero mean and unit variance. Our dataset contains variables on very different scales, for example an age column with values of 20–70 and a SALARY column with values of 100000–800000. Because these two columns differ in scale, they are standardized to a common scale before building the machine learning model.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
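
To see what the scaler is doing, here is a small sketch of standardization by hand using only the Python standard library. The ages and salaries are made-up numbers, and the helper names `fit` and `transform` are hypothetical stand-ins that mirror the idea of fitting on the training data and reusing the same mean and standard deviation on the test data.

```python
from statistics import fmean, pstdev

# Made-up training values, chosen only to show the difference in scale
train_ages = [20, 35, 50, 65]
train_salaries = [100000, 300000, 500000, 700000]

def fit(values):
    """Learn the mean and population standard deviation from training values."""
    # pstdev is the population estimator (ddof=0), which StandardScaler also uses
    return fmean(values), pstdev(values)

def transform(values, mean, std):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(v - mean) / std for v in values]

age_mean, age_std = fit(train_ages)
sal_mean, sal_std = fit(train_salaries)

scaled_ages = transform(train_ages, age_mean, age_std)
scaled_salaries = transform(train_salaries, sal_mean, sal_std)

# Test values are scaled with the training statistics, never refitted
scaled_test_ages = transform([30, 60], age_mean, age_std)

print(scaled_ages)
print(scaled_salaries)
print(scaled_test_ages)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither feature dominates the other just because of its units.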

Fitting Logistic Regression to the Training set

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 10)
classifier.fit(X_train, y_train)

Predict and get Accuracy for the Test data

y_pred = classifier.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print("The Accuracy for Test Set is {}".format(test_acc*100))

Making the Confusion Matrix

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(12, 6))
plt.title("Confusion Matrix")
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.ylabel("Actual Values")
plt.xlabel("Predicted Values")
plt.show()

Creating a classification report for the model

from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
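
The numbers in the classification report can be derived from the confusion matrix. The sketch below uses a made-up 2x2 matrix, not the result of this model, laid out the way scikit-learn returns it for labels 0 and 1: rows are actual classes, columns are predicted classes. It computes the metrics for class 1.

```python
# Hypothetical confusion matrix in scikit-learn's layout for labels [0, 1]:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = [[60, 5],
      [10, 25]]

tn, fp = cm[0]
fn, tp = cm[1]

# Share of all predictions that were correct
accuracy = (tp + tn) / (tp + tn + fp + fn)
# Of everything predicted as class 1, how much really was class 1
precision = tp / (tp + fp)
# Of everything that really was class 1, how much was found
recall = tp / (tp + fn)
# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", round(accuracy, 3))
print("precision:", round(precision, 3))
print("recall   :", round(recall, 3))
print("f1-score :", round(f1, 3))
```

The report prints the same kind of values for each class, so a gap between precision and recall there points to which type of error (false positives or false negatives) the model makes more often.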

Prediction with new data

For the new input we apply transform, because the model was trained on standardized features. If the model had been trained without scaling, the new input data would not need to be transformed.

user_age_salary = [[20, 900000]]
# Scale the new input with the scaler fitted on the training data
scaled_result = sc.transform(user_age_salary)
res = classifier.predict(scaled_result)
if res[0] == 1:
    print("He can buy the car")
else:
    print("He can't buy the car")

## Skip this part if you don't need to use this model in another file ##
from pickle import dump
# save the model
with open('model.pkl', 'wb') as f:
    dump(classifier, f)
# save the scaler
with open('scaler.pkl', 'wb') as f:
    dump(sc, f)

app.py

If you want to use your model in a different file or in a deployment, refer to the code below.

from pickle import load
# load the model
with open('model.pkl', 'rb') as f:
    model = load(f)
# load the scaler
with open('scaler.pkl', 'rb') as f:
    scaler = load(f)
# predict for a user aged 20 with a salary of 900000
user_age_salary = [[20, 900000]]
scaled_result = scaler.transform(user_age_salary)
res = model.predict(scaled_result)
if res[0] == 1:
    print("He can buy the car")
else:
    print("He can't buy the car")
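
The code above keeps the scaler and the model in two separate files, so both have to be loaded and applied in the right order. One possible alternative, sketched here and not the approach used in this blog, is to bundle them with scikit-learn's `make_pipeline` so a single object handles scaling and prediction. The toy data and the names `pipeline`, `restored`, and `new_user` are made up for illustration.

```python
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up [age, salary] rows and labels, only to make the sketch runnable
X = [[20, 100000], [25, 150000], [30, 200000],
     [45, 600000], [50, 700000], [60, 800000]]
y = [0, 0, 0, 1, 1, 1]

# The pipeline fits the scaler first, then the classifier on the scaled data
pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=10))
pipeline.fit(X, y)

# Serialize and restore in memory; pickle.dump/load with a file works the same way
restored = pickle.loads(pickle.dumps(pipeline))

# Raw, unscaled input: the pipeline applies the scaler before predicting
new_user = [[20, 900000]]
print(restored.predict(new_user))
```

With this layout there is only one file to ship, and the scaling step cannot be forgotten at prediction time.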

Find the full source code here

Conclusion

Logistic regression works well when the classes in the data are approximately linearly separable. Its main advantage is that it is much easier to set up and train than many other machine learning algorithms. Its main drawback is that it struggles to capture complex, non-linear relationships. More powerful and complex algorithms such as neural networks can outperform it on such problems.

Have doubts? Need help? Contact me!

LinkedIn: https://www.linkedin.com/in/dharmaraj-d-1b707898

Github: https://github.com/DharmarajPi

Dharmaraj

I have worked on projects that involved Machine Learning, Deep Learning, Computer Vision, and AWS. https://www.linkedin.com/in/dharmaraj-d-1b707898/