Introduction
Hi everyone, Today we are going to see Logistic Regression from the scratch. In Machine Learning techniques Logistic Regression has separate strong importance. Logistic Regression is used to solve classification problems. The classification algorithm Logistic Regression is used when the dependent variable(target) is categorical. The dependant variable in logistic regression is a binary variable with data coded as 1 (yes, happy, True, normal, success, etc.) or 0 (no, Sad, False, abnormal, failure, etc.). In this blog, we are going to explore logistic regression to predict whether a user can buy a car or not.
Theory
We know the logistic regression algorithm used to solve classification problems. Instead of fitting a straight line or hyperplane, the logistic regression model uses the logistic function to squeeze the output of a linear equation between 0 and 1. The logistic function is defined as:
Implementation
Importing the libraries
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import accuracy_score
import seaborn as sns
from sklearn.metrics import confusion_matrix
Importing the dataset
dataset = pd.read_csv(‘data.csv’)
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 10)
Feature Scaling
StandardScaler performs the task of Standardization. Our dataset contains variable values that are different in scale. For e.g. age 20–70 and SALARY column with values on a scale of 100000–800000. As these two columns are different in scale, they are Standardized to have a common scale while building a machine learning model.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 10)
classifier.fit(X_train, y_train)
Predict and get Accuray for the Test data
y_pred = classifier.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print(“The Accuracy for Test Set is {}”.format(test_acc*100))
Making the Confusion Matrix
cm=confusion_matrix(y_test,y_pred)
plt.figure(figsize=(12,6))
plt.title(“Confusion Matrix”)
sns.heatmap(cm, annot=True,fmt=’d’, cmap=’Blues’)
plt.ylabel(“Actual Values”)
plt.xlabel(“Predicted Values”)
Creating a classification report for the model
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))
Prediction with new data
For the new input, we applied transform because we trained our model with standard scaling. If we train our model without scaling we don't need to use transform in new input data.
user_age_salary=[[20,900000]]
scaled_result = sc.transform(user_age_salary)
res=classifier.predict(scaled_result)
if res==1:
print("He can buy the car")
else:
print("He can't buy the car")##Skip this part if you dont need to use this model in other file##
import dump
dump(classifier, open('model.pkl', 'wb'))
# save the scaler
dump(sc, open('scaler.pkl', 'wb'))
app.py
If you want to use your model in a different file or deployment kindly refer to the below code.
from pickle import load
# load the model
model = load(open(‘model.pkl’, ‘rb’))
# load the scaler
scaler = load(open(‘scaler.pkl’, ‘rb’))
#predict user age 20 and salary 900000
user_age_salary=[[20,900000]]
scaled_result = scaler.transform(user_age_salary)
res=model.predict(scaled_result)
if res==1:
print(“He can buy the car”)
else:
print(“He can’t buy the car”)
Find the full source code here
Conclusion
Logistic Regression is the most efficient algorithm when the different outcomes or distinctions represented by the data are linearly separable. The main advantage of logistic regression is that it is much easier to set up and train than other machine learning algorithms. The most negative part of logistic regression is difficult to capture complex relationships. More powerful and complex algorithms such as Neural Networks can easily outperform this algorithm.
Check my other blogs…
Have doubts? Need help? Contact me!
LinkedIn: https://www.linkedin.com/in/dharmaraj-d-1b707898
Github: https://github.com/DharmarajPi