Logistic Regression

Jagajith · Published in CodeX · Sep 19, 2021 · 6 min read

After learning the fundamentals of regression, it’s time to learn the fundamentals of classification. And what could be simpler than Logistic Regression!

It is recommended that you read this Linear Regression Tutorial before beginning this one. It is a complete guide to Linear Regression, and Logistic Regression builds on several of its ideas. With that said, let's begin!

What is Logistic Regression?

Logistic Regression is a classification algorithm used when the output variable is categorical. Its goal is to discover a relationship between the features and the probability of a specific outcome.

For example,

Email: Spam / Not Spam?

Online Transaction: Fraudulent (Yes/No)

Here, the output y can take only two values: y ∈ {0, 1}, where 0 is the negative class (e.g., Not Spam) and 1 is the positive class (e.g., Spam).

In this post, we will build a logistic regression model to predict whether a student gets admitted into a university.

Why Logistic Regression, not Linear?

In short, Linear Regression plots all the data onto a graph of x and y, fits a best-fit straight line through it, and predicts the corresponding y for any input. Logistic Regression, on the other hand, fits the data to an S-shaped curve, and there are only two possible outputs (two classes), represented by the top and bottom of the curve.

Linear and Logistic Regression
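To see why the S-curve is the better fit for yes/no outputs, here is a minimal sketch (not the article's figure) that plots a straight line against the sigmoid curve: the line runs outside the [0, 1] range, while the S-curve stays bounded, which is what we want for probabilities.

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-6, 6, 200)
plt.plot(z, 1 / (1 + np.exp(-z)), label="Logistic (S-curve)")
plt.plot(z, 0.5 + 0.15 * z, "--", label="Linear (straight line)")
plt.axhline(0, color="gray", linewidth=0.5)   # outputs below 0 make no sense as probabilities
plt.axhline(1, color="gray", linewidth=0.5)   # outputs above 1 make no sense as probabilities
plt.xlabel("z")
plt.ylabel("output")
plt.legend(loc=0)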

Sigmoid Function

The S-curve here refers to the sigmoid function, not merely the shape of the letter S, and the sigmoid perfectly suits our purpose of categorising data into two groups. The sigmoid formula is as follows, where z is the weighted sum of the inputs:

Sigmoid: g(z) = 1 / (1 + e^(−z))

In plain English, the sigmoid is just a calculation of probability based on the weighted sum of the input features. The formula for the weighted sum is as follows:

z = θᵀx = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
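For instance, with made-up weights (the numbers below are purely hypothetical), the weighted sum and its sigmoid can be computed like this:

import numpy as np

theta = np.array([-25.0, 0.2, 0.2])   # hypothetical weights: theta0 (bias), theta1, theta2
x = np.array([1.0, 45.0, 85.0])       # 1 for the bias term, then two exam scores
z = theta @ x                         # weighted sum z = theta0 + theta1*x1 + theta2*x2
print(z, 1 / (1 + np.exp(-z)))        # prints the weighted sum and its probability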

Before coding up our sigmoid function, let us initialize our dataset.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (no header row, so pandas assigns integer column labels 0, 1, 2)
data = pd.read_csv("ex2data1.txt", header=None)
data.head()

Here, column ‘0’ holds the student's mark in the 1ˢᵗ test, column ‘1’ the mark in the 2ⁿᵈ test, and column ‘2’ whether the student was admitted (1) or not (0).
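As a quick optional sanity check, we can count how many students fall into each class:

print(data[2].value_counts())   # number of admitted (1) vs not admitted (0) students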

Let us visualize our data,

X = data.values[:, :-1]   # exam scores (features)
y = data.values[:, -1]    # admission label (target)

# Boolean masks for the two classes (the dataset has 100 rows)
pos, neg = (y == 1).reshape(100, 1), (y == 0).reshape(100, 1)

plt.scatter(X[pos[:, 0], 0], X[pos[:, 0], 1], c='r', marker='+', label="Admitted")
plt.scatter(X[neg[:, 0], 0], X[neg[:, 0], 1], marker='o', label="Not Admitted", s=10)
plt.xlabel("Exam1 Score")
plt.ylabel("Exam2 Score")
plt.legend(loc=0)

In the case of Logistic Regression, the hypothesis h is represented by the following equation:

hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))

Pictorial representation of hypothesis:

Hypothesis
def sigmoid(z):
    # Maps any real number z into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ~0.99995
print(sigmoid(1))    # ~0.7311
Sigmoid Results

From the above figures, we can tell that g(z) approaches 1 as z grows large, approaches 0 as z becomes very negative, and equals exactly 0.5 at z = 0.

Logistic Regression Decision Boundary

Since our dataset has two features, test1 and test2, the logistic regression hypothesis is the following:

hθ(x) = g(θ₀ + θ₁x₁ + θ₂x₂)

The logistic regression classifier will predict “Admitted” if:

hθ(x) ≥ 0.5, which happens exactly when θ₀ + θ₁x₁ + θ₂x₂ ≥ 0

This is because the logistic regression “threshold” is set at g(z) = 0.5; see the plot of the sigmoid function above for verification.
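As a tiny sketch of that rule (the parameter values are made up for illustration), the two forms of the condition always agree:

theta_example = np.array([-25.0, 0.2, 0.2])   # hypothetical parameters
x_example = np.array([1.0, 45.0, 85.0])       # 1 (bias), exam 1 score, exam 2 score
z = theta_example @ x_example
print(sigmoid(z) >= 0.5)   # predict "Admitted" when g(z) >= 0.5 ...
print(z >= 0)              # ... which is exactly the same check as z >= 0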

Cost Function

Notations

Let’s start by defining the logistic regression cost for the two cases of interest, y = 1 and y = 0, that is, when the true label is Admitted or Not Admitted:

Cost(hθ(x), y) = −log(hθ(x))      if y = 1
Cost(hθ(x), y) = −log(1 − hθ(x))  if y = 0

The simplified (combined) cost for a single example is as follows:

Cost(hθ(x), y) = −y·log(hθ(x)) − (1 − y)·log(1 − hθ(x))

This works because the expression is a convex combination in y of the two cases above. Averaging it over all m training examples gives the logistic regression cost function:

J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾·log(hθ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾)·log(1 − hθ(x⁽ⁱ⁾)) ]

def Costfunction(X, y, theta):
    m = len(y)

    h_theta = sigmoid(X @ theta)              # hypothesis for every example
    y_pos = -y.T @ np.log(h_theta)            # contribution of the y = 1 terms
    y_neg = (1 - y).T @ np.log(1 - h_theta)   # contribution of the y = 0 terms
    error = y_pos - y_neg

    cost = 1/m * sum(error)                   # J(theta)
    grad = 1/m * (X.T @ (h_theta - y))        # gradient of J(theta)

    return cost[0], grad

Before using the data to compute the cost, we should normalize it (to learn more about normalization, click here).

def featureNormalization(X):
    mu = np.mean(X, axis=0)      # mean of each feature
    sigma = np.std(X, axis=0)    # standard deviation of each feature
    X_Norm = (X - mu) / sigma
    return X_Norm, mu, sigma

m, n = X.shape
X, mu, sigma = featureNormalization(X)
X = np.column_stack((np.ones((m, 1)), X))   # add the intercept (bias) column
y = y.reshape(m, 1)
initial_theta = np.zeros((n+1, 1))

cost, grad = Costfunction(X, y, initial_theta)
print("Cost of initial theta is", cost)
print("Gradient at initial theta (zeros):", grad)

Gradient Descent

Logistic Regression’s gradient descent algorithm looks identical to Linear Regression’s. The search direction is the negative partial derivative of the cost function with respect to the parameters θ, and in its most basic form gradient descent keeps stepping along that negative gradient direction (producing a minimizing sequence) until it converges:

θ := θ − α · (1/m) · Xᵀ(hθ(X) − y)

where α is the learning rate.

def gradientDescent(X, y, theta, alpha, n_iters):
    m = len(y)
    J_history = []

    for i in range(n_iters):
        cost, grad = Costfunction(X, y, theta)
        theta = theta - (alpha * grad)   # step in the negative gradient direction
        J_history.append(cost)

    return theta, J_history

theta, J_history = gradientDescent(X=X, y=y, theta=initial_theta, alpha=1, n_iters=400)
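To check that gradient descent is actually converging, a quick sketch like the following plots the recorded cost history (using the J_history returned above); the curve should fall steeply and then flatten out:

plt.plot(J_history)
plt.xlabel("Iteration")
plt.ylabel("Cost J(theta)")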

Plotting the Decision Boundary

Here, the decision boundary is the line where θᵀx = 0, i.e., x₂ = −(θ₀ + θ₁x₁)/θ₂, which is exactly what the code below draws:

plt.scatter(X[pos[:, 0], 1], X[pos[:, 0], 2], c="r", marker="+", label="Admitted")
plt.scatter(X[neg[:, 0], 1], X[neg[:, 0], 2], c="b", marker="x", label="Not admitted")

# Two x-values are enough to draw the straight decision boundary
x_value = np.array([np.min(X[:, 1]), np.max(X[:, 1])])
y_value = -(theta[0] + theta[1] * x_value) / theta[2]
plt.plot(x_value, y_value, "r")

plt.xlabel("Exam 1 score")
plt.ylabel("Exam 2 score")
plt.legend(loc=0)

Predictions

x_sample = np.array([45, 85])
x_sample = (x_sample - mu) / sigma           # normalize with the training mean and std
x_sample = np.append(np.ones(1), x_sample)   # add the intercept term
prob = sigmoid(x_sample.dot(theta))
print("For a student with scores 45 and 85, we predict an admission probability of", prob[0])

The above output shows that a student with marks of 45 and 85 has roughly a 78% probability of getting admitted to the university.

def predict(X, theta):
    # Returns True ("Admitted") when the predicted probability crosses the threshold
    p = sigmoid(X @ theta) >= 0.37   # select your own threshold
    return p
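As a quick usage check (a sketch, reusing the X, y and theta defined above), we can compare these predictions against the training labels to get the training accuracy:

p = predict(X, theta)
print("Training accuracy:", np.mean(p == y) * 100, "%")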

Conclusion

Today, we saw the concepts behind the hypothesis, cost function, and gradient descent of Logistic Regression. Then we built it from scratch using Python’s numpy, pandas and matplotlib. The dataset and final code are uploaded on GitHub.

Check it out here Logistic Regression.

If you like this post, then check out my other posts in this series about

1. What is Machine Learning?

2. What are the Types of Machine Learning?

3. Uni-Variate Linear Regression

4. Multi-Variate Linear Regression

5. What are Neural Networks?

6. Digit Classifier using Neural Networks

7. Image Compressing with K-means Clustering

8. Dimensionality Reduction on Face using PCA

9. Detect Failing Servers on a Network using Anomaly Detection

Last Thing

If you enjoyed my article, a clap 👏 and a follow would be absolutely badass, and it helps Medium promote this article so that others can read it. I am Jagajith and I will catch you in the next one.
