Gradient Descent Implementation Using Python

Heramb
4 min read · Feb 18, 2020

To get started with Machine Learning, one of the first things to understand and implement is Gradient Descent. It is the foundation of many machine learning algorithms.

Gradient Descent is a weight-optimization technique through which a model finds parameters that fit the data well. It is used to minimize a loss function by iteratively moving the parameters toward the values that produce the lowest output of that function.

The loss function quantifies how far the model's predictions are from the actual observations. For example, if the model predicts 0.75 where the actual value is 0.7, that observation contributes a squared error of 0.0025.

Gradient Descent consists of

  1. A hypothesis function H(theta) whose predictions we want to fit to the data
  2. A cost function J(theta) that measures the mean squared error of those predictions
  3. Gradients (partial derivatives of J) used to obtain new values of the thetas at each step until an optimal value is found

A minimal code sketch of these three pieces follows this list.
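
Before writing out the equations, here is one way these three pieces might map to Python functions for a simple linear model. The function names hypothesis, cost, and gradients are illustrative, not part of the original walkthrough:

# Minimal sketch of the three components (illustrative helper names)
import numpy as np

def hypothesis(theta, X):
    # H(theta) = theta0 + theta1 * x, written as a matrix product
    # theta has shape (1, 2) and X has shape (2, m), so the result has shape (1, m)
    return np.dot(theta, X)

def cost(theta, X, y):
    # Mean squared error cost: J(theta) = (1 / 2m) * sum((H(theta) - y) ** 2)
    m = y.shape[1]
    return ((hypothesis(theta, X) - y) ** 2).sum() / (2 * m)

def gradients(theta, X, y):
    # Partial derivatives of J(theta): (1 / m) * (H(theta) - y) . X^T
    m = y.shape[1]
    return np.dot(hypothesis(theta, X) - y, np.transpose(X)) / m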

The hypothesis equation is given as

H(theta) = theta0 + theta1 * X

where theta0 and theta1 are the parameters whose values will be optimized by finding the lowest output of the cost function, and X represents the independent data.

Cost Function:

J(theta) = (1 / 2m) * sum over i of (H(theta)(x_i) - y_i)^2

where m is the number of observations.

Gradients:

dJ/dtheta0 = (1 / m) * sum over i of (H(theta)(x_i) - y_i)
dJ/dtheta1 = (1 / m) * sum over i of (H(theta)(x_i) - y_i) * x_i

and the update rule is

theta_j := theta_j - alpha * dJ/dtheta_j

where alpha is the learning rate, which controls the size of each step the model takes toward the minimum.

Hence, using this update rule, the new values of theta0 and theta1 would be

theta0 := theta0 - alpha * (1 / m) * sum of (H(theta)(x_i) - y_i)
theta1 := theta1 - alpha * (1 / m) * sum of (H(theta)(x_i) - y_i) * x_i

Here, m is 1 when only a single observation X is present; with the full dataset used below, m is the number of observations.
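
As a quick sanity check of the update formulas, here is one hand-computed step for a single observation (x = 0.8, y = 0.7, taken from the dataset used below), starting from theta0 = theta1 = 0 with a learning rate of 0.01. This snippet only illustrates the arithmetic and is not part of the main implementation:

x, y = 0.8, 0.7
theta0, theta1 = 0.0, 0.0
alpha = 0.01
# Prediction and error for the single observation (m = 1)
h = theta0 + theta1 * x                   # 0.0
error = h - y                             # -0.7
# One gradient descent step
theta0 = theta0 - alpha * error           # 0.007
theta1 = theta1 - alpha * error * x       # 0.0056
print(theta0, theta1)
# 0.007 0.0056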

Hence, by using these formulas, one can easily implement gradient descent in Python.

Proceeding with an implementation,

# Importing numpy for mathematical calculations
import numpy as np

# Loading the data
X = [0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6]
Y = [0.7, 0.65, 0.9, 0.95, 1.1, 1.15, 1.2, 1.4, 1.55, 1.5]

# Creating an array of ones for algebraic calculation
ones = [1] * len(X)

# Main logic
# The array of ones is concatenated with X and the result is transposed,
# so that a single matrix multiplication with theta produces the predictions
# needed to obtain new values of theta 0 and theta 1
X = np.transpose(np.concatenate((np.array([ones]).reshape(-1, 1), np.array([X]).reshape(-1, 1)), axis=1))
print(X)
# [[1.  1.  1.  1.  1.  1.  1.  1.  1.  1. ]
#  [0.8 1.  1.2 1.4 1.6 1.8 2.  2.2 2.4 2.6]]

# Declaring starting values for the two thetas, theta 0 and theta 1
# Setting them to 0 means gradient descent starts searching from 0
zeroes = [0] * X.shape[0]
theta = np.array([zeroes])
print(theta)
# [[0 0]]

# Defining the learning rate
learning_rate = 0.01

# One manual gradient descent step
htheta = np.dot(theta, X)
print(theta.shape, X.shape, htheta.shape)
# (1, 2) (2, 10) (1, 10)

diff_theta = htheta - Y
partial_derivative_theta = np.dot(diff_theta, np.transpose(X)) / len(Y)

# Implementing the update formula defined above
theta = theta - learning_rate * partial_derivative_theta
print(theta)
# New values of theta 0 and theta 1
# [[0.0111  0.02055]]

Now, to optimize these values further, a for loop can be defined that repeats the update step until the difference between the previous theta and the newly calculated theta becomes (approximately) zero; here the loop simply runs for a fixed number of iterations.

learning_rate = 0.01

# Maximum number of iterations for the loop
max_iter = 4000

# Collecting all new values of theta (4000 theta values)
new_theta = []
for i in range(max_iter):
    htheta = np.dot(theta, X)
    diff_theta = htheta - Y
    partial_derivative_theta = np.dot(diff_theta, np.transpose(X)) / len(Y)
    theta = theta - learning_rate * partial_derivative_theta
    new_theta.append(theta)

print(new_theta[max_iter - 1])
# Therefore the optimal values would be
# [[0.2455339  0.50855582]]
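
The paragraph before the loop mentions stopping once the previous and newly calculated theta stop changing, while the loop above simply runs for a fixed 4000 iterations. A convergence-based stopping rule is a small extension; the sketch below reuses the X, Y, learning_rate, and max_iter defined above, and the tolerance value is an arbitrary illustrative choice rather than something from the original walkthrough:

# A variant of the loop with an early-stopping rule (illustrative sketch)
theta = np.array([[0.0, 0.0]])   # restart from zero so the progress of the updates is visible
tolerance = 1e-9                 # illustrative tolerance, not from the article
for i in range(max_iter):
    htheta = np.dot(theta, X)
    diff_theta = htheta - Y
    partial_derivative_theta = np.dot(diff_theta, np.transpose(X)) / len(Y)
    new_value = theta - learning_rate * partial_derivative_theta
    if np.abs(new_value - theta).max() < tolerance:
        theta = new_value
        break                    # theta has stopped changing meaningfully
    theta = new_value
print(i, theta)

If theta stops changing by more than the tolerance the loop exits early; otherwise it still runs for the full max_iter iterations.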

Now these values will be substituted into the hypothesis equation to predict Y, and the R-squared value will be computed for the predictions.

# Loading the same data again
X = [0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6]
Y = [0.7, 0.65, 0.9, 0.95, 1.1, 1.15, 1.2, 1.4, 1.55, 1.5]
Y = np.array([Y])
X = np.array([X])
# Fetching the optimized theta 0 and theta 1 values
theta0 = new_theta[max_iter-1][0][0]
theta1 = new_theta[max_iter-1][0][1]
# Substituting X into the hypothesis equation to predict Y
pred_y = theta0 + (theta1 * X.reshape(-1,1)).sum(axis=1)
print(pred_y)
# [0.652379 0.75409007 0.85580115 0.95751223 1.05922331 1.16093438
# 1.26264546 1.36435654 1.46606762 1.5677787 ]
# Computing the mean of Y for calculating R-squared
mean_y = Y.mean()
# Computing the Total Sum of Squares
tss = ((Y - mean_y) ** 2).sum(axis=1)
# Computing the Residual Sum of Squares
rss = ((Y - pred_y) ** 2).sum(axis=1)
# Computing R-squared
rsquared = 1 - (rss / tss)
print("R-Squared of the Model is: ", rsquared)
# R-Squared of the Model is:  [0.96206043]

In this way, a simple gradient descent machine learning algorithm can be implemented.
