Understanding Gradient Descent in the Field of Deep Learning

Erhan Arslan
Nov 25, 2023


First, let's try to explain it in a non-scientific way.

Gradient descent is like trying to find the lowest point in a valley while blindfolded. Imagine you’re standing on a slope in the dark and your goal is to get to the lowest point by taking small steps downhill.

In math terms, imagine you have a function that represents this slope.
Let's call it f(x), where x is the point you are standing at. The goal of gradient descent is to find the value of x that minimizes f(x), i.e., the lowest point on the slope.

The "gradient" tells you how steep the slope is at your current position and in which direction it rises; to go downhill, you step in the opposite direction. Where the slope is steep you take larger steps, and where it is shallow you take smaller ones. Repeatedly stepping in the direction that decreases the function is exactly what gradient descent does.

In the context of neural networks, gradient descent is used to minimize a cost function that measures how "wrong" the model's predictions are compared to the actual values. This cost function is typically defined in terms of the model's outputs and the true values.
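To make "cost function" concrete, here is a minimal sketch of my own (an illustrative example, not code from this post) of the mean squared error, a commonly used cost that averages the squared differences between predictions and true values:

import numpy as np

# Mean squared error: an illustrative cost function (assumed example).
# y_true are the actual values, y_pred are the model's predictions.
def mse_cost(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# The more "wrong" the predictions are, the larger the cost.
print(mse_cost([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # small errors -> small cost
print(mse_cost([1.0, 2.0, 3.0], [3.0, 0.0, 6.0]))  # large errors -> large cost

Gradient descent then adjusts the model's parameters in the direction that reduces this number.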

I tried to make it more understandable with a 3D graph.

In that figure there are x, y, and z axes; a cost function can depend on more than one variable, so its surface may be three-dimensional (or higher). Our goal is still the same: find the input values that give the lowest value of the cost function.
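To make the multi-variable case a bit more concrete, here is a small sketch of my own (an assumed example, f(x, y) = x^2 + y^2, not the function in the figure) showing that the same idea works with two variables, taking one partial derivative per variable:

# Illustrative sketch: gradient descent with two variables (assumed example).
def f(x, y):
    return x**2 + y**2          # minimum is at (0, 0)

def grad(x, y):
    # partial derivatives: df/dx = 2x, df/dy = 2y
    return 2*x, 2*y

x, y = 3.0, -4.0
learning_rate = 0.1
for step in range(25):
    gx, gy = grad(x, y)
    x, y = x - learning_rate * gx, y - learning_rate * gy

print(x, y)  # both values move toward 0, the minimum of f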

To make it more concrete, let our cost function be f(x) = x^2 + 5x + 6.
Now we take the derivative of our function: f'(x) = 2x + 5.

def function(x):
    return x**2 + 5*x + 6

This defines the function f(x) = x^2 + 5x + 6. This is the function whose minimum we're trying to find using gradient descent.

def derivative(x):
    return 2*x + 5

This defines the derivative of the function, f'(x) = 2x + 5. The derivative represents the slope of the function at a given point x.

def gradient_descent(initial_x, learning_rate, epochs):
    x = initial_x

    for epoch in range(epochs):
        gradient = derivative(x)
        x = x - learning_rate * gradient
        # Print current step and x value
        print(f"Epoch {epoch + 1}: x = {x}")

    return x

This function implements the gradient descent algorithm. It starts from a given initial value of x, computes the derivative of the function at that point, and updates x by subtracting the gradient multiplied by the learning rate: x_new = x - learning_rate * f'(x). It repeats this for the given number of epochs.

gradient_descent(5, 0.1, 12)

This initiates the gradient descent process starting at x=5, using a learning rate of 0.1 and running it for 12 epochs (iterations).

During each epoch, the code computes the gradient (slope) of the function at the current x value, updates x in the direction that decreases the function, and prints the current step along with the updated x value.

This process repeats for the specified number of epochs (12 in this case), gradually moving x towards the minimum of the function f(x) = x^2 + 5x + 6.
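As a quick sanity check (my addition, not part of the original walkthrough), we can compare the result with the exact minimum: setting f'(x) = 2x + 5 = 0 gives x = -2.5, so the values printed by gradient descent should be approaching -2.5.

# Analytic minimum of f(x) = x^2 + 5x + 6: solve 2x + 5 = 0  ->  x = -2.5
analytic_min = -5 / 2

# Reusing the gradient_descent function defined above
x_gd = gradient_descent(5, 0.1, 12)
print(analytic_min, x_gd)  # x_gd moves toward -2.5; more epochs bring it closer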

Here is the full Python code. I tried to visualize the gradient descent points and steps with graphs under the code.

import matplotlib.pyplot as plt
import numpy as np

# Our cost function. You can try it with various examples; maybe a 3rd-degree polynomial.
def function(x):
    return x**2 + 5*x + 6

# The derivative of our function is 2*x + 5. If you prefer, you can approximate it
# numerically (e.g., with numpy.gradient) instead of deriving it by hand.
def derivative(x):
    return 2*x + 5

# Gradient descent function. At every step it also plots the current gradient descent point.
# It takes 3 parameters: where to start, the step size (learning rate), and the number of epochs (iterations).
def gradient_descent(initial_x, learning_rate, epochs):
    x = initial_x
    x_vals = []
    f_vals = []

    fig, axs = plt.subplots(4, 3, figsize=(12, 12))

    fig.suptitle('Gradient Descent Visualization')

    for epoch in range(epochs):

        # These two lines are the gradient descent algorithm.
        # The rest of the function just draws the figures for the blog :)
        gradient = derivative(x)
        x = x - learning_rate * gradient

        # Values collected for plotting
        x_vals.append(x)
        f_vals.append(function(x))

        # Calculate subplot index
        row = epoch // 3
        col = epoch % 3

        # Plotting the function and points for the current epoch
        axs[row, col].set_title(f"Epoch {epoch + 1}")
        axs[row, col].set_xlabel('x')
        axs[row, col].set_ylabel('f(x)')

        # Plot the function
        x_range = np.linspace(-10, 5, 100)  # Adjust the range for the function plot
        axs[row, col].plot(x_range, function(x_range), label='f(x) = x^2 + 5x + 6')

        # Plot the points visited so far on the function
        axs[row, col].scatter(x_vals, f_vals, color='red', label='Gradient Descent Points')
        axs[row, col].legend()
        axs[row, col].grid(True)

    plt.tight_layout()
    plt.show()

    return x


gradient_descent(5, 0.1, 12)

After every epoch, the small updates to x bring it closer to the lowest value, that is, the lowest point of our valley.
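One thing worth experimenting with (a sketch of my own, using the same f(x) and update rule as above, with assumed learning-rate values): the learning rate controls how fast we move down the valley. Too small and convergence is slow; too large and the steps overshoot the minimum and can diverge.

# Illustrative experiment with different learning rates (assumed values).
def derivative(x):              # same derivative as above: f'(x) = 2x + 5
    return 2*x + 5

for lr in [0.01, 0.1, 0.5, 1.1]:
    x = 5.0
    for _ in range(12):
        x = x - lr * derivative(x)
    print(f"learning_rate={lr}: x after 12 epochs = {x:.4f}")

# 0.01 crawls slowly toward -2.5, 0.1 gets reasonably close,
# 0.5 jumps straight to the minimum for this particular quadratic,
# and 1.1 overshoots more and more each step and diverges.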

I hope you found it useful. Thanks for reading.
