Gradient Descent

Minase Fikadu
3 min read · Apr 29, 2024


As data scientists, our goal is often to find the best model that fits a particular dataset. Typically, “best” refers to minimizing errors or maximizing the likelihood of accurate predictions. This involves solving optimization problems, and a popular technique for this is Gradient Descent.

The Concept of Gradient Descent

Imagine you have a function, let’s call it f, that takes a vector of real numbers and spits out a single real number.

def sum_of_squares(v):
    return sum(v_i ** 2 for v_i in v)

Now, let’s say we want to maximize or minimize this function. In other words, we need to find the input vector, v, that results in the largest or smallest possible output value.

This is where the gradient comes into play. The gradient, for those who remember their calculus, is a vector of partial derivatives. It points in the direction where the function increases at the fastest rate. So, if we want to maximize our function, we can start by picking a random point and computing the gradient. Then, we take a small step in the direction indicated by the gradient, as that’s where the function is increasing the most.

Similarly, if we want to minimize the function, we take small steps in the opposite direction of the gradient. This is the basic idea behind Gradient Descent — an iterative algorithm that helps us find the optimal input for a given function.
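To make this concrete, here is a minimal sketch in Python (the helper names are my own, not from the original text) of the gradient of sum_of_squares and a single step along the gradient:

def sum_of_squares_gradient(v):
    # Each partial derivative of v_i ** 2 is 2 * v_i, so the gradient is just 2 * v.
    return [2 * v_i for v_i in v]

def gradient_step(v, gradient, step_size):
    # Move step_size along the gradient from v; a negative step_size moves downhill.
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]

To maximize, you would call gradient_step with a small positive step size; to minimize, with a small negative one.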

Finding a minimum using gradient descent

How Gradient Descent Works

Gradient Descent is an iterative algorithm that gradually adjusts the input values to minimize a given function. Here’s a step-by-step breakdown of the process:

  1. Initialize: Start by choosing an initial point, often randomly, in the input space.
  2. Compute Gradient: Calculate the gradient of the function at the current point. The gradient indicates the direction of steepest ascent.
  3. Update Input: Move a small step in the opposite direction of the gradient. This step size is controlled by a parameter called the “learning rate.”
  4. Repeat: Continue repeating steps 2 and 3 until you converge to a minimum or reach a specified number of iterations (see the small code sketch after this list).
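Putting these steps together, here is a minimal sketch of the loop applied to sum_of_squares, reusing the helper functions above (the starting range, learning rate, and iteration count are arbitrary choices for illustration):

import random

def minimize_sum_of_squares(num_dimensions=3, learning_rate=0.1, num_iterations=100):
    # 1. Initialize: pick a random starting point.
    v = [random.uniform(-10, 10) for _ in range(num_dimensions)]

    for _ in range(num_iterations):
        # 2. Compute the gradient at the current point.
        gradient = sum_of_squares_gradient(v)
        # 3. Take a small step in the opposite direction of the gradient.
        v = gradient_step(v, gradient, -learning_rate)
        # 4. Repeat for a fixed number of iterations.

    return v

print(minimize_sum_of_squares())  # each coordinate should end up very close to 0, the minimum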

Example: Linear Regression with Gradient Descent

Let’s illustrate the power of Gradient Descent with a practical example of linear regression. Suppose we have a dataset of house prices and their corresponding areas, and we want to find the best-fitting line that represents the relationship between area and price.

In this case, our function to minimize is the sum of squared differences between the predicted prices and the actual prices. By applying Gradient Descent, we can iteratively update the coefficients of the line to minimize this error.

Here’s a simplified version of the code:

import numpy as np

def linear_regression(areas, prices, learning_rate, num_iterations):
    # Initialize coefficients randomly
    m = np.random.rand()
    b = np.random.rand()

    for _ in range(num_iterations):
        # Compute predictions with the current line
        predicted_prices = m * areas + b

        # Gradient of the sum of squared errors with respect to m and b
        gradient_m = 2 * np.sum(areas * (predicted_prices - prices))
        gradient_b = 2 * np.sum(predicted_prices - prices)

        # Update coefficients by stepping against the gradient
        m -= learning_rate * gradient_m
        b -= learning_rate * gradient_b

    return m, b

By running this code with NumPy arrays for areas and prices and suitable values for learning_rate and num_iterations, we can obtain coefficients m and b that define a good-fitting line for our data.

For example, running the above code on the Housing Price dataset with learning_rate = 1e-4 and num_iterations = 1000 gives: Slope (m): -3.192176687646158e+44, Intercept (b): 1.9902595362891416e+44. Coefficients this enormous are a sign of divergence rather than convergence: with raw house areas in the thousands, the gradients are huge, so even this learning rate is too large. Scaling the features or lowering the learning rate fixes this.
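For comparison, here is a small usage sketch on synthetic data of my own (not the Housing Price dataset), where the areas lie in [0, 1] so the same function converges; the "true" slope of 3 and intercept of 7, the learning rate, and the iteration count are all illustrative choices:

import numpy as np

# Synthetic data: price = 3 * area + 7 plus a little noise, with areas in [0, 1].
rng = np.random.default_rng(0)
areas = rng.uniform(0, 1, size=100)
prices = 3 * areas + 7 + rng.normal(0, 0.1, size=100)

m, b = linear_regression(areas, prices, learning_rate=1e-3, num_iterations=10_000)
print(m, b)  # should land close to 3 and 7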

I hope this article helped you understand Gradient Descent a little better. There is more to be said about gradient descent, which will be covered in later posts.
