Understanding the Gradient Descent Algorithm in Deep Learning

Divyanshu Singh
Feb 28, 2024


Let’s start with a simple example of a person trying to find the lowest point in a hilly terrain using the gradient descent algorithm.

Imagine you are standing on a hill, and your goal is to reach the lowest point at the bottom. You are blindfolded and can only perceive the steepness of the slope at your current position. You decide to take small steps downhill in the direction of the steepest slope to descend quickly.

In this analogy, the hill represents the mathematical function that we want to optimize, and the height of the terrain corresponds to the value of the function at a particular point. The goal is to find the lowest point, which corresponds to the minimum value of the function.

Here’s how the gradient descent algorithm works in this scenario:

Initialization: You start at a random position on the hill.

Calculation of slope: You estimate the slope (or steepness) of the terrain at your current position. This slope corresponds to the derivative (in higher dimensions, the gradient) of the function at that point. The gradient points in the direction of steepest ascent, so moving in the opposite direction gives you the steepest descent.

Step size determination: You choose a small step size (learning rate) that determines how far you will move in the direction of the steepest slope. A larger step size may lead to overshooting the minimum, while a smaller step size may result in slower convergence.

Update position: You take a step downhill in the direction of the steepest slope, adjusting your position accordingly.

Repeat steps 2–4: You continue to iterate this process, recalculating the slope, adjusting your position, and gradually moving closer to the bottom of the hill.

Termination: You stop the iterations when either you reach a predefined number of iterations or when the change in position becomes very small (indicating that you are close to the minimum).
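The steps above can be sketched in a few lines of Python. The function, starting point, learning rate, and tolerance below are illustrative choices, not part of the article: we minimize f(x) = (x − 3)², whose derivative is f'(x) = 2(x − 3).

```python
def gradient_descent(grad, x0, learning_rate=0.1, max_iters=1000, tol=1e-8):
    """Repeatedly step against the slope until the update becomes tiny."""
    x = x0                              # 1. Initialization: arbitrary start
    for _ in range(max_iters):          # 5. Repeat steps 2-4
        slope = grad(x)                 # 2. Calculate the slope at x
        step = learning_rate * slope    # 3. Scale the step by the learning rate
        x = x - step                    # 4. Update position: move downhill
        if abs(step) < tol:             # 6. Terminate when the change is tiny
            break
    return x

# Minimize f(x) = (x - 3)**2 using its derivative f'(x) = 2 * (x - 3)
minimum = gradient_descent(grad=lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # converges near x = 3, the true minimum
```

With a learning rate of 0.1 each update shrinks the distance to the minimum by a constant factor, so the loop terminates well before the iteration cap.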

The intuition behind this algorithm is that by iteratively moving downhill in the direction of the steepest slope, you steadily decrease the value of the function. As you approach the minimum, the slope flattens, so the steps naturally shrink and the algorithm converges.

In real-world applications, gradient descent is commonly used to optimize various machine learning models, such as linear regression or neural networks. The algorithm iteratively adjusts the model’s parameters based on the gradients of the loss function, aiming to find the parameter values that minimize the prediction error.
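As a concrete (illustrative, not from the article) example of this, here is gradient descent fitting a one-parameter linear model y = w·x by minimizing mean squared error on a small made-up dataset; the data, learning rate, and iteration count are assumptions chosen for the sketch.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2 * x, so the optimum is w = 2

w = 0.0                      # initial parameter guess
learning_rate = 0.01

for _ in range(500):
    # Gradient of the MSE loss (1/n) * sum((w*x - y)^2) with respect to w:
    # (2/n) * sum((w*x - y) * x)
    n = len(xs)
    grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
    w -= learning_rate * grad  # adjust the parameter against the gradient

print(round(w, 3))  # approaches 2.0, the slope that minimizes the error
```

Deep learning frameworks automate exactly this loop: they compute the gradients of the loss with respect to every parameter (via backpropagation) and apply the same kind of update at scale.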

Overall, gradient descent provides a simple and intuitive approach to a wide range of optimization problems, using only the local slope to guide the search. Keep in mind that it converges to a local minimum; only for convex functions is that guaranteed to be the global one.


Divyanshu Singh

Exploring ML, DL, NLP, Speech Tech & data-driven research as a BS Data Science student at IIT Madras. Sharing insights on Medium. Join the journey.