Understanding Gradient Descent — A Simple Explanation

Ravindu Pabasara
5 min readJun 5, 2024


When we talk about machine learning and optimisation, we can't forget gradient descent. Gradient descent is a fundamental technique for minimising functions. The concept can be intimidating at first glance, but it's not that hard to understand at all.

To understand gradient descent, we can use a cool analogy: finding the lowest point in a mountain range! Let's explore these mountains and valleys of gradient descent. Don't be afraid of the plots — they are easy to understand!

The Mountain and the Landscape

Imagine you're standing on a big mountain. Around you lies a massive and vibrant landscape with different geographical features.

This terrain has peaks, slopes, and valleys. Think of this landscape as the cost function of the problem.

Your goal is to find the lowest point in the landscape. It represents the optimal solution where the loss function (or cost function) is at its minimum.

Position and Height

Your current position on this beautiful terrain corresponds to the current set of parameters in the model you use.

The height represents the value of the loss function for the current parameters.

Imagine you want to reach the lowest possible height (the lowest point) in this terrain. Similarly, in gradient descent our aim is to reduce the value of the loss function, and we do it by adjusting the parameters.

Slope and Gradient

Imagine you are standing on the mountain. You will notice the slope of the terrain under your feet. This slope is what indicates the steepness and direction of descent. In mathematical terms, the slope corresponds to the gradient of the loss function with respect to the parameters.

The slope is steep — the gradient is large.
The slope is gentle — the gradient is small.

Look at the plot. The contour lines show the topography of a mountain, and their spacing shows the steepness of the terrain. The red dot is your current position, and the red arrow shows the direction of the gradient at that point.
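We can make the "slope under your feet" idea concrete with a tiny sketch. The loss function here is a made-up bowl shape (not the one from the plots above), and the slope is estimated numerically with a central difference:

```python
def loss(x):
    """A toy bowl-shaped loss: steep far from 0, gentle near 0."""
    return x ** 2

def gradient(f, x, h=1e-6):
    """Estimate the slope of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

steep = gradient(loss, 4.0)   # far from the minimum: large slope
gentle = gradient(loss, 0.1)  # near the minimum: small slope
print(steep, gentle)          # roughly 8.0 and 0.2
```

Just as in the analogy: standing far up the mountainside (x = 4.0) the slope is large, while near the valley floor (x = 0.1) it is small.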

Going down the Mountain

If you want to reach the lowest point, you need to go downhill, following the direction in which the terrain slopes downward the most. In gradient descent, this means updating the parameters in the opposite direction of the gradient. By doing so, you iteratively adjust the parameters to reduce the value of the loss function.
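Here is a minimal sketch of that update rule on a toy loss (my own example, not from the plots): each step moves the parameter a little way against the gradient.

```python
def loss(x):
    return x ** 2

def grad(x):
    return 2 * x          # analytic gradient of x**2

learning_rate = 0.1
x = 4.0                   # starting position on the "mountain"
for _ in range(50):
    x = x - learning_rate * grad(x)   # step opposite the gradient
print(x)                  # very close to the minimum at 0
```

Fifty small downhill steps bring x from 4.0 to nearly 0, the bottom of the bowl.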

Step Size: The Learning Rate

As you move down the mountain, the size of each step you take is very important. This step size is similar to the learning rate in gradient descent.

If your steps are too large — you can step over the lowest point and miss it. You may even climb up another slope without realising it.

If your steps are too small — your progress will be really slow and you might get stuck in rough terrain.

Balancing the step size is crucial to finding the best path down the mountain efficiently. Look at the plot. I have used a simpler cost function to make it easier to understand. The blue curve is the cost function (a quadratic one). The red dot marks the global minimum (the lowest point). We then perform gradient descent with three different learning rates (0.1, 0.5, and 0.9) and show the steps taken for each. You can get an idea by examining the plot — compare how the orange path and the red path step down the mountain.
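You can reproduce the idea behind that plot with a few lines of code. This is a sketch on the same kind of quadratic loss (x², whose gradient is 2x), using the same three learning rates:

```python
def run(lr, x=4.0, steps=20):
    """Run gradient descent on loss x**2 and return the final position."""
    for _ in range(steps):
        x -= lr * 2 * x   # gradient of x**2 is 2x
    return x

for lr in (0.1, 0.5, 0.9):
    print(lr, run(lr))
```

With lr = 0.1 the steps shrink steadily toward 0; lr = 0.5 happens to land exactly on the minimum of this particular quadratic in one step; lr = 0.9 overshoots the lowest point on every step, zigzagging from one side of the valley to the other before settling down. Push the rate a little higher (above 1.0 here) and the steps grow instead of shrink — the descent diverges.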

Reaching the Valley: Finding the Global Minimum

So, after taking many steps downhill, you arrive at the lowest point of the terrain. This point is the global minimum of the loss function: here the parameters are optimised and the loss is at its minimum.

But the terrain might have several valleys, or local minima. These are low points, but not the absolute lowest. It's possible to get trapped in one of these local minima: you think you've found the optimal solution, but in fact an even lower valley exists!

Look at this plot. There is one local minimum at the top left corner, and the global minimum is at the lower right corner.
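We can see the trap in code. Here is a sketch using a toy "double-well" function of my own (not the one in the plot): it has a shallow valley near x ≈ +1 and a deeper one near x ≈ -1. Plain gradient descent simply rolls into whichever valley it starts in:

```python
def loss(x):
    """Double well: local minimum near x = +1, global minimum near x = -1."""
    return (x**2 - 1)**2 + 0.3 * x

def grad(x):
    return 4 * x * (x**2 - 1) + 0.3

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

trapped = descend(1.5)    # starts in the right valley -> stuck at the local minimum
found   = descend(-1.5)   # starts in the left valley  -> reaches the global minimum
print(trapped, found)
```

Both runs stop at a valley floor, but `loss(trapped)` is higher than `loss(found)` — the first walker honestly believes it has reached the bottom while a lower valley sits just over the ridge.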

Going Downhill Does Not Mean There Are No Challenges on the Way!

The path down the mountain is not free of challenges. The terrain can be rugged and uneven, just like a loss function with many local minima and maxima.

Navigating this complex landscape takes care: consider your steps and adjustments carefully.

Sometimes you may have to use techniques like momentum, adaptive learning rates, or stochastic approaches to overcome these challenges. But those are for another article!

So, I Hope You Got That!

This analogy gives us a way to understand the process of gradient descent. Here is the key, if you need one:

  • Current position — The current parameters.
  • Height — The value of the loss function.
  • Slope — The gradient of the loss function.
  • Step size — The learning rate.
  • Descending — Updating parameters in the direction of the negative gradient.
  • Lowest point — The global minimum of the loss function.
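The whole mapping above can be packed into one small sketch. This toy example (my own, with made-up data) fits a single parameter w of a line y = w·x by gradient descent, with each analogy element labelled in the comments:

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # made-up data generated from the "true" w = 2

def loss(w):                    # height: the value of the loss function
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):                    # slope: the gradient of the loss
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

w = 0.0                         # current position: the current parameter
lr = 0.05                       # step size: the learning rate
for _ in range(200):            # descending: step against the gradient
    w -= lr * grad(w)
print(round(w, 4))              # lowest point: w settles at 2.0
```

After 200 downhill steps, w reaches the bottom of the valley — the value that makes the loss as small as possible for this data.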

So I hope you now understand gradient descent better. The concepts behind this AI stuff are not that hard. You just have to look at them in a cooler way.

Thank you for reading. Leave your thoughts below!


Ravindu Pabasara

AI Undergraduate | Sinhala and English Writer | A literature enthusiast | Tech lover