Gradient Descent in Machine Learning

Seth Larweh Kodjiku
Published in unpack · Nov 9, 2020

The objective of Gradient Descent

Gradient descent is an extremely popular optimization algorithm used across machine learning; above all, it forms the basis of how neural networks learn. In this article, I try to explain it in detail, yet in straightforward terms.
Gradient, in plain terms, means the slope or incline of a surface. So gradient descent literally means descending a slope with the aim of reaching the lowest point on that surface. Imagine a 2D graph, such as the parabola in the figure below.

From the above graph, the goal of the gradient descent algorithm is to find the “x” at which “y” is minimum. “y” here is called the objective function, which the gradient descent algorithm works on to move toward the lowest point.
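
To make this concrete, here is a minimal sketch in Python (the parabola y = (x − 3)² and all constants are my own illustrative choices, not from the article):

```python
# Minimize y = (x - 3)**2 with gradient descent.
# The gradient of y with respect to x is 2 * (x - 3).

x = 10.0            # arbitrary starting point
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (x - 3)            # slope of the parabola at the current x
    x = x - learning_rate * gradient  # step downhill

print(round(x, 4))  # converges to 3.0, where y is at its minimum
```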

The Algorithm: Gradient Descent

I will use a linear regression problem to explain the gradient descent algorithm. The aim of regression is to minimize the sum of squared residuals. We know that a function reaches its minimum value where its slope equals 0. Using that fact, we can solve the linear regression problem analytically and learn the weight vector. The same problem can also be solved by the gradient descent procedure.
Gradient descent is an iterative algorithm that starts from a random point on a function and moves down its slope in steps until it reaches the lowest point of that function.
This algorithm is useful in situations where the optimal point cannot be found by setting the slope of the function to 0. In the case of linear regression, you can mentally map the sum of squared residuals to the function “y” and the weight vector to “x”.
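
As a hedged sketch of that mapping, here is what the sum-of-squared-residuals objective and its gradient might look like for a one-feature linear model y ≈ w·x + b (the data arrays and names are illustrative assumptions):

```python
import numpy as np

# Illustrative data for a one-feature linear model y ≈ w * x + b
# (these arrays are made up for the example).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

def ssr(w, b):
    """Sum of squared residuals: the objective 'y' that gradient descent minimizes."""
    residuals = y - (w * x + b)
    return np.sum(residuals ** 2)

def ssr_gradient(w, b):
    """Partial derivatives of the SSR with respect to w and b."""
    residuals = y - (w * x + b)
    grad_w = -2 * np.sum(residuals * x)
    grad_b = -2 * np.sum(residuals)
    return grad_w, grad_b

print(ssr(2.0, 1.0))           # objective value at w=2, b=1
print(ssr_gradient(2.0, 1.0))  # slope of the objective at that point
```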

Moving down the steps

This is the core of the algorithm. The general idea is to start with a random point and work out how to update it with each iteration so that we descend the slope.

The steps of the algorithm are (a code sketch follows the list):

  1. Find the slope of the objective function with respect to each parameter, that is, derive the gradient of the function.
  2. Pick arbitrary initial values for the parameters.
  3. Evaluate the gradient by plugging the current parameter values into the gradient function.
  4. Calculate the step size for each parameter as step size = gradient * learning rate.
  5. Calculate the new parameters as new params = old params - step size.
  6. Repeat steps 3 to 5 until the gradient is almost 0.
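
Putting those steps together, here is a minimal sketch of the full loop for the same illustrative one-feature regression (the data, learning rate, and stopping tolerance are all assumptions chosen for demonstration):

```python
import numpy as np

# Illustrative data (made up for this sketch).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

w, b = 0.0, 0.0        # step 2: arbitrary initial parameter values
learning_rate = 0.01   # kept small, per the advice below
tolerance = 1e-6

for _ in range(10_000):
    # step 3: evaluate the gradient (derived in step 1) at the current parameters
    residuals = y - (w * x + b)
    grad_w = -2 * np.sum(residuals * x)
    grad_b = -2 * np.sum(residuals)
    # step 6: stop when the gradient is almost 0
    if max(abs(grad_w), abs(grad_b)) < tolerance:
        break
    # steps 4 and 5: step size = gradient * learning rate; new params = old params - step size
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # approaches the least-squares fit for this data
```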

The “learning rate” mentioned above is a tunable hyperparameter that heavily influences the convergence of the algorithm. Larger learning rates make the algorithm take huge steps down the slope, so it may jump over the minimum point and miss it. It is therefore usually safer to stick to a low learning rate, for example 0.01. Note also that, because the step size is proportional to the gradient, gradient descent naturally takes larger steps when the starting point is high up the slope and smaller steps as it gets closer to the target, so it is careful not to overshoot while still being reasonably fast.
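
As a rough illustration of that overshooting behavior, compare a few learning rates on the simple parabola y = x² (the rates and step counts are arbitrary demonstration values):

```python
def descend(learning_rate, steps=10, x=1.0):
    """Run gradient descent on y = x**2 (gradient 2*x) and return the final x."""
    for _ in range(steps):
        x = x - learning_rate * (2 * x)
    return x

print(descend(0.01))  # small rate: creeps steadily toward the minimum at 0
print(descend(0.9))   # near the stability limit: bounces back and forth across 0
print(descend(1.1))   # too large: each step overshoots further, and x diverges
```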
