Gradient Descent Algorithm - Explained

PremalMatalia
2 min read · Feb 29, 2020


Gradient descent is an iterative optimization algorithm. In linear regression, it is used to minimize the cost function and find the values of the parameters θ0 and θ1 (the coefficients of the estimator) that correspond to that minimum.

Imagine we are walking down the curve in the graph below: we are standing at the green point and our target is to reach the minimum, the red point. We have no visibility of the whole curve, so at each step we can only choose to go up or down.

Gradient descent tells us which way to step so that we reach the minimum, i.e. the point where the cost function is smallest.

Mathematically, the aim of gradient descent for linear regression is to find ArgMin J(θ0, θ1), where J(θ0, θ1) is the cost function of the linear regression. It is given by:

J(θ0, θ1) = (1/2m) Σ (h(x_i) − y_i)^2, with the sum running over the m training examples.

Here,

  • h is the linear hypothesis model, h = θ0 + θ1x
  • y is the true output
  • m is the number of data points in the training set
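
As a quick illustration (my own sketch, not code from the original article; the toy data and variable names are made up), this is how the cost function above can be computed in Python:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    # J(theta0, theta1): average squared error between the hypothesis
    # h(x) = theta0 + theta1*x and the true outputs y, divided by 2.
    m = len(y)                       # number of training examples
    h = theta0 + theta1 * x          # predictions for every data point
    return np.sum((h - y) ** 2) / (2 * m)

# Toy data (made up for illustration): y is roughly 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

print(cost(1.0, 2.0, x, y))   # small cost: these parameters fit well
print(cost(0.0, 0.0, x, y))   # much larger cost: these fit poorly
```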

To find the value of a coefficient such as θ1, gradient descent starts from an initial guess and repeatedly applies the following update (here the subscript denotes the iteration number, not the parameter index):

θ_(t+1) = θ_t − η (∂/∂θ) J(θ_t)

where η is the learning rate, which controls how large a step we take in the direction of the negative gradient:

  • If it is very small, we move towards the solution slowly but steadily.
  • If it is too large, the updates may oscillate and you may overshoot the optimal solution (the global minimum); see the code sketch after this list.
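
As a rough sketch of how this looks in code (my own illustration; the toy data, starting values, and η = 0.1 are assumptions), here is the update loop applied to both parameters of the linear hypothesis:

```python
import numpy as np

def gradient_descent(x, y, eta=0.1, n_iters=1000):
    # Fit h(x) = theta0 + theta1*x by repeatedly stepping both
    # parameters in the direction of the negative gradient of J.
    m = len(y)
    theta0, theta1 = 0.0, 0.0            # initial guesses
    for _ in range(n_iters):
        h = theta0 + theta1 * x          # current predictions
        # Partial derivatives of J(theta0, theta1) = (1/2m) * sum((h - y)^2)
        grad0 = np.sum(h - y) / m
        grad1 = np.sum((h - y) * x) / m
        # Simultaneous update of both parameters, scaled by eta
        theta0 -= eta * grad0
        theta1 -= eta * grad1
    return theta0, theta1

# Toy data (made up): y is roughly 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(gradient_descent(x, y))   # approaches the least-squares fit (about 1.1, 1.9)
```

With a much larger learning rate (for example eta=0.5 on this toy data), the updates overshoot and diverge instead of converging, which is the behaviour described in the second bullet above.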

For example:

J(θ) = θ^2

(∂/∂θ) J(θ) = 2θ

η = 0.1 (learning rate)

θ0 = 2 (starting point on the curve)

By applying this update repeatedly (see the script below), you can verify that it takes around 28 iterations for θ to get very close to the minimum at 0 with a learning rate of 0.1.
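
The short Python script below (my own sketch, using only the numbers given above) runs this update for 28 iterations and prints the trajectory; each step multiplies θ by (1 − 2η) = 0.8, so θ shrinks towards the minimum at 0:

```python
eta = 0.1        # learning rate
theta = 2.0      # starting point, theta_0 = 2

for t in range(1, 29):
    grad = 2 * theta              # dJ/dtheta for J(theta) = theta^2
    theta = theta - eta * grad    # gradient descent update
    print(f"iteration {t:2d}: theta = {theta:.6f}  J(theta) = {theta**2:.6f}")

# After 28 iterations theta = 2 * 0.8**28, which is about 0.004,
# i.e. effectively at the minimum theta = 0.
```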
