Gradient Descent 2 + linear regression algorithm

6/20 Machine Learning via Stanford

Robin Lee
1 min read · Jun 20, 2014

Gradient descent helps find a local minimum of a function.

Algorithm — the cost function is 1/(2m) times the sum of squared errors between the hypothesis and the training targets.
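Written out (the standard formulation from the course, where h is the hypothesis, m the number of training examples, and alpha the learning rate):

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x

\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\quad \text{(updated simultaneously for } j = 0, 1\text{)}
```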

When alpha is too small, convergence is very slow. When alpha is too large, the update can overshoot and end up even farther from the local minimum.
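A quick numeric sketch of this (my own toy illustration, not from the lecture): run gradient descent on f(theta) = theta^2, whose gradient is 2*theta, with different values of alpha.

```python
# Illustrative only: gradient descent on f(theta) = theta^2, gradient = 2*theta.
def descend(alpha, steps=10, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(descend(alpha=0.01))  # too small: still far from 0 after 10 steps (~0.82)
print(descend(alpha=0.1))   # reasonable: close to 0 (~0.11)
print(descend(alpha=1.5))   # too large: overshoots every step and diverges (theta grows to 1024)
```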

When we plug in the definition of the cost function and take the partial derivatives, we get the update rules below. The j = 1 update multiplies the error by x(i); the j = 0 update omits that factor.
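Written out, with h_theta(x(i)) = theta0 + theta1 * x(i):

```latex
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```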

This also explains why the cost function has a 2 in its denominator: differentiating the squared term produces a factor of 2 that cancels it, leaving a cleaner gradient.

Place the hypothesis plot next to the cost function plot — this shows how the linear regression line changes as the parameters are updated to minimize the cost function (the sum of squared errors).
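A minimal sketch of that side-by-side view (my own toy data and parameter choices, assuming numpy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data: y roughly linear in x, with some noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=x.shape)
m = len(x)

def cost(t0, t1):
    # Squared-error cost: J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

# Batch gradient descent: every step uses all m training examples.
alpha, t0, t1 = 0.01, 0.0, 0.0
path = [(t0, t1)]
for _ in range(2000):
    err = t0 + t1 * x - y
    t0 -= alpha * np.mean(err)
    t1 -= alpha * np.mean(err * x)
    path.append((t0, t1))

# Left: hypothesis (fitted line) over the data. Right: cost contours with the descent path.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(x, y, s=10)
ax1.plot(x, t0 + t1 * x, color="red")
ax1.set(title="Hypothesis", xlabel="x", ylabel="y")

T0, T1 = np.meshgrid(np.linspace(-2, 6, 100), np.linspace(0, 3, 100))
J = np.array([[cost(a, b) for a, b in zip(r0, r1)] for r0, r1 in zip(T0, T1)])
ax2.contour(T0, T1, J, levels=30)
ax2.plot(*zip(*path), color="red", linewidth=1)
ax2.set(title="Cost function J", xlabel="theta0", ylabel="theta1")
plt.tight_layout()
plt.show()
```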

This algorithm requires many iterations to reach the optimum.

Yet, what’s coming up is:

  1. we can skip the iterations (via the normal equations method)
  2. we can handle more features

Terms

batch gradient descent — each step of the descent uses all m training examples

normal equations method — solves for the parameters that minimize the cost function analytically (in closed form), with no iteration
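As a sketch of that closed-form solution (my own toy example, not from this lecture): for linear regression, theta = (X^T X)^{-1} X^T y gives the minimizing parameters in one step, with no learning rate and no iterations.

```python
import numpy as np

# Noise-free toy data; the design matrix X has a column of ones for theta0.
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y, solved in closed form.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [2.0, 1.5]
```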
