#1 ML … Gradient Descent (start)
Hi, today I am starting with the intuition, the code, and everything else on GD. I have had the pseudo-code and intuition in my mind for about 10 months now.
But when I tried to think it through, a lot of questions popped into my head. The seemingly harmless code kept me hanging for a long time. A lot of basic questions, to tell the truth.
Like:
1) Q- How do we find the first m and c for h = mx + c?
A- We assume random initial values.
2) Q- How do we take the first gradient? Intuitively, how do we know which way the gradient points at any step?
A- [by taking the slope, of course, but …]
3) Q- If fm is (say) positive, how do we know which direction to rotate the line?
A- This one is the most perturbing … though taking example cases and watching the process helps.
4) Q- Why does subtracting the gradient term help (intuitively, in the last step of the code below, m and r.fm don’t seem to be of the same dimension or kind)?
A- fm is the slope of the cost with respect to m: if fm is positive, the cost rises as m rises, so subtracting r.fm nudges m in the direction that lowers the cost. The learning rate r is what reconciles the scales of m and fm.
5) … and so on ….
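Questions 2-4 can be checked numerically. The sketch below (my own toy numbers, not from any dataset in this post) takes a single point and a deliberately too-steep line, computes fm = -(y-h).x, and confirms that a positive fm really does mean the cost drops when m is decreased:

```python
# For f = 0.5*(y - (m*x + c))**2 the gradient wrt m is fm = -(y - h)*x.
# A positive fm means the cost rises as m rises, so the update
# m := m - r*fm moves m downhill. Toy numbers chosen for illustration.

def cost(m, c, x, y):
    h = m * x + c
    return 0.5 * (y - h) ** 2

x, y = 2.0, 3.0              # a single data point
m, c = 4.0, 0.0              # too-steep line: h = 8 > y = 3
fm = -(y - (m * x + c)) * x  # = -(3 - 8)*2 = 10, i.e. positive

# fm > 0: nudging m up should raise the cost, nudging it down should lower it.
assert cost(m + 0.1, c, x, y) > cost(m, c, x, y)
assert cost(m - 0.1, c, x, y) < cost(m, c, x, y)
print(fm)  # → 10.0
```

The same check works symmetrically: pick a line that is too shallow and fm comes out negative, so subtracting it pushes m up.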
Pseudo-code:
1. Cost function: f = 0.5(y - h(x))², where h(x) = mx + c; m = slope, c = intercept
2. fm = -(y - h).x
   fc = -(y - h).1
   fm and fc are the derivatives of f w.r.t. m and c respectively.
3. m := m - r.fm
   c := c - r.fc, where r is the learning rate and we are updating the parameters
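The steps above can be sketched as a runnable loop. This is a minimal batch version under my own assumptions (zero initialisation, an averaged gradient, and a made-up toy dataset lying on y = 2x + 1), not a definitive implementation:

```python
# Vanilla (batch) gradient descent for h(x) = m*x + c,
# following the three pseudo-code steps above.

def vanilla_gd(xs, ys, r=0.05, epochs=5000):
    m, c = 0.0, 0.0                          # step 1 of the Q&A: initial values
    n = len(xs)
    for _ in range(epochs):
        # step 2: gradients of f = 0.5*(y - h)**2, averaged over the data
        fm = sum(-(y - (m * x + c)) * x for x, y in zip(xs, ys)) / n
        fc = sum(-(y - (m * x + c))     for x, y in zip(xs, ys)) / n
        # step 3: parameter updates with learning rate r
        m -= r * fm
        c -= r * fc
    return m, c

# Toy data on the line y = 2x + 1:
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
m, c = vanilla_gd(xs, ys)
print(round(m, 2), round(c, 2))  # should approach 2.0 and 1.0
```

Averaging the gradient over the dataset keeps the step size independent of the number of points; summing instead would also work but would force a smaller r.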
My focus is on comparing the various GD methods available by testing them on a simple data-set.
I am starting with the basic one: Vanilla GD. [the pseudo-code above]
Here, "vanilla" stands for unadulterated, untouched, basic :) .
First things first, the doubts above shall be cleared.

