SGD (Stochastic Gradient Descent) in One Minute
First
Differentiation is a useful tool with many applications. Here we look at a simple one: using differentiation to find a global or local minimum of a function.
Numerical differentiation
The basic idea of differentiation is to measure how much a function's value changes in response to a small change in its input.
df(x)/dx ≈ (f(x + h) - f(x)) / h
We can estimate the derivative at a point x using a small value h as the change. To compute this on a computer, we use “numerical differentiation” like this:
# h as a small change
h = 0.01

# define a function that computes the numerical derivative
def dd1(f, x):
    return (f(x + h) - f(x)) / h

# define a target function
def f1(x):
    return x * x

# run
dd1(f1, 2)
# the result is:
# 4.009999999999891
With a small enough number h, this is a good approximation: the true derivative of f1 at x = 2 is 4.
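As a side note, a common refinement is the “central difference”, which samples the function on both sides of x and is usually more accurate than the forward difference above. This variant is my addition, not part of the original post; f1 is redefined here so the snippet runs on its own.

```python
# central difference: sample the function on both sides of x
def dd1_central(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

# the same target function as above, redefined for self-containment
def f1(x):
    return x * x

print(dd1_central(f1, 2))  # very close to the true derivative 4
```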
Partial differentiation
Next we look at partial differentiation. If a function has multiple variables, we want to differentiate with respect to each variable while leaving the others fixed. We have,
∂f(x, y)/∂x,  ∂f(x, y)/∂y
These are the derivatives of f(x, y) taken with respect to x or to y, one at a time.
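As a sketch of what this means in code (the two-variable function f2(x, y) = x*x + y*y is a hypothetical example of mine, and the forward-difference helpers mirror dd1 above):

```python
h = 1e-4  # small change

# partial derivative with respect to x: y is held fixed
def partial_x(f, x, y):
    return (f(x + h, y) - f(x, y)) / h

# partial derivative with respect to y: x is held fixed
def partial_y(f, x, y):
    return (f(x, y + h) - f(x, y)) / h

# a hypothetical two-variable function
def f2(x, y):
    return x * x + y * y

print(partial_x(f2, 3.0, 4.0))  # about 6 (the true value is 2x = 6)
print(partial_y(f2, 3.0, 4.0))  # about 8 (the true value is 2y = 8)
```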
Gradient
The gradient collects these partial derivatives into a single vector.
∇f(x, y) = ( ∂f/∂x, ∂f/∂y )
The negative of this vector points toward lower values of the function. If your location is far from the minimum point, this vector tends to be large.
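To make this concrete, here is a small sketch that packs the two numerical partial derivatives into one vector (f2 is the same hypothetical example as before, and the central difference is my choice for accuracy, not from the post):

```python
h = 1e-4  # small change

# numerical gradient of f(x, y) as the vector [df/dx, df/dy]
def gradient(f, x, y):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [dfdx, dfdy]

# a hypothetical two-variable function
def f2(x, y):
    return x * x + y * y

print(gradient(f2, 3.0, 4.0))  # close to [6, 8]
```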
So if you want to get closer to the minimum point, you can repeatedly update your current location with an equation like the one below.
x ← x − η ∂f/∂x,  y ← y − η ∂f/∂y   (η is the learning rate)
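The update rule can be sketched as a loop. Everything here is illustrative: the function f2, the learning rate 0.1, the starting point, and the 100 iterations are assumptions of mine, not values from the post.

```python
h = 1e-4   # small change for numerical differentiation
lr = 0.1   # learning rate (the eta in the update rule)

# a hypothetical function whose minimum is at (0, 0)
def f2(x, y):
    return x * x + y * y

# numerical gradient via central differences
def gradient(f, x, y):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

# start far from the minimum and repeatedly step downhill
x, y = 3.0, 4.0
for _ in range(100):
    dfdx, dfdy = gradient(f2, x, y)
    x -= lr * dfdx  # move against the gradient
    y -= lr * dfdy
print(x, y)  # both very close to 0, the minimum of f2
```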
SGD
Stochastic gradient descent (SGD) is almost the same, but it computes the gradient on data points chosen at random. In machine learning, the labeled training examples are usually sampled randomly at each step; the resulting noise helps the model avoid settling into a local minimum that is not the best answer.
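A minimal sketch of that idea, assuming a toy one-parameter linear model y = w*x with noise-free data; the data set, learning rate, and step count are all illustrative choices of mine:

```python
import random

random.seed(0)  # for reproducibility

# toy labeled data generated from the true relation y = 2x
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]

w = 0.0    # the parameter to learn
lr = 0.01  # learning rate

# SGD: at each step pick ONE labeled example at random and
# update w with the gradient of the squared error on that example
for step in range(1000):
    x, y = random.choice(data)
    pred = w * x
    grad = 2 * (pred - y) * x  # d/dw of (w*x - y)**2
    w -= lr * grad

print(w)  # close to the true slope 2.0
```

Because each update sees only one example, the path downhill is noisy, but on average it follows the full gradient.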

