Gradient Descent Intro
Gradient descent is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent iteratively moves toward the set of parameter values that minimizes the function. Typically you would want to minimize a cost function: the error between predicted and actual values. This is also a good primer on how backpropagation uses gradient descent to minimize the cost function of a neural network. That topic will be discussed in future blog posts, but today we will look at gradient descent, how it works, and how to use it to solve problems.
The objective of linear regression is to fit a line to a set of points. To model a set of points with a line, we use the slope-intercept equation y = mx + b, where m is the slope and b is the y-intercept. To find the best line for a data set like the graph below, we need to find the best values for the y-intercept b and the slope m.
A way we can solve this is to define an error function, or cost function, that measures how well the line fits our data; in this case GPA scores. The cost function takes b and m as parameters and returns an error based on how well the line fits. We compute the error by summing, over all points, the distance between each point's y value and the line's y value at the same x. We square each distance so that it is always positive and so that our function is differentiable. Our cost function will look like this:
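A standard form of this mean squared error cost, for N data points (x_i, y_i), is (the 1/N averaging factor is a common convention and is assumed here):

```latex
E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2
```

Each term measures how far the line's prediction m x_i + b falls from the observed y_i, and squaring keeps every term non-negative.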
To run gradient descent on this function, we first need to compute its gradient. That means differentiating our error function; since it is defined by two parameters, we need the partial derivative with respect to each.
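Differentiating the averaged squared-error cost above with respect to m and b gives (a sketch, assuming the 1/N convention from the cost function):

```latex
\frac{\partial E}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr)
\qquad
\frac{\partial E}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)
```

The negative signs come from the chain rule: the inner expression y_i - (m x_i + b) contributes -x_i and -1 respectively.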
Now that we have both partial derivatives of our cost function, we can run gradient descent to find new m and b values that minimize the error. With each iteration, the gradient times the learning rate should get smaller and smaller as we approach a minimum. If the learning rate is too big, we may step over the minimum; if it's too small, we may need many iterations before reaching the minimum.
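The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the sample points, learning rate, and iteration count are illustrative choices, not values from the original post.

```python
def step_gradient(m, b, points, learning_rate):
    """One gradient descent step: compute the partial derivatives of the
    mean squared error with respect to m and b, then move against them."""
    n = len(points)
    m_grad = 0.0
    b_grad = 0.0
    for x, y in points:
        error = y - (m * x + b)          # residual for this point
        m_grad += -(2 / n) * x * error   # dE/dm contribution
        b_grad += -(2 / n) * error       # dE/db contribution
    # Step in the direction opposite the gradient, scaled by the learning rate.
    return m - learning_rate * m_grad, b - learning_rate * b_grad


def gradient_descent(points, learning_rate=0.05, iterations=5000):
    m, b = 0.0, 0.0  # arbitrary initial guess
    for _ in range(iterations):
        m, b = step_gradient(m, b, points, learning_rate)
    return m, b


# Points that lie exactly on y = 2x + 1; the fit should recover m ≈ 2, b ≈ 1.
points = [(0, 1), (1, 3), (2, 5), (3, 7)]
m, b = gradient_descent(points)
```

Note how the two gradient accumulators mirror the partial derivatives term by term; with a learning rate that is too large here (roughly above 0.2 for this data), the updates overshoot and diverge instead of converging.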
In machine learning, specifically neural networks, gradient descent is used to minimize the cost function with respect to the weights of each connection in the network. This is a powerful tool that will be discussed in a separate blog post. I hope you liked this introduction to gradient descent.