Gradient Descent

A must-know optimization method

Ray Hsu
Geek Culture

--

Gradient descent is a common optimization method in machine learning. However, as with many machine learning algorithms, we often know how to use it without understanding its mathematical background. This article thoroughly explains the math behind gradient descent.

Cost Function and Optimization of Weight

When measuring the performance of an algorithm, we use a cost function suited to that algorithm. For example, we usually use mean squared error (MSE) to measure the performance of linear regression.

Equivalently, you can write MSE as follows.
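The MSE formulas referenced here did not survive extraction; the following is a standard reconstruction in the notation of Goodfellow et al.'s Deep Learning, where ŷ is the model's prediction and m is the number of examples (an assumption, since the original images are missing):

$$
\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2
= \frac{1}{m}\left\lVert \hat{y} - y \right\rVert_2^2
$$

The sum form and the squared-norm form are the same quantity written two ways.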

To make a machine learning algorithm, we need to design an algorithm that will improve the weights in a way that reduces MSE when the algorithm is allowed to gain experience by observing a training set. — Deep Learning by Ian Goodfellow

Looking at this formula, we can see that the MSE cost function is convex.

As you can see, the minimum of MSE occurs where the gradient is 0.
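Setting the gradient to zero can be made concrete for linear regression. Assuming predictions ŷ = Xw for a design matrix X and weight vector w (standard notation, not shown in the surviving text), the condition looks like this:

$$
\nabla_w \,\mathrm{MSE} = \frac{2}{m} X^\top (Xw - y) = 0
\quad\Longrightarrow\quad
w = (X^\top X)^{-1} X^\top y
$$

Because MSE is convex, this stationary point is the global minimum; gradient descent approaches the same point iteratively instead of solving the equation in closed form.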

In conclusion, optimization is the process of finding better weights to reach the lowest point of the cost function.
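The idea above can be sketched as a minimal gradient descent loop for linear regression. This is an illustrative example with made-up data (a 1-D dataset where the true weight is 2), not the article's original code:

```python
import numpy as np

# Hypothetical noiseless data: y = 2x, so the optimal weight is 2.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

def mse(w, X, y):
    """Mean squared error of predictions X @ w against targets y."""
    return np.mean((X @ w - y) ** 2)

def grad_mse(w, X, y):
    """Gradient of MSE with respect to the weights w."""
    m = len(y)
    return (2.0 / m) * X.T @ (X @ w - y)

w = np.zeros(1)   # start from w = 0
lr = 0.05         # learning rate (step size), chosen by hand
for _ in range(200):
    w -= lr * grad_mse(w, X, y)   # step against the gradient

print(float(w[0]))  # converges toward the true weight 2
```

Each iteration moves the weight a small step in the direction that decreases MSE; because the cost is convex, the loop converges to the global minimum for any sufficiently small learning rate.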
