All You Need to Know About Optimization Algorithms in Deep Learning

T Z J Y
5 min read · Oct 13, 2021

It is well known that deep learning algorithms involve optimization in many contexts. In practice, we often use analytical optimization to design algorithms. However, given the complexity of deep learning, it is quite common to invest days or even months of time across hundreds of machines to solve just a few instances of the neural network training problem. Since the problem is so important, researchers and data scientists have spent a lot of time developing optimization techniques to solve it, which is what I'd like to cover in this post.

Gradient Descent: Batch vs. Mini-batch

Most practitioners use gradient descent to update the parameters and minimize the cost. Gradient descent, also known as batch gradient descent, is a simple optimization method in which the gradient is computed over the entire dataset before every parameter update. When we train a neural network on a large dataset, this process becomes very slow, so it is worth finding an optimization algorithm that runs faster. The following example illustrates the difference:
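To make the update concrete, here is a minimal sketch of one batch gradient descent step for a linear model with a mean squared error cost. The function name, the model, and the learning rate are illustrative assumptions for this post, not a fixed recipe:

```python
import numpy as np

def batch_gradient_descent_step(w, b, X, y, learning_rate=0.01):
    """One batch gradient descent step for linear regression with an MSE cost.

    The gradient is computed over the *entire* dataset (X, y) before the
    parameters are updated, which is what makes this "batch" gradient descent.
    """
    m = X.shape[0]                  # number of training examples
    predictions = X @ w + b         # forward pass over the whole dataset
    error = predictions - y         # shape (m,)
    dw = (X.T @ error) / m          # gradient of the cost with respect to w
    db = error.mean()               # gradient of the cost with respect to b
    w = w - learning_rate * dw      # gradient descent update
    b = b - learning_rate * db
    return w, b
```

Notice that every call to this function has to touch all m rows of X before it can change the parameters even once.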

Suppose we have m = 50 million training examples. A single step of batch gradient descent on this data takes a huge amount of processing time, because 50 million examples will not fit in memory at once and every update has to process all of them. Mini-batch gradient descent avoids this by splitting the training set into smaller batches and updating the parameters after each batch.
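A minimal sketch of the mini-batch variant, assuming the same linear model as above and a batch size of 1,024 (both illustrative choices):

```python
import numpy as np

def mini_batch_gradient_descent_epoch(w, b, X, y, batch_size=1024, learning_rate=0.01):
    """One epoch of mini-batch gradient descent for the same linear model.

    Instead of one update per pass over all m examples, the data is shuffled,
    split into mini-batches, and the parameters are updated once per batch.
    """
    m = X.shape[0]
    indices = np.random.permutation(m)              # shuffle so batches are not ordered
    for start in range(0, m, batch_size):
        batch = indices[start:start + batch_size]
        Xb, yb = X[batch], y[batch]                 # only batch_size rows processed at a time
        error = Xb @ w + b - yb
        dw = (Xb.T @ error) / len(batch)
        db = error.mean()
        w = w - learning_rate * dw                  # update after every mini-batch
        b = b - learning_rate * db
    return w, b
```

With m = 50 million and a batch size of 1,024, one epoch still visits every example, but the parameters are updated tens of thousands of times per epoch instead of once, so the cost starts decreasing long before a full pass over the data completes.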
