Batch, Iteration, Epoch Concepts in Deep Learning Training
This post unravels three terms that often trip up newcomers to the deep learning world.
The story begins with the gradient descent optimization algorithm that trains neural networks by adjusting their parameters based on the training set. It has three variants.

The first is stochastic gradient descent (SGD), which updates the parameters once for every training sample. An epoch means one full pass over all samples of the dataset, so SGD performs as many updates per epoch as there are data samples.

Batch gradient descent (BGD) sums the error for each point in the training set and updates the model only after all training examples have been evaluated, so it performs a single update per epoch.

The third, mini-batch gradient descent (MBGD), combines the other two variants. It splits the training dataset into small batches and performs one update per batch; the batch size is the number of data samples in each batch. Each of these updates is called an iteration, and one pass through all the batches, covering the entire dataset, is called an epoch.
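The three variants differ only in how many samples feed each parameter update. Here is a minimal NumPy sketch on a synthetic linear-regression problem (the dataset, learning rate, and batch size are illustrative choices, not part of any real training recipe), counting the updates each variant makes in one epoch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        # toy dataset: 1000 samples, 3 features
y = X @ np.array([2.0, -1.0, 0.5])    # targets from known weights

def grad(w, xb, yb):
    # gradient of mean squared error for a linear model
    return 2 * xb.T @ (xb @ w - yb) / len(yb)

def train(w, variant, batch_size=128, lr=0.01):
    # runs one epoch; returns updated weights and the number of updates made
    updates = 0
    if variant == "BGD":              # one update over the whole dataset
        w = w - lr * grad(w, X, y)
        updates = 1
    elif variant == "SGD":            # one update per sample
        for i in range(len(X)):
            w = w - lr * grad(w, X[i:i+1], y[i:i+1])
            updates += 1
    else:                             # MBGD: one update per mini-batch
        for i in range(0, len(X), batch_size):
            w = w - lr * grad(w, X[i:i+batch_size], y[i:i+batch_size])
            updates += 1
    return w, updates

for variant in ("BGD", "SGD", "MBGD"):
    _, n = train(np.zeros(3), variant)
    print(variant, "updates per epoch:", n)
```

With 1000 samples and a batch size of 128, this prints 1 update for BGD, 1000 for SGD, and 8 for MBGD (the last mini-batch is smaller than 128 but still counts as an iteration).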
Dataset: MNIST with 60_000 images
Number of data samples = 60_000
SGD: going over the whole dataset means 60_000 parameter updates per epoch, one per sample.
BGD sums the error over all samples in the dataset and then updates the model, so it performs one update per epoch.
Batch Size = 128
Number of images (data samples) in each batch = 128
Number of iterations = ceil(60_000 / 128) = 469 (the last batch holds only the remaining 96 images)
Each epoch performs 469 parameter updates, one per iteration.
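The iteration count above is a ceiling division: 60_000 / 128 = 468.75, and the final, smaller batch still triggers an update. A quick check in Python:

```python
import math

dataset_size = 60_000   # MNIST training images
batch_size = 128

# ceiling division: the last, partial batch still counts as an iteration
iterations_per_epoch = math.ceil(dataset_size / batch_size)
last_batch_size = dataset_size - (iterations_per_epoch - 1) * batch_size

print(iterations_per_epoch)  # 469
print(last_batch_size)       # 96
```

Frameworks that drop the incomplete final batch would report 468 iterations instead; with that choice, 96 images per epoch go unused.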
Further reading: Gradient descent - Wikipedia