Batch, Iteration, Epoch Concepts in Deep Learning Training

This post unravels three terms that can make life a little hard for newcomers to the deep learning world: batch, iteration, and epoch.

Gradient Descent

The story begins with gradient descent, the optimization algorithm that trains neural networks by updating their parameters to reduce the error on the training set. It has three variants. The first, stochastic gradient descent (SGD), updates the parameters after every single training sample, so it performs as many updates as there are data samples. An epoch means one full pass over the dataset, i.e., seeing every sample once. The second, batch gradient descent (BGD), sums the error over every point in the training set and updates the model only after all training examples have been evaluated, so it performs exactly one update per epoch. The third, mini-batch gradient descent (MBGD), combines the two: it splits the training dataset into smaller batches and performs one update per batch. The batch size is the number of data samples in each batch, each update is called an iteration, and one epoch is complete once the iterations have covered the entire dataset, i.e., all batches.
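The relationship between the three variants can be sketched in a few lines of NumPy. This is a toy 1-D linear regression on made-up data (nothing here comes from the post itself); the point is that the same training loop implements all three variants, and only the batch size changes:

```python
import numpy as np

# Toy 1-D linear regression (hypothetical data) showing how the batch
# size selects the gradient-descent variant:
#   batch_size = 1        -> SGD   (one update per sample)
#   batch_size = len(X)   -> BGD   (one update per epoch)
#   anything in between   -> MBGD  (one update per batch)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=60)
y = 3.0 * X + rng.normal(scale=0.1, size=60)

def train(batch_size, epochs=100, lr=0.1):
    w = 0.0  # single weight to learn; the true value is 3.0
    n = len(X)
    for _ in range(epochs):                        # one epoch = one pass over the data
        for start in range(0, n, batch_size):      # one iteration per batch
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            grad = 2.0 * np.mean((w * xb - yb) * xb)  # dMSE/dw on this batch
            w -= lr * grad                            # one parameter update
    return w
```

`train(1)`, `train(60)`, and `train(16)` all converge to roughly the same weight; what differs is how many parameter updates each epoch performs (60, 1, and ⌈60/16⌉ = 4, respectively).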


Dataset: MNIST with 60_000 images

Number of data samples = 60_000


To go over the whole dataset, SGD performs 60_000 parameter updates in each epoch, one per sample.


BGD sums the error for all samples in the dataset and then updates the model. It does one update for each epoch.


Batch Size = 128

Number of images (data samples) in each batch = 128

Number of iterations = ⌈60_000 / 128⌉ = 469 (the division rounds up so the final, smaller batch is still used)

Each epoch does 469 updates to parameters, as there are 469 iterations.
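The iteration count above is just a ceiling division; a quick sketch (assuming the final partial batch is kept rather than dropped, as the rounded-up count implies):

```python
import math

dataset_size = 60_000  # MNIST training images
batch_size = 128

# Number of iterations (parameter updates) per epoch.
# math.ceil keeps the final partial batch of 60_000 % 128 = 96 images;
# a framework configured to drop it would use floor division instead.
iterations = math.ceil(dataset_size / batch_size)
print(iterations)  # 469
```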


