Epoch vs. Batch Size vs. Iterations, Explained in Fewer than 140 Characters
Before going deeper into epochs, batches, and iterations, let me first clear up a common beginner doubt: we use the terms batch and epoch all the time, but where exactly do they fit in, and how do we define them?
Samples
A single row of data is called a sample. It contains the inputs that are fed into an algorithm and an output that is used to compare against the prediction and calculate an error.
A training dataset contains a number of samples, i.e., many rows. A sample may also be called an observation or an input vector of the dataset.
So, using many samples, we can form a dataset.
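To make this concrete, here is a minimal sketch in Python (the values are made up purely for illustration) showing a dataset as rows of samples, each with inputs and an output:

```python
import numpy as np

# A tiny dataset: each row is one sample.
# The first two columns are the inputs; the last column is the output (label).
data = np.array([
    [0.5, 1.2, 0],
    [1.1, 0.3, 1],
    [0.9, 2.4, 0],
    [1.7, 0.8, 1],
])

X = data[:, :2]  # input vectors, one per sample
y = data[:, 2]   # expected outputs, used later to calculate the error

print(f"{len(data)} samples, each with {X.shape[1]} inputs and 1 output")
```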
Okay! So now you have an idea of what a sample is. But to understand the differences between epoch, batch, and iteration, you also need to know a term like gradient descent.
In simple words, if I have to define gradient descent:
Gradient means the rate of inclination or declination of a slope.
Descent means the instance of descending.
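Putting the two words together: gradient descent repeatedly measures the slope of the error and steps the weights in the opposite (downhill) direction. Here is a minimal sketch of a single step, with assumed values for the sample and learning rate:

```python
# One gradient descent step for a single weight w of the model y = w * x,
# minimizing the squared error (prediction - target) ** 2.
w = 0.0                # initial weight
x, target = 2.0, 4.0   # one training sample (assumed values for illustration)
learning_rate = 0.1

prediction = w * x
gradient = 2 * (prediction - target) * x  # slope of the error with respect to w
w = w - learning_rate * gradient          # descend: step against the slope

print(w)  # 1.6 -- the weight moved toward the value that reduces the error
```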
Why do we need these terms, and not just samples?
We need these terms because in machine learning the data is almost always too big, and we can't pass all of it to the computer at once, since the machine has limited compute, memory, and I/O capacity.
So, to overcome this problem, we divide the data into smaller chunks, give them to the machine one by one, and update the weights of the neural network at the end of every step so that it fits the given data.
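As a rough sketch of that chunk-by-chunk flow (the array here is just a stand-in for a real dataset):

```python
import numpy as np

data = np.arange(12)              # stand-in for a dataset too big to process at once
chunks = np.array_split(data, 4)  # divide it into smaller chunks

for chunk in chunks:
    # Feed one chunk to the model, then update the weights.
    # (The actual weight update is omitted; this only shows the flow.)
    print(chunk)
```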
Epoch
In simple words, one epoch is completed when the complete dataset has been passed forward and backward through the neural network once, or you could say your neural network has seen the entire dataset once.
Technically, it is a for-loop iterating over the number of epochs, where each loop makes one pass over the training dataset. Within this for-loop is another nested for-loop that iterates over the samples.
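That nested-loop structure can be sketched like this (the toy data is made up for illustration):

```python
num_epochs = 3
training_data = [([0.5, 1.2], 0), ([1.1, 0.3], 1), ([0.9, 2.4], 0)]

for epoch in range(num_epochs):           # outer loop: one pass = one epoch
    for inputs, target in training_data:  # inner loop: every sample in the dataset
        # The forward pass, error calculation, and weight update would go here.
        pass
```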
But even a single epoch is usually still too large for the computer to handle in one go, so we need to divide it into smaller batches.
The right number of epochs?
Everyone has doubts at some point about the right number of epochs in machine learning, but unfortunately, there is no single right answer to this question. The number of epochs moves the model along the curve from underfitting, to optimal, to overfitting.
Batch Size
Batch size is a parameter that defines the number of samples to work through before updating the neural network weights. In simple words, a batch is the set of training examples that are processed together in a single pass.
Think of a batch as a for-loop iterating over one or more samples and making predictions. At the end of the batch, the predictions are compared to the expected outputs and an error value is calculated, which is then used to update the weights.
A training dataset can be divided into a number of batches.
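Here is a minimal sketch of one batch update for a toy linear model (all values are invented for illustration):

```python
import numpy as np

# One batch update for the linear model y = w * x.
w = 0.0
learning_rate = 0.01
batch_X = np.array([1.0, 2.0, 3.0])  # a batch of three samples
batch_y = np.array([2.0, 4.0, 6.0])  # their expected outputs

predictions = w * batch_X
errors = predictions - batch_y            # compare predictions to expectations
gradient = 2 * np.mean(errors * batch_X)  # average the gradient over the batch
w -= learning_rate * gradient             # one weight update per batch

print(w)
```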
According to Machine Learning Mastery:
When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.
- Batch Gradient Descent. Batch Size = Size of Training Set
- Stochastic Gradient Descent. Batch Size = 1
- Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set
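The three variants differ only in the batch size you pick. A small sketch that names the variant implied by a given batch size:

```python
def gradient_descent_variant(batch_size, n_samples):
    """Name the gradient descent variant implied by a batch size."""
    if batch_size == n_samples:
        return "batch gradient descent"
    if batch_size == 1:
        return "stochastic gradient descent"
    return "mini-batch gradient descent"

n = 1000  # hypothetical training-set size
for bs in (n, 1, 32):
    print(bs, "->", gradient_descent_variant(bs, n))
```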
Just as we divide an article into sections so that it is easier to write and understand, machine learning does the same with the dataset.
Iterations
Iterations are basically the number of batches needed to complete one epoch.
You can think of it as a for-loop iterating over the number of batches, where each batch updates the weights of the neural network.
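When the dataset size is not an exact multiple of the batch size, the last batch is simply smaller, so the count rounds up. For example:

```python
import math

n_samples = 1000
batch_size = 32

# Iterations per epoch = number of batches needed to see the whole dataset once.
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # 32 -- 31 full batches plus one smaller final batch
```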
Difference
- Batch size is the number of samples processed before the model weights are updated.
- The number of epochs is the number of complete passes through the training dataset.
- Iterations are the number of batches needed to complete one epoch.
- You need to specify the number of epochs and the batch size for the algorithm to proceed; the number of iterations follows from the dataset size and the batch size, as the sketch below shows.
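A minimal, framework-free training skeleton (the lambda in the usage line stands in for a real weight-update step) showing how epochs and batch size are specified and the iteration count simply falls out:

```python
def train(update_weights, X, y, epochs, batch_size):
    """Minimal training skeleton: you specify epochs and batch_size;
    the number of iterations per epoch follows from the dataset size."""
    n = len(X)
    for epoch in range(epochs):
        for start in range(0, n, batch_size):   # one iteration per batch
            batch_X = X[start:start + batch_size]
            batch_y = y[start:start + batch_size]
            update_weights(batch_X, batch_y)    # weight update once per batch

# Usage: count the updates to confirm iterations = ceil(n / batch_size) * epochs.
updates = []
train(lambda bx, by: updates.append(len(bx)),
      list(range(10)), list(range(10)), epochs=2, batch_size=4)
print(len(updates))  # 6 -- 3 iterations per epoch x 2 epochs
```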
Example
Assume you have a dataset with 4,000 samples (rows) and you choose a batch size of 5 and 800 epochs.
This means that the dataset will be divided into 800 batches, each with five samples.
This means that one epoch will involve 800 batches, and therefore 800 updates to the model.
With 800 epochs, the model will be exposed to, or pass through, the whole dataset 800 times. That is a total of 640,000 batches during the entire training process.
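You can verify that arithmetic in a couple of lines:

```python
n_samples = 4000
batch_size = 5
epochs = 800

batches_per_epoch = n_samples // batch_size  # 800 batches (and updates) per epoch
total_batches = batches_per_epoch * epochs   # 640,000 batches over all of training

print(batches_per_epoch, total_batches)  # 800 640000
```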