WHEN and WHY are batches used in machine learning?

On to the nitty-gritty details of batches in machine learning data sets

Bipin Krishnan P
Analytics Vidhya
4 min read · Nov 19, 2019


Introduction

Hey there, have you ever come across the term “batch” while loading data sets?

Whether the answer is yes or no, today you will learn about batches and why you should consider using them in your machine learning pipeline.

No more delays, let’s jump into it right away.

Hmm, this is best explained through an example.

Hold on tight, here comes your story —

You have got a brilliant idea: build a deep learning model to detect brain tumors and other abnormalities of the brain from MRI scans.

Seems like a great idea for a startup, right?

The first thing is to collect the required data. For now, assume that you have already done that and are ready with your data.

Now is the right time to understand what a “batch” is.

WHEN and WHY to use batches?

When loading the data set into memory, we have two options:

1. You can load the whole data set into memory at once, or

2. You can load a smaller subset of the data into memory at a time.

Didn’t understand a thing, right? Let me break it down for you.

You are just starting to build your dream startup, as said earlier, so you might not have a high-end GPU or CPU.

You may have a data set of huge size, say, a million brain scan images. If you load the whole data set into memory at once, it may not even fit in RAM, and even if it does, it leaves little room for the computation itself, so training becomes very slow and inefficient.
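To get a feel for the numbers, here is a rough back-of-envelope estimate in Python. The image resolution below is a purely hypothetical assumption for illustration; real MRI scans vary.

```python
# Rough memory estimate for loading an entire data set at once.
# The resolution is a hypothetical assumption, not a real MRI spec.
num_images = 1_000_000
height, width = 256, 256      # assumed single-channel resolution
bytes_per_value = 4           # float32

total_bytes = num_images * height * width * bytes_per_value
print(f"~{total_bytes / 1e9:.0f} GB needed")  # ~262 GB
```

Under those assumptions, that is roughly 262 GB, far more RAM than a typical workstation has.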

Guys, don’t get disheartened; there is a better method for you.

You can break your data set into batches. That is, if you have a data set containing ten brain scan images, you can split it into two batches where each batch has five images:

Total images = 10

1 batch = 5 images

So, a total of two batches.

(Number of batches * Number of images in a single batch = Total number of images) => (2 * 5 = 10).
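If you prefer to see this in code, here is a minimal sketch in Python; the `images` list is just a stand-in for real scan data.

```python
# Split a tiny "data set" of 10 items into batches of 5.
images = list(range(10))   # stand-ins for 10 brain scan images
batch_size = 5

batches = [images[i:i + batch_size]
           for i in range(0, len(images), batch_size)]

print(len(batches))   # 2
print(batches[0])     # [0, 1, 2, 3, 4]
```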

Enough of this child’s play; let’s get bigger. If you have a brain scan image data set containing 100000 images, we can split it into 3125 batches where each batch has 32 images in it.

Total images = 100000

1 batch = 32 images

So, a total of 3125 batches (3125 * 32 = 100000).

So, instead of loading all 100000 images into memory, which is way too expensive for the computer, you can load 32 images (1 batch) at a time, 3125 times over, which requires far less memory than loading the complete data set.
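In practice, you rarely write this loading loop by hand. Here is a sketch using PyTorch’s DataLoader, assuming PyTorch is your framework; the tensor shapes are made up for illustration, and a real pipeline would use a Dataset that reads each image from disk on demand rather than a TensorDataset that already holds everything in memory.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Fake "scans" with illustrative shapes; a real Dataset would load
# each image from disk only when its batch is requested.
scans = torch.randn(1000, 1, 16, 16)
labels = torch.randint(0, 2, (1000,))

loader = DataLoader(TensorDataset(scans, labels),
                    batch_size=32, shuffle=True)

for batch_scans, batch_labels in loader:
    print(batch_scans.shape)   # torch.Size([32, 1, 16, 16])
    break                      # just peek at the first batch
```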

Another reason to consider using batches is this: when you train your deep learning model without splitting the data into batches, the algorithm (say, a neural network) has to hold the error values for all 100000 images in memory before it can make a single update, and this greatly slows down training.

The model updates its parameters (weights and biases) only after passing through the whole data set.

But if you split your 100000-image data set into batches of 32 images, the model only has to store the error values for those 32 images.

Here, the model updates the parameters after completing each batch.

After every 32 images (1 batch), the parameters are updated.
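Here is a sketch of where that update happens, again assuming PyTorch; the tiny linear model and random data are placeholders, not a real tumor detector.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)                        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

data = torch.randn(320, 10)                     # 320 fake samples
targets = torch.randint(0, 2, (320,))

batch_size = 32
for i in range(0, len(data), batch_size):
    x, y = data[i:i + batch_size], targets[i:i + batch_size]
    loss = loss_fn(model(x), y)   # error for these 32 samples only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()              # parameters updated after every batch
```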

Different gradient descent algorithms based on batch size

There are 3 types of gradient descent algorithms based on the batch size:

1. Stochastic gradient descent

Here, each row of the data set is treated as a batch. That is, if you have a data set containing 1000 images, then each image is a batch (1000 batches in total), so the parameters (weights and biases) are updated after each row of the data set.

2. Batch gradient descent

In this algorithm, the whole data set is treated as a single batch. For a 1000-image data set, there is only one batch, containing all 1000 images (that is, every row of the data set).

3. Mini-batch gradient descent

In this algorithm, the batch size is greater than one and less than the total size of the data set; a commonly used batch size is 32 (32 data points in a single batch).

If you wrap all your data in a single batch, it is called batch gradient descent; if the number of batches is equal to the number of data points in your data set, it is called stochastic gradient descent. Finally, if the number of batches is between 1 and the total number of data points in the data set, it is called mini-batch gradient descent.
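One way to see that these are the same loop with different batch sizes is to count the parameter updates per pass over the data (one epoch); the numbers below are just for illustration.

```python
import math

n = 1000  # total number of data points

variants = {
    "batch gradient descent": n,        # one batch holds everything
    "stochastic gradient descent": 1,   # one data point per batch
    "mini-batch gradient descent": 32,  # somewhere in between
}

for name, size in variants.items():
    updates = math.ceil(n / size)   # parameter updates per epoch
    print(f"{name}: batch size {size} -> {updates} updates per epoch")
```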

Conclusion

By now you should have a complete picture of batches and be able to answer the when and why of using them. So, the next time you load your data set, think twice before training your model :)
