Selecting Optimal LSTM Batch Size

Caner
2 min read · Mar 25, 2020


Batches are used to train LSTMs, and selecting the batch size is a vital decision, since it has a strong impact on performance, e.g. prediction accuracy.

Why do we need batches?

For models such as LSTMs and CNNs, the batch size is critical for learning the common patterns as important features. For a model to figure out which patterns and features are shared across the input training samples, we need to provide the samples in batches.

By providing a number of samples at a time, i.e. a batch, we introduce a set of samples to the model together, so the model can distinguish the common features by looking at all the samples in the batch.

A concrete example can be described as follows: suppose we want to train a model that classifies an image as 1 when a human is present and as 0 when no human is in the image. We could train the model by feeding one image at a time, or we could use batches, providing, say, 10 images per step, so that by looking at those 10 images together the model can pick out the common patterns such as heads, arms, and legs. In our experience the same applies to networks such as LSTMs and CNNs (convolutional neural networks): it is important to provide a batch size that is optimal for the model to learn the underlying patterns.
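As a minimal sketch of how this looks in practice (assuming a Keras/TensorFlow LSTM and a randomly generated dummy dataset, both of which are illustrative assumptions rather than part of the original post), the batch size is simply passed to the training call:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Dummy dataset: 1,000 sequences, 20 time steps, 8 features each (illustrative only)
X = np.random.rand(1000, 20, 8).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1))

model = Sequential([
    LSTM(32, input_shape=(20, 8)),   # 20 time steps, 8 features per step
    Dense(1, activation="sigmoid"),  # binary output: human present / not present
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# batch_size=10 means the model sees 10 samples per gradient update,
# so each update is driven by patterns shared across those 10 samples.
model.fit(X, y, epochs=5, batch_size=10)
```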

Optimal Batch Size?

In my experience, a batch size of 64 is optimal in most cases. Nevertheless, there are cases where you may instead choose 32 or 128; the common choices are all divisible by 8. Note that this batch-size fine-tuning must be done based on observed model performance.
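As a hedged sketch of that fine-tuning (reusing the illustrative Keras model and dummy data from the example above; the candidate sizes and epoch count are assumptions), one can train with each candidate batch size and compare validation accuracy:

```python
# Batch-size fine-tuning sketch: train with each candidate size and
# compare validation accuracy (uses X, y from the previous example).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_model():
    model = Sequential([
        LSTM(32, input_shape=(20, 8)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

results = {}
for batch_size in (32, 64, 128):   # candidate sizes, all divisible by 8
    model = build_model()           # fresh model for a fair comparison
    history = model.fit(X, y, epochs=5, batch_size=batch_size,
                        validation_split=0.2, verbose=0)
    results[batch_size] = history.history["val_accuracy"][-1]

print(results)  # pick the batch size with the best validation accuracy
```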
