Batch Normalization Explained in Plain English
Theory and implementation in Keras
In neural networks, bringing the input feature values into the same range provides the following benefits.
Faster convergence and speedier training
After normalizing, all feature values lie in a small, homogeneous range (roughly the same scale). This makes learning (converging) faster for neural networks!
Before normalizing the data, the cost (loss) surface has an elongated, non-symmetric shape. After normalizing, it becomes more symmetric (bowl-shaped), which is much easier to optimize: gradient descent can head almost straight toward the minimum instead of oscillating back and forth across the steep directions of the surface, which takes far longer to converge.
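As a quick illustration of putting features on the same scale, here is a minimal sketch using Keras' Normalization layer. The three features and their ranges are made up purely for demonstration; the point is that every feature ends up with roughly zero mean and unit variance.

```python
import numpy as np
import tensorflow as tf

# Illustrative input: 1000 samples, 3 features on very different scales
rng = np.random.default_rng(0)
raw_features = np.column_stack([
    rng.uniform(0, 1, 1000),      # feature already in [0, 1]
    rng.uniform(0, 1000, 1000),   # feature spread over [0, 1000]
    rng.normal(50, 20, 1000),     # feature centered around 50
]).astype("float32")

# The Normalization layer learns a per-feature mean and variance from the data
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(raw_features)

scaled = normalizer(raw_features).numpy()
print(scaled.mean(axis=0))  # approximately 0 for every feature
print(scaled.std(axis=0))   # approximately 1 for every feature
```

After this step, no single feature dominates the gradients just because it happens to be measured on a larger scale.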
Reduced vanishing gradient problem
This problem can arise when a neural network has many layers. During training, the signals flowing forward through the network (and the gradients flowing backward) are repeatedly multiplied by numbers smaller than 1. Over many layers these products keep shrinking until, at some point, the computer hardware can no longer represent such small values and the gradients effectively become zero. That is the vanishing gradient problem.
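To get a feel for why repeated multiplication by numbers below 1 is a problem, here is a tiny numeric sketch. The per-layer shrink factor of 0.1 is an arbitrary illustration, not a value taken from any real network:

```python
import numpy as np

# Repeatedly multiplying by a factor smaller than 1 drives the value toward
# zero; in 32-bit floating point it eventually underflows to exactly 0.0.
value = np.float32(1.0)
for layer in range(1, 101):
    value = np.float32(value * 0.1)  # hypothetical per-layer shrink factor
    if value == 0.0:
        print(f"Underflow to 0.0 after {layer} layers")
        break
```

With a factor of 0.1, float32 runs out of representable values after only a few dozen multiplications, so any gradient passing through that many shrinking layers simply disappears.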