Batch Normalization Explained in Plain English

Theory and implementation in Keras

Rukshan Pramoditha
Data Science 365



In neural networks, bringing the input feature values into the same range provides the following benefits.

Faster convergence and faster training

After normalization, the values lie in a small range and are homogeneous (all features share the same scale). This makes learning (convergence) faster for neural networks!

Before normalization, the cost (loss) function has an elongated, non-symmetric shape. After normalization, it becomes more symmetric, which makes convergence much easier: gradient descent moves almost straight towards the minimum instead of oscillating back and forth across the surface, which would take many more steps to converge.
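To make this concrete, here is a minimal sketch (the feature names and values below are made up for illustration, not taken from the article) that standardizes two features with very different original scales. After the transformation, both features are small and share the same scale, which is exactly the situation that lets gradient descent converge quickly.

import numpy as np

# Two input features on very different scales,
# e.g. age in years and annual income in dollars (made-up values).
X = np.array([[25.0, 40000.0],
              [32.0, 85000.0],
              [47.0, 120000.0],
              [51.0, 62000.0]])

# Standardize each feature (column): subtract its mean, divide by its standard deviation.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm)                 # values are now small and on the same scale
print(X_norm.mean(axis=0))    # ~0 for every feature
print(X_norm.std(axis=0))     # 1 for every feature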

Reduced vanishing gradient problem

This problem can arise when the neural network has many layers. During training, values passing through the layers are repeatedly multiplied by numbers smaller than 1. Across many layers, the results keep shrinking until, at some point, they become too small for the computer hardware to represent, and the gradients effectively vanish. This is the vanishing gradient problem.
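Since the article's focus is Keras, here is a minimal sketch of how this is usually addressed in practice (the layer sizes, the 20-feature input shape, and the binary-classification output are placeholder choices, not taken from the article): BatchNormalization layers placed between Dense layers re-normalize each layer's outputs so they do not keep shrinking as the network gets deeper.

from tensorflow import keras
from tensorflow.keras import layers

# A deep stack of Dense layers. Each BatchNormalization layer rescales the
# outputs of the previous Dense layer to roughly zero mean and unit variance,
# so the values (and the gradients flowing back) stay in a representable range.
model = keras.Sequential([
    keras.Input(shape=(20,)),               # 20 input features (placeholder)
    layers.Dense(64),
    layers.BatchNormalization(),            # normalize before the activation
    layers.Activation('relu'),
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dense(1, activation='sigmoid'),  # e.g. a binary classification head
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()

Whether to place BatchNormalization before or after the activation is a design choice; putting it before the activation, as sketched above, follows the original batch normalization paper.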

