Data Science 365

Bring data into actionable insights.

Batch Normalization Explained in Plain English

7 min read · Aug 30, 2022

Photo by Praveen kumar Mathivanan on Unsplash

In neural networks, scaling the input feature values into the same range brings the following benefits.

Faster convergence and faster training

After normalization, the values lie in a small range and are homogeneous (they share the same scale). This makes learning (converging) faster for neural networks!

Before normalizing the data, the cost (loss) function has an asymmetric, elongated shape. After normalizing, its contours become much more symmetric, which makes convergence far easier: gradient descent can head almost straight towards the minimum instead of oscillating back and forth across the steep direction of the loss surface, which would need many more iterations to converge.
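As a minimal sketch (not code from the article), here is how two input features on very different scales can be standardized before training. The feature matrix X and its two columns are made up purely for illustration.

```python
import numpy as np

# Hypothetical feature matrix: two features on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(loc=[0.0, 500.0], scale=[1.0, 100.0], size=(1000, 2))

print("means before:", X.mean(axis=0))   # roughly [0, 500]
print("stds before: ", X.std(axis=0))    # roughly [1, 100]

# Standardize: subtract the per-feature mean and divide by the per-feature
# standard deviation, so every feature ends up with (approximately)
# zero mean and unit variance.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print("means after:", X_norm.mean(axis=0))  # roughly [0, 0]
print("stds after: ", X_norm.std(axis=0))   # roughly [1, 1]
```

After this step, both features are small and on the same scale, which is exactly the situation in which gradient descent converges quickly.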

Reduce the vanishing gradient problem

This problem can occur when the neural network has many layers. During backpropagation, the gradient at each layer is multiplied by factors that are often much smaller than 1. When this happens across many layers, the gradients flowing back to the earliest layers become smaller and smaller until, at some point, they are so close to zero that the computer can no longer represent them meaningfully. Those layers then stop learning, and the vanishing gradient problem arises.
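To make this concrete, here is a tiny, made-up illustration (my own sketch, not code from the article) of how a gradient shrinks when it is repeatedly multiplied by factors smaller than 1 on its way back through many layers. The per-layer factor of 0.25 is a hypothetical value chosen only for the demonstration.

```python
# Toy illustration: repeatedly multiplying a gradient by a small factor,
# as happens when backpropagating through many layers.
gradient = 1.0
per_layer_factor = 0.25  # hypothetical local derivative at each layer (< 1)

for layer in range(1, 201):
    gradient *= per_layer_factor
    if layer in (10, 50, 100, 200):
        print(f"after {layer:3d} layers: gradient = {gradient:.3e}")

# After 200 layers the gradient is about 4e-121 -- effectively zero, so the
# earliest layers receive almost no learning signal. Keeping layer inputs in
# a stable range (as batch normalization does) helps prevent these per-layer
# factors from collapsing toward zero.
```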

Written by Rukshan Pramoditha
