Batch Normalization | one minute summary

Batch norm has become the norm

Jeffrey Boschman
One Minute Machine Learning
1 min read · Jul 1, 2021


Image idea from https://deepai.org/machine-learning-glossary-and-terms/batch-normalization

This technique was introduced in the 2015 paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Ioffe and Szegedy (Google), and it has since become a staple regularization method in many models.

  1. Why? During training, as the parameters of the preceding layers change with each mini-batch, the distribution of inputs to the current layer changes accordingly, so the current layer must continuously readjust to new input distributions. This problem is called internal covariate shift.
  2. What? Batch normalization is a regularization technique that standardizes the inputs to each layer, which the authors argue reduces internal covariate shift.
  3. How? For each mini-batch, the input to each layer (either before or after the activation function) is normalized to zero mean and unit standard deviation, then scaled and shifted by two learned parameters (γ and β) so the layer retains its representational capacity. Standardizing each layer's input distribution across mini-batches lets each layer learn somewhat independently of the others, stabilizing the learning process.
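The normalization step above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-time forward pass only (the function name and toy data are mine, not from the paper; a real implementation also tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch, features).

    Standardizes each feature over the batch, then applies the
    learned scale (gamma) and shift (beta) parameters.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit std deviation
    return gamma * x_hat + beta              # learned rescaling

# Toy mini-batch: 4 examples, 3 features
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
gamma = np.ones(3)   # in practice, learned by backprop
beta = np.zeros(3)   # in practice, learned by backprop

out = batch_norm(x, gamma, beta)
print(out.mean(axis=0))  # ≈ [0, 0, 0]
print(out.std(axis=0))   # ≈ [1, 1, 1]
```

With γ = 1 and β = 0 the output is simply the standardized input; during training the network is free to learn other values, which is what lets batch norm be inserted anywhere without limiting what the layer can represent.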
