Batch Normalization | one minute summary
Batch norm has become the norm
1 min read · Jul 1, 2021
This technique was introduced by the 2015 paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Ioffe and Szegedy (Google) and has become a staple regularization method for many models ever since.
- Why? During each training pass (e.g. each mini-batch), as the parameters of the preceding layers change, the distribution of inputs to the current layer changes as well, so each layer must continually readjust to a new input distribution. This problem is called internal covariate shift.
- What? Batch normalization is a regularization technique that standardizes the inputs to each layer, supposedly reducing internal covariate shift.
- How? For each mini-batch, the input to each layer (either before or after the activation function) is standardized per feature to a mean of 0 and a standard deviation of 1 using the batch's own statistics, and then rescaled by two learnable parameters: a scale γ and a shift β. This keeps the distribution of each layer's inputs stable across training passes, allowing each layer to learn more independently of the others and stabilizing the learning process.
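The normalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the forward pass for a 2-D activation matrix (batch × features), not the paper's full algorithm (it omits the running statistics used at inference time); the function name and epsilon value are illustrative choices.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize activations x of shape (batch, features)."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardize: mean 0, std 1
    return gamma * x_hat + beta               # learnable scale and shift

# Example: activations with an arbitrary mean and spread
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))

# With gamma=1 and beta=0, the output is simply the standardized input
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

After this call, each column of `out` has (approximately) zero mean and unit standard deviation, regardless of the input's original distribution; γ and β then let the network undo or adjust that standardization where useful.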