Batch Normalization

Romain Bouges · Published in unpack · Jan 4, 2021 · 2 min read


An illustration of Batch Norm [6].

What is Batch Normalization?

Batch Normalization (BN) is the adjustment of both the mean (shifting) and the variance (scaling) of a given set of data to specific values. Those values are chosen according to the input data previously seen while training a given model.
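To make the shifting and scaling concrete, here is a minimal NumPy sketch of the core computation (the names x, gamma, beta and eps are illustrative, not taken from a particular library):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x has shape (batch, features); statistics are computed per feature
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
        return gamma * x_hat + beta              # shift and scale to the target values

    x = np.random.randn(32, 8)
    y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))

The small eps keeps the division stable when a feature's batch variance is close to zero; gamma and beta are the learned parameters that set the final variance and mean.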

Applying this process to the input data of every layer of a deep neural network, not only to the first one, is what makes BN so effective and so widely applied across deep learning today. For perspective, another popular [1] and promising normalization technique is Layer Normalization [2].
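As a sketch of what per-layer normalization looks like in practice, here is a small PyTorch model (the layer sizes are illustrative) with a BatchNorm1d placed after each hidden linear layer:

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(784, 128),
        nn.BatchNorm1d(128),  # normalizes the activations reaching the next layer
        nn.ReLU(),
        nn.Linear(128, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Linear(64, 10),
    )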

What makes this technique useful, and under which circumstances?

The normalization parameters used while training a pretrained model should be known, so that a new set of data can be normalized the same way and fully benefit from the model's experience. Not doing so might result in losing the power of the previously trained model [3].
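In PyTorch, for instance, a BatchNorm layer stores running statistics during training and reuses them at inference time once the model is switched to evaluation mode; a minimal sketch:

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm1d(8)
    bn.train()
    _ = bn(torch.randn(32, 8))   # updates bn.running_mean and bn.running_var
    bn.eval()
    y = bn(torch.randn(4, 8))    # normalizes with the stored running statistics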

Increased training speed is another reason why this input data manipulation is so useful. Because BN enables higher learning rates, fewer iterations are performed and the network converges faster [4]. And since the deeper the network, the lower the learning rates it normally requires, BN becomes all the more valuable when many layers are involved [5].

The noise added by normalization provides a regularization effect. Combined with dropout, it achieves better results against over-fitting, and a lower dropout rate can eventually be used. Using both techniques at the same time has proved to give the best results [6].
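A sketch of that combination, again in PyTorch (the dropout rate of 0.2 is only an illustrative choice, lower than the commonly used 0.5):

    import torch.nn as nn

    block = nn.Sequential(
        nn.Linear(128, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Dropout(p=0.2),  # BN already regularizes, so a lower rate may suffice
        nn.Linear(64, 10),
    )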

On the other hand, a small quantity of input data tends to make the normalization statistics less meaningful, and so do heterogeneous batch sizes [7].
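The effect is easy to see numerically: per-batch statistics are noisy estimates whose spread shrinks as the batch grows. A small NumPy sketch (the sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=100_000)

    for batch_size in (4, 512):
        means = [rng.choice(data, batch_size).mean() for _ in range(1000)]
        print(batch_size, round(float(np.std(means)), 3))
    # the per-batch means vary far more at batch size 4 than at 512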

[1] https://paperswithcode.com/methods/category/normalization

[2] https://paperswithcode.com/method/layer-normalization

[3] Howard, J. and Gugger, S., Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, Chapter 1.

[4] Ioffe, S. and Szegedy, C., Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, https://arxiv.org/pdf/1502.03167.pdf ("Batch Normalization enables higher learning rates", p. 5).

[5] Howard, J. and Gugger, S., Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, Chapter 7.

[6] https://theaisummer.com/normalization/#batch-normalization-2015

[7] https://medium.com/@darrenyaoyao.huang/why-we-need-normalization-in-deep-learning-from-batch-normalization-to-group-normalization-d06ea0e59c17
