Batch Normalization: Simple Summary

0. Regular Normalization Techniques

Hyunjulie (심현주)
2 min read · Oct 2, 2018


  • As a “preprocessing step”: normalize/standardize our data to get it ready for training
  • Objective? Transform the data so that all of it is on the same scale
    Why do we do this? Data may have a relatively wide range (i.e. not on the same scale), and different features will have different scales (e.g. when having age and height as features)
    → This causes instability in neural networks (exploding gradients, slower training)
  • Examples: normalization, which scales numerical data down to the 0~1 range, and standardization, where z = (x - mean) / standard deviation (both are sketched below)
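
As a quick illustration of these two techniques, here is a minimal NumPy sketch (the feature values are made up for the example):

```python
import numpy as np

# Made-up feature vector, e.g. heights in cm
data = np.array([150.0, 160.0, 165.0, 172.0, 180.0])

# Normalization: scale values into the 0~1 range
normalized = (data - data.min()) / (data.max() - data.min())

# Standardization: z = (x - mean) / standard deviation
standardized = (data - data.mean()) / data.std()

print(normalized)    # all values now between 0 and 1
print(standardized)  # mean ~0, standard deviation ~1
```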

1. What is batch normalization?

Normalization applied to the output of whichever layers you choose within your network

  • Step 1: Normalize the output from the activation function: z = (x - m)/s, where m is the batch mean and s is the batch standard deviation
  • Step 2: Multiply the normalized output by an arbitrary parameter (g): z * g
  • Step 3: Add an arbitrary parameter (b) to the resulting product: (z * g) + b

Effect: sets a new standard deviation and mean for the data
Of these four parameters (m, s, g, b), g and b are trainable and become optimized during training, while m and s are recomputed from each mini-batch (a NumPy sketch of all three steps follows)
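
To make the three steps concrete, here is a minimal NumPy sketch on one hypothetical mini-batch (the shapes, the epsilon constant, and all names beyond m, s, g, b are illustrative, not from the original post):

```python
import numpy as np

# Hypothetical batch of activations: 4 examples, 3 units
x = np.random.randn(4, 3)

# Step 1: normalize using the batch mean (m) and standard deviation (s)
m = x.mean(axis=0)
s = x.std(axis=0)
eps = 1e-5                           # small constant for numerical stability
z = (x - m) / np.sqrt(s**2 + eps)

# Steps 2 and 3: scale by g and shift by b.
# These start at identity values and are the parameters
# that get optimized during training.
g = np.ones(3)
b = np.zeros(3)
out = z * g + b
```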

2. Why use batch normalization?

  • Makes sure that weights within the network do not become imbalanced with extreme values (reduces the ability of outlying large weights to over-influence the training process)
  • Increases training speed

3. When can we use it?

  • We can apply it to the output of the activation functions of individual layers within the model
  • It works on a per-batch basis: this is where the name ‘batch normalization’ comes from
Example of when batch normalization is used
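
A minimal sketch of such a use, assuming Keras as the framework (the network architecture and layer sizes here are illustrative, not from the original post):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

# BatchNormalization layers inserted after each hidden layer's
# activation; they normalize that layer's output per batch.
model = Sequential([
    Dense(16, input_shape=(8,), activation='relu'),
    BatchNormalization(),
    Dense(16, activation='relu'),
    BatchNormalization(),
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```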
