Batch Normalization: Simple Summary

0. Regular Normalization Techniques

Hyunjulie (심현주)
2 min read · Oct 2, 2018


  • As a “preprocessing step”: normalize/standardize our data to get it ready for training
  • Objective? Transform the data so that all of it is on the same scale
    Why do we do this? Data may have a relatively wide range (i.e. not on the same scale), and different features will have different scales (e.g. when having age and height as features)
    → This causes instability in neural networks (exploding gradients, slower training)
  • Examples: normalization, which scales numerical data down to the 0~1 range, and standardization, where z = (x - mean) / standard deviation (both are sketched below)
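
As a quick illustration of these two techniques, here is a minimal NumPy sketch (the feature values are made up for the example):

```python
import numpy as np

# Made-up feature vector, e.g. heights in cm
data = np.array([150.0, 160.0, 165.0, 172.0, 180.0])

# Normalization: scale values into the 0~1 range
normalized = (data - data.min()) / (data.max() - data.min())

# Standardization: z = (x - mean) / standard deviation
standardized = (data - data.mean()) / data.std()

print(normalized)    # all values now between 0 and 1
print(standardized)  # mean ~0, standard deviation ~1
```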

1. What is batch normalization?

Normalization applied to the output of whichever layers you choose within your network

  • Step 1: Normalize the output from the activation function: z = (x - m)/s, where m is the batch mean and s is the batch standard deviation
  • Step 2: Multiply the normalized output by an arbitrary parameter (g): z * g
  • Step 3: Add an arbitrary parameter (b) to the resulting product: (z * g) + b

Effect: sets a new standard deviation and mean for the data
Of these four parameters (m, s, g, b), g and b are trainable and become optimized during training, while m and s are recomputed from each mini-batch (a NumPy sketch of all three steps follows)
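
To make the three steps concrete, here is a minimal NumPy sketch on one hypothetical mini-batch (the shapes, the epsilon constant, and all names beyond m, s, g, b are illustrative, not from the original post):

```python
import numpy as np

# Hypothetical batch of activations: 4 examples, 3 units
x = np.random.randn(4, 3)

# Step 1: normalize using the batch mean (m) and standard deviation (s)
m = x.mean(axis=0)
s = x.std(axis=0)
eps = 1e-5                           # small constant for numerical stability
z = (x - m) / np.sqrt(s**2 + eps)

# Steps 2 and 3: scale by g and shift by b.
# These start at identity values and are the parameters
# that get optimized during training.
g = np.ones(3)
b = np.zeros(3)
out = z * g + b
```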

2. Why use batch normalization?

  • Makes sure that weights within the network do not become imbalanced with extreme values (reduces the ability of outlying large weights to over-influence the training process)
  • Increases training speed

3. When can we use it?

  • We can apply it to the output of the activation functions of individual layers within the model
  • It works on a per-batch basis: this is where the name ‘batch normalization’ comes from
Example of when batch normalization is used
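
A minimal sketch of such a use, assuming Keras as the framework (the network architecture and layer sizes here are illustrative, not from the original post):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

# BatchNormalization layers inserted after each hidden layer's
# activation; they normalize that layer's output per batch.
model = Sequential([
    Dense(16, input_shape=(8,), activation='relu'),
    BatchNormalization(),
    Dense(16, activation='relu'),
    BatchNormalization(),
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```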
