Towards Weight Initialization in Deep Neural Networks
This post will be more technical than the other two. In it, I will analyze how different weight initializations influence training and accuracy. The main motivation is to explore how well Glorot [1] initialization performs compared to other, more trivial options.
In this post, I will try to compare the following weight initializations:
- Zero: all weights are set to 0
- Random: weights are set completely at random, with no constraint on their scale
- Random between -1 and +1: random weights restricted to the range [-1, +1]
- Xavier-Glorot initialization [1]
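The four schemes can be sketched in plain NumPy. This is a minimal illustration, not the notebook's code; the layer dimensions (`fan_in`, `fan_out`) are hypothetical, and Glorot uniform is shown in its standard form, with limit sqrt(6 / (fan_in + fan_out)):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer dimensions

# Zero: every weight starts at 0
w_zero = np.zeros((fan_in, fan_out))

# Random: unconstrained draws, here from a standard normal distribution
w_random = rng.standard_normal((fan_in, fan_out))

# Random between -1 and +1: uniform on [-1, +1]
w_uniform = rng.uniform(-1.0, 1.0, size=(fan_in, fan_out))

# Xavier-Glorot (uniform variant): uniform on [-limit, +limit],
# where limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

Note how much smaller the Glorot range is: for these dimensions the limit is about 0.125, versus a full unit range for the -1 to +1 scheme.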
For validation, I used a custom convolutional neural network similar to VGG-Net and trained it on the MNIST dataset.
The results, which I found surprising, are as follows:
Before drawing some conclusions, let’s also look at confusion matrices for all the 10 classes (digits 0 to 9).
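For reference, a per-class confusion matrix like the ones shown here can be computed with scikit-learn. The labels below are made-up placeholders; in the post they would be the model's predictions on the MNIST test set:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions for digits 0-9
y_true = np.array([0, 1, 2, 2, 3, 9])
y_pred = np.array([0, 1, 2, 1, 3, 9])

# Rows are true classes, columns are predicted classes;
# passing labels=range(10) keeps the matrix 10x10 even if
# some digits are absent from this sample.
cm = confusion_matrix(y_true, y_pred, labels=range(10))
```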
As we can see from the val-loss and val-accuracy graphs, random weights between -1 and +1 work almost as well as Glorot (if not better).
The same conclusion can be drawn from the confusion matrices — here again, Glorot and the -1 to +1 initialization classify the digits well.
A Jupyter notebook with the experiments can be found on this GitHub page.
If you would like to know more about me, please check my LinkedIn profile.