Towards Weight Initialization in Deep Neural Networks
This post will be more technical than the other two. In it, I will analyze how different weight initializations influence training and accuracy. The main motivation is to explore how well Glorot [1] initialization performs compared to other, more trivial options.
In this post, I will try to compare the following weight initializations:
- Zero: all weights are set to 0
- Random: weights are set completely at random, with no constraint on their scale
- Random between -1 and +1: random weights restricted to the range [-1, +1]
- Xavier-Glorot initialization [1]
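The four schemes can be sketched in plain NumPy. This is a minimal illustration, not the notebook's code; the layer dimensions (`fan_in`, `fan_out`) are hypothetical, and Glorot uniform is shown in its standard form, with limit sqrt(6 / (fan_in + fan_out)):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128  # hypothetical layer dimensions

# Zero: every weight starts at 0
w_zero = np.zeros((fan_in, fan_out))

# Random: unconstrained draws, here from a standard normal distribution
w_random = rng.standard_normal((fan_in, fan_out))

# Random between -1 and +1: uniform on [-1, +1]
w_uniform = rng.uniform(-1.0, 1.0, size=(fan_in, fan_out))

# Xavier-Glorot (uniform variant): uniform on [-limit, +limit],
# where limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

Note how much smaller the Glorot range is: for these dimensions the limit is about 0.125, versus a full unit range for the -1 to +1 scheme.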
For validation, I used a custom convolutional neural network similar to VGG-Net and trained it on the MNIST dataset.
The results, which I found surprising, are as follows:
Before drawing some conclusions, let’s also look at confusion matrices for all the 10 classes (digits 0 to 9).
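For reference, a per-class confusion matrix like the ones shown here can be computed with scikit-learn. The labels below are made-up placeholders; in the post they would be the model's predictions on the MNIST test set:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions for digits 0-9
y_true = np.array([0, 1, 2, 2, 3, 9])
y_pred = np.array([0, 1, 2, 1, 3, 9])

# Rows are true classes, columns are predicted classes;
# passing labels=range(10) keeps the matrix 10x10 even if
# some digits are absent from this sample.
cm = confusion_matrix(y_true, y_pred, labels=range(10))
```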
As we can see from the val-loss and val-accuracy graphs, random weights between -1 and +1 work almost as well as Glorot (if not better).
The same conclusion can be drawn from the confusion matrices — here again, Glorot and the -1 to +1 initialization classify the digits well.
A Jupyter notebook with the experiments can be found on this GitHub page.
If you would like to know more about me, please check my LinkedIn profile.