Towards Weight Initialization in Deep Neural Networks

Amar Budhiraja
Dec 8, 2016

This post is more technical than the previous two. In it, I will analyze how different weight initializations influence training and accuracy. The main motivation is to explore how well Glorot [1] initialization performs compared to other, more trivial options.

In this post, I will compare the following weight initializations (a short code sketch of each scheme follows the list):

  1. Zero: all weights are set to 0
  2. Random: weights are set completely at random
  3. Random between -1 and +1: random weights drawn in the range -1 to +1
  4. Xavier-Glorot initialization [1]
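
To make the four schemes concrete, here is a minimal NumPy sketch of how each one could fill a single weight matrix. The post does not say which distribution "completely random" uses, so the standard-normal draw below is an assumption, as are the function and scheme names.

```python
import numpy as np

def init_weights(fan_in, fan_out, scheme, rng=None):
    """Return a (fan_in, fan_out) weight matrix for the given scheme (illustrative only)."""
    rng = rng or np.random.default_rng(0)
    if scheme == "zero":
        # 1. Zero: every weight starts at 0
        return np.zeros((fan_in, fan_out))
    if scheme == "random":
        # 2. Random: unscaled draws; a standard normal is assumed here
        return rng.standard_normal((fan_in, fan_out))
    if scheme == "uniform_pm1":
        # 3. Random between -1 and +1: uniform draws in [-1, +1]
        return rng.uniform(-1.0, 1.0, size=(fan_in, fan_out))
    if scheme == "glorot_uniform":
        # 4. Xavier-Glorot: uniform in [-limit, +limit], limit = sqrt(6 / (fan_in + fan_out))
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    raise ValueError(f"unknown scheme: {scheme}")
```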

For the validation, I used a custom convolutional neural network similar to VGG-Net and ran it on the MNIST dataset.
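
The original notebook is not reproduced here, but a small VGG-style Keras model with a pluggable kernel initializer might look roughly like the sketch below. The layer widths and the use of the Keras API are assumptions for illustration, not the exact architecture from the experiment.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(initializer="glorot_uniform"):
    # A small VGG-style stack for 28x28x1 MNIST images: conv-conv-pool blocks, then dense layers.
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu", kernel_initializer=initializer),
        layers.Conv2D(32, 3, padding="same", activation="relu", kernel_initializer=initializer),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu", kernel_initializer=initializer),
        layers.Conv2D(64, 3, padding="same", activation="relu", kernel_initializer=initializer),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu", kernel_initializer=initializer),
        layers.Dense(10, activation="softmax", kernel_initializer=initializer),
    ])

# Example: the "-1 to +1" variant; "zeros" and "glorot_uniform" are also valid initializer names.
model = build_model(keras.initializers.RandomUniform(minval=-1.0, maxval=1.0))
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```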

The results are somewhat surprising:

Validation Loss (Left) and Validation Accuracy (Right) for different sets of weights

Before drawing conclusions, let’s also look at the confusion matrices for all 10 classes (digits 0 to 9).

Confusion Matrices (From Top Left to Bottom Right): Random Weights in Range(-1 to 1), Glorot, Zero and Completely Random.

As we can see from the validation loss and validation accuracy curves, random weights between -1 and +1 work almost as well as Glorot initialization (if not better).
The same conclusion can be drawn from the confusion matrices: both Glorot and the -1 to +1 initialization classify the digits well.

The Jupyter notebook can be found on this GitHub page.

If you would like to know more about me, please check my LinkedIn profile.
