How Weights Are Initialized in Neural Networks (Quick Revision)

Navaneeth Sharma
Published in Analytics Vidhya
Oct 4, 2021

The key factor that changed the trajectory of Deep Learning


The neural network is one of the fundamental concepts of modern Deep Learning. This article covers only the how and why of initializing a neural network, so I assume you already have a basic understanding of neural networks (especially the concepts of forward propagation and back propagation).

So, What is Initialization?

Initialization is the method of assigning initial weights to the neurons. These are the techniques through which we assign random values, instead of zeros or constant values, to the weights before training the model.

Why do we need it?

What is the point of using it if we don't know why we are using it? There are two main reasons why it is used.

  1. It helps avoid the vanishing and exploding gradients problems later in training. The vanishing gradients problem occurs when the gradients become so small that the weights barely change during training (typically when the weights are initialized to values much smaller than 1). The exploding gradients problem occurs when the gradients, and hence the weights, keep growing rapidly (typically when the weights are initialized to values greater than 1).
  2. It can speed up the training process, since the network learns faster with an optimal initialization.

What are the types? How and when do we use them?

There are mainly three types of weight initialization that work pretty well for practical use cases.

  1. Uniform Distribution
  2. Xavier/Glorot Initialization
  3. He Initialization
Fig 1 — Diagram of one neuron; fan_in refers to the number of inputs to the neuron and fan_out refers to the number of outputs from the neuron

Uniform Distribution

This is a type of initialization where the weights are drawn from a uniform distribution. The initial weights are sampled from the range [-1/√(fan_in), 1/√(fan_in)].
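
As a quick illustration, here is a minimal NumPy sketch of this uniform initialization (the layer sizes in the example are hypothetical, chosen just for demonstration):

```python
import numpy as np

def uniform_init(fan_in, fan_out):
    # Draw weights uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)]
    limit = 1.0 / np.sqrt(fan_in)
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: a layer with 784 inputs and 128 outputs (hypothetical sizes)
W = uniform_init(784, 128)
print(W.shape, W.min(), W.max())
```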

Xavier / Glorot Initialization

This initialization has proven to work very well for neural networks that use Sigmoid or Tanh as the activation function. There are broadly two varieties:

Xavier Normal

The values for the weights are assigned by a normal distribution having zero mean and a standard deviation of √(2 / (fan_in + fan_out)).

Xavier Uniform

The values for the weights are assigned by a uniform distribution over the range [-√(6 / (fan_in + fan_out)), √(6 / (fan_in + fan_out))].
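
A minimal NumPy sketch of both Xavier variants, following the formulas above (fan_in and fan_out are the layer's input and output sizes; the example sizes are made up):

```python
import numpy as np

def xavier_normal(fan_in, fan_out):
    # Zero-mean normal with std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

def xavier_uniform(fan_in, fan_out):
    # Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: a Tanh/Sigmoid layer with 256 inputs and 64 outputs (hypothetical sizes)
W1 = xavier_normal(256, 64)
W2 = xavier_uniform(256, 64)
```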

He Initialization

This initialization has proven to work well for neural networks that use ReLU as the activation function. There are broadly two varieties:

He Uniform

The values for the weights are assigned by a uniform distribution over the range [-√(6 / fan_in), √(6 / fan_in)].

He Normal

The values for the weights are assigned by a normal distribution having zero mean and a standard deviation of √(2 / fan_in).
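
Similarly, a small NumPy sketch of the two He variants, following the formulas above (again with hypothetical layer sizes):

```python
import numpy as np

def he_uniform(fan_in, fan_out):
    # Uniform in [-limit, limit] with limit = sqrt(6 / fan_in)
    limit = np.sqrt(6.0 / fan_in)
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # Zero-mean normal with std = sqrt(2 / fan_in)
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

# Example: a ReLU layer with 512 inputs and 256 outputs (hypothetical sizes)
W = he_normal(512, 256)
```

In practice, deep learning frameworks ship these initializers out of the box (for example, torch.nn.init.xavier_uniform_ and torch.nn.init.kaiming_normal_ in PyTorch), so you rarely need to write them by hand.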

Yeah! That’s it. We have quickly revised the formulas and concepts behind initialization for neural networks. If you want to know about this in more detail, I recommend reading this, and you can visualize the concepts here.
Thank you for reading!

See you next time…
