Deep Learning Best Practices (1) — Weight Initialization

Neerja Doshi
Published in USF-Data Science
7 min read · Mar 26, 2018


Basics, weight initialization pitfalls & best practices


As a beginner in deep learning, one of the things I realized is that there isn’t much online documentation that covers all the deep learning tricks in one place. There are lots of small best practices, ranging from simple tricks like weight initialization and regularization to slightly more complex techniques like cyclical learning rates, that can make training and debugging neural nets easier and more efficient. This inspired me to write this series of blog posts, where I will cover as many nuances as I can to make implementing deep learning simpler for you.

This post assumes that you have a basic idea of how neural networks are trained. An understanding of weights, biases, hidden layers, activations and activation functions will make the content clearer. I would recommend this course if you wish to build a basic foundation in deep learning.

Note: Whenever I refer to the layers of a neural network, I mean the layers of a simple neural network, i.e. fully connected layers. Of course, some of the methods I talk about apply to convolutional and recurrent neural networks as well. In this post I am going to talk about the issues related to the initialization of weight matrices and ways to mitigate…