
Deep Learning Best Practices: Activation Functions & Weight Initialization Methods — Part 1

Niranjan Kumar
May 4, 2019 · 17 min read
Photo by Franck V. on Unsplash
Image Source: https://padhai.onefourthlabs.in

Why are Activation Functions Important?

Simple Neural Network
Pre-activation Function
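Each neuron first computes a pre-activation (a weighted sum of its inputs plus a bias) and then passes it through an activation function. A minimal NumPy sketch of that two-step computation (the layer sizes and the sigmoid choice are only illustrative assumptions):

```python
import numpy as np

def layer(h_prev, W, b, g):
    """One layer: pre-activation a = W h + b, then activation h = g(a)."""
    a = W @ h_prev + b       # pre-activation (linear part)
    return g(a)              # post-activation (non-linear part)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -1.2, 3.0])        # toy input
W = np.random.randn(4, 3) * 0.1       # hypothetical 3 -> 4 layer
b = np.zeros(4)
print(layer(x, W, b, sigmoid))
```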

What happens if there are no non-linear activation functions in the network?

Linear Transformation
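Without a non-linearity between layers, stacking any number of linear layers collapses into a single linear transformation, so the network can only ever represent linear functions of its input. A small sketch (with hypothetical weight matrices) that checks this numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((4, 4))
W3 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

# Three "layers" with no activation function in between...
deep_output = W3 @ (W2 @ (W1 @ x))

# ...are exactly one linear layer with W = W3 W2 W1.
W_collapsed = W3 @ W2 @ W1
print(np.allclose(deep_output, W_collapsed @ x))  # True
```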

Logistic Function

Logistic Function
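The logistic (sigmoid) function squashes any real input into (0, 1), and its derivative can be written in terms of its own output. A minimal NumPy version of the formula and its gradient:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: f(x) = 1 / (1 + exp(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative: f'(x) = f(x) * (1 - f(x)), maximum value 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))       # [0.0067 0.5    0.9933]
print(sigmoid_grad(np.array([-5.0, 0.0, 5.0])))  # [0.0066 0.25   0.0066]
```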
Thin but Deep Network
Pre and Post Activation
Chain Rule for the derivative

Saturated neurons cause the gradients to vanish

Simple Aggregation of Inputs
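Because the sigmoid's gradient is at most 0.25 and close to zero once the neuron saturates, the chain rule multiplies many such small factors together in a deep network, and the gradient reaching the early layers effectively vanishes. A rough numerical illustration of that product over ten layers:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = lambda x: sigmoid(x) * (1.0 - sigmoid(x))

# Best case: even at x = 0 each sigmoid contributes at most 0.25 to the product.
print(0.25 ** 10)               # ~9.5e-07 after only 10 layers

# Saturated case: with pre-activations around +/-5 each factor is ~0.0066.
print(sigmoid_grad(5.0) ** 10)  # ~1.7e-22 -- the gradient has vanished
```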

Zero-Centered Functions

Logistic function is not zero-centered

Image Source: https://padhai.onefourthlabs.in
Gradients for Weights
Weighted Sum
Modified Chain Rule Gradients
Gradient Options
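The consequence of not being zero-centered is easy to see numerically: sigmoid outputs are always positive, so during backpropagation every weight feeding into a given neuron gets a gradient with the same sign (all positive or all negative), which forces inefficient zig-zag updates. A small sketch of that sign pattern (the sizes and values are only illustrative):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
h_prev = sigmoid(rng.standard_normal(6))   # previous-layer activations: all in (0, 1)

# Gradient of the loss w.r.t. this neuron's pre-activation (some scalar from backprop).
d_a = 0.7

# dL/dw_i = d_a * h_prev_i -- since every h_prev_i > 0, all entries share d_a's sign.
d_w = d_a * h_prev
print(np.sign(d_w))   # [1. 1. 1. 1. 1. 1.]
```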

Computationally expensive

Tanh Function

Mathematical Form
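Tanh is essentially a rescaled sigmoid: it squashes inputs into (-1, 1) and, unlike the logistic function, is zero-centered, although it still saturates for large |x|. A minimal sketch of the function and its derivative:

```python
import numpy as np

def tanh(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), output in (-1, 1), zero-centered."""
    return np.tanh(x)

def tanh_grad(x):
    """Derivative: 1 - tanh(x)^2, maximum value 1 at x = 0, near 0 when saturated."""
    return 1.0 - np.tanh(x) ** 2

print(tanh(np.array([-3.0, 0.0, 3.0])))       # [-0.995  0.     0.995]
print(tanh_grad(np.array([-3.0, 0.0, 3.0])))  # [ 0.0099 1.     0.0099]
```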

ReLU — Rectified Linear Unit

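ReLU passes positive inputs through unchanged and zeroes out negative ones, so it does not saturate on the positive side and is very cheap to compute; the flip side is that a neuron whose pre-activation stays negative receives zero gradient and can "die". A minimal sketch:

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient: 1 for x > 0, 0 otherwise (a dead neuron gets no updates)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```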

Leaky ReLU

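Leaky ReLU addresses the dead-neuron problem by giving negative inputs a small non-zero slope (commonly around 0.01) instead of cutting them to zero, so some gradient always flows. A minimal sketch:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Gradient: 1 for x > 0, alpha otherwise -- never exactly zero."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(x))       # [-0.02  -0.005  0.5    2.   ]
print(leaky_relu_grad(x))  # [0.01 0.01 1.   1.  ]
```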

Weight Initialization

Xavier Initialization (not this Xavier)

Why not initialize all weights to zero?

Simple Network
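If every weight starts at zero (or at any identical value), all neurons in a layer compute the same output and receive the same gradient, so they stay identical forever and the layer never breaks symmetry. A small sketch of one backward pass on such a network (the toy 3-4-1 architecture and squared-error loss are assumptions for illustration):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy 3 -> 4 -> 1 network with every weight initialized to zero.
W1, W2 = np.zeros((4, 3)), np.zeros((1, 4))
x, y = np.array([0.5, -1.0, 2.0]), 1.0

# Forward pass: all four hidden neurons compute exactly the same value.
h = sigmoid(W1 @ x)              # [0.5 0.5 0.5 0.5]
y_hat = sigmoid(W2 @ h)[0]       # 0.5

# Backward pass: each hidden unit receives an identical gradient, so after
# any number of updates the units remain indistinguishable from each other.
d_out = (y_hat - y) * y_hat * (1 - y_hat)                # scalar error signal
dW2 = d_out * h                                          # same value for every unit
dW1 = np.outer(W2.flatten() * d_out * h * (1 - h), x)    # all-zero rows here
print(dW2)   # [-0.0625 -0.0625 -0.0625 -0.0625]
print(dW1)   # identical rows -> symmetry is never broken
```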

Random Initialization — Small Weights

Activation output for 5 layers (1 to 5)
Vanishing Gradient — Sigmoid Function
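With very small random weights, the pre-activations of each successive layer shrink, the sigmoid outputs cluster tightly around 0.5, and the gradients flowing back are multiplied by tiny weights at every layer, so they vanish. A quick forward-pass simulation of the activation statistics across five layers (the layer width, weight scale, and input distribution are illustrative assumptions):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def forward_stats(weight_std, n_layers=5, width=500, seed=0):
    """Push random data through n_layers sigmoid layers, reporting activation stats."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((1000, width))          # a batch of random inputs
    for layer in range(1, n_layers + 1):
        W = rng.standard_normal((width, width)) * weight_std
        h = sigmoid(h @ W)
        print(f"layer {layer}: mean={h.mean():.3f}, std={h.std():.4f}")
    return h

forward_stats(weight_std=0.01)   # activations cluster around 0.5 with very small spread
```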

Random Initialization — Large Weights

Post-Activation
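With large random weights the opposite happens: pre-activations become huge, the sigmoid saturates at 0 or 1, and the gradients again vanish because the derivative is near zero in the saturated regions. The same kind of simulation as above, now with a large weight scale:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
h = rng.standard_normal((1000, 500))
for layer in range(1, 6):
    W = rng.standard_normal((500, 500)) * 3.0         # large weight scale
    h = sigmoid(h @ W)
    saturated = np.mean((h < 0.01) | (h > 0.99))      # outputs pinned near 0 or 1
    print(f"layer {layer}: {saturated:.0%} of activations saturated")
```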

Xavier Initialization

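Xavier (Glorot) initialization scales the random weights by the size of the previous layer so that the activation variance stays roughly constant from layer to layer with sigmoid or tanh units; one common form draws weights with variance 1/fan_in (another variant uses 2/(fan_in + fan_out)). A minimal sketch of the 1/fan_in form:

```python
import numpy as np

def xavier_init(fan_in, fan_out, seed=0):
    """Xavier/Glorot initialization: weights ~ N(0, 1/fan_in)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(1.0 / fan_in)

W = xavier_init(fan_in=500, fan_out=500)
print(W.std())   # ~0.045 = sqrt(1/500): scaled to the width of the previous layer
```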

He Initialization (He et al.)

Variance Formula
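He initialization uses the same idea with variance 2/fan_in, compensating for the fact that ReLU zeroes out roughly half of its inputs; it is the usual choice in front of ReLU layers. A minimal sketch:

```python
import numpy as np

def he_init(fan_in, fan_out, seed=0):
    """He initialization: weights ~ N(0, 2/fan_in), suited to ReLU layers."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

W = he_init(fan_in=500, fan_out=500)
print(W.std())   # ~0.063 = sqrt(2/500)
```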

Best Practices

Conclusion

Way Forward



Written by Niranjan Kumar
Senior Consultant, Data Science || Freelancer. Writer @ TDataScience & Hackernoon || connect & fork @ Niranjankumar-c
