Introduction to Exponential Linear Unit

Krishna
Apr 12, 2018


This blog post is an introduction to the Exponential Linear Unit (ELU) activation function in Deep Learning. The best way to learn any concept is to code it, so I also recently submitted a PR to add the ELU activation function to Apache SystemML.

Fancy Activation functions in Deep Learning

The most popular activation functions are arguably ReLUs. Their activations are non-negative, so their mean activation is greater than zero. According to the authors of the ELU paper, this causes a bias shift for units in the next layer. Most machine learning algorithms work better with zero-centered, normalized features.

Since ELUs can take negative values, they push the mean of the activations closer to zero. Having mean activations closer to zero also leads to faster learning and convergence.
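To make this concrete, here is a small NumPy sketch (my own illustration, not part of the ELU paper or the SystemML PR) that compares the mean activation of ReLU and ELU on zero-centered random inputs:

```python
import numpy as np

# Illustrative comparison: mean activation of ReLU vs. ELU
# on zero-centered random inputs.
np.random.seed(0)
x = np.random.randn(100_000)

relu = np.maximum(x, 0.0)           # ReLU: always non-negative
alpha = 1.0
elu = np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # ELU: can be negative

print("mean ReLU activation:", relu.mean())  # clearly above zero
print("mean ELU activation: ", elu.mean())   # noticeably closer to zero
```

Because ELU outputs negative values for negative inputs, its mean activation sits closer to zero than ReLU's, which is exactly the property the paper argues reduces the bias shift.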

Results

Classification performance with various popular models

The ELU network performs second best on CIFAR-10 with a test error of 6.55% and best on CIFAR-100 with a test error of 24.28%.

Experimental results also show that ELUs significantly outperform other activation functions on different vision datasets.

The Math

The math is quite simple. The ELU is defined as

f(x) = x if x > 0
f(x) = α (exp(x) − 1) if x ≤ 0

where alpha (α) is a constant, initialized to 1.

The only thing you need to know is that the derivative of the exponential function is the exponential itself.

f(x) is the forward pass, and its derivative f′(x) is used to compute the backward gradients.
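Below is a minimal NumPy sketch of the ELU forward and backward pass, assuming alpha = 1 as in the post. It is for illustration only and is not the SystemML implementation:

```python
import numpy as np

def elu_forward(x, alpha=1.0):
    # f(x) = x for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_backward(dout, x, alpha=1.0):
    # f'(x) = 1 for x > 0, and alpha * exp(x) = f(x) + alpha otherwise,
    # since the derivative of the exponential is the exponential itself.
    grad = np.where(x > 0, 1.0, alpha * np.exp(x))
    return dout * grad

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(elu_forward(x))
print(elu_backward(np.ones_like(x), x))
```

Note that for x ≤ 0 the gradient can be written as f(x) + α, which lets an implementation reuse the forward-pass output instead of recomputing the exponential.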

Thanks for reading this post :). If you want to contribute to Apache SystemML, please feel free to work on any JIRA issue, or reach out to me if you need help.

Contact me on Twitter @krishnakalyan3.
