GELU activation

Shaurya Goel
2 min read · Jul 21, 2019

GELU stands for Gaussian Error Linear Unit.

Activations like ReLU, ELU and PReLU have enabled faster and better convergence of Neural Networks than sigmoids.
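
For reference, here is a minimal NumPy sketch of these activations (the α values below are illustrative defaults, not values prescribed by this post):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x)
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # ELU: x for x > 0, otherwise alpha * (exp(x) - 1)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def prelu(x, alpha=0.25):
    # PReLU: x for x > 0, otherwise alpha * x (alpha is a learned parameter in practice)
    return np.where(x > 0, x, alpha * x)
```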

Also, Dropout regularizes the model by randomly multiplying some activations by 0 during training.
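
As a rough illustration, training-time dropout can be sketched as below (the rescaling of the surviving activations is a common "inverted dropout" implementation detail, not something stated in this post):

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True):
    # Zero each activation with probability p_drop during training;
    # scale the survivors so the expected activation stays unchanged (inverted dropout).
    if not training:
        return x
    mask = np.random.binomial(1, 1.0 - p_drop, size=x.shape)
    return x * mask / (1.0 - p_drop)
```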

Both of the above methods together determine a neuron’s output, yet the two work independently of each other. GELU aims to combine them.

Also, a newer RNN regularizer called Zoneout stochastically multiplies inputs by 1, i.e., it randomly preserves hidden units instead of dropping them.

We want to merge all three functionalities: stochastically multiply the input by 0 or 1, but obtain the output value (of the activation function) deterministically. Specifically, the input x is multiplied by a mask m ~ Bernoulli(Φ(x)), where Φ(x) = P(X ≤ x) is the cumulative distribution function (CDF) of the standard normal distribution, so larger inputs are more likely to be kept.
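
A minimal sketch of this stochastic precursor (training-time behavior only; SciPy is used here just for the normal CDF):

```python
import numpy as np
from scipy.stats import norm

def stochastic_zero_one_map(x):
    # Multiply each input by m ~ Bernoulli(Phi(x)), where Phi is the
    # standard normal CDF, so larger inputs are kept more often.
    keep_prob = norm.cdf(x)
    mask = np.random.binomial(1, keep_prob)
    return x * mask
```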

We chose this distribution since neurons’ inputs tend to follow a normal distribution, especially after Batch Normalization.

But the output of any activation function should be deterministic, not stochastic. So, we find the expected value of our transformation.
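
Working that expectation out (a one-line derivation, using the Bernoulli mask defined above):

```latex
% Expected value of the stochastic transformation x * m, with m ~ Bernoulli(Phi(x))
\mathbb{E}[x\,m] = \Phi(x)\,(x \cdot 1) + \bigl(1 - \Phi(x)\bigr)(x \cdot 0) = x\,\Phi(x)
```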

Since Φ(x) is the cumulative distribution function of the Gaussian distribution and is often computed via the error function, we define the Gaussian Error Linear Unit (GELU) as:

GELU(x) = x·Φ(x) = x · ½[1 + erf(x/√2)]

[Figure: GELU (μ=0, σ=1), ReLU and ELU (α=1)]
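
To make the definition concrete, here is a minimal NumPy/SciPy sketch of GELU using the erf form, alongside the tanh approximation given in the original GELU paper:

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation from the original GELU paper
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3.0, 3.0, 7)
print(np.round(gelu(x), 4))
print(np.round(gelu_tanh(x), 4))
```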
