Activation Functions
Activation functions affect the accuracy of a deep learning model and the computational efficiency of training it. They also have a major effect on a neural network's ability to converge, that is, to find good weights and biases. Without them, a neural network would reduce to a linear combination of linear functions, which is itself just a linear function. Many activation functions are used in practice, of which ReLU, Sigmoid, Tanh and Softmax are the most common.
An activation function decides whether a neuron should be activated by computing the weighted sum of its inputs and adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
Non-linear means that the output cannot be reproduced from a linear combination of the inputs (which is not the same as an output that plots as a straight line; the word for that is affine).
ReLU (Rectified Linear Unit)
The Rectified Linear Activation Function, or ReLU activation function, is perhaps the most common function used for hidden layers.
It is common because it is both simple to implement and effective at overcoming the limitations of other previously popular activation functions, such as Sigmoid and Tanh.
Specifically, it is less susceptible to vanishing gradients that prevent deep models from being trained, although it can suffer from other problems like saturated or “dead” units.
The ReLU function is calculated as follows:

f(x) = max(0, x)

This means that if the input value x is negative, 0.0 is returned; otherwise, the input value is returned unchanged.
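As a minimal sketch, assuming NumPy is available, the definition above translates directly into code:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: returns x for positive inputs, 0.0 otherwise."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # negative inputs are clamped to 0.0
```

Note that the gradient of ReLU is 0 for all negative inputs, which is exactly what makes the "dying ReLU" problem below possible.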
Dying ReLU
A "dead" ReLU outputs 0 for every input. Because the gradient of ReLU is also 0 in that region, such a unit receives no weight updates during training and cannot recover.
Leaky ReLU
Solves the "dying ReLU" problem by giving negative inputs a small non-zero slope (typically 0.01), so the gradient never vanishes completely.
Parametric ReLU (PReLU)
Like Leaky ReLU, but the slope for negative inputs is a parameter learned via backpropagation rather than a fixed constant.
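A sketch of both variants, assuming NumPy; `negative_slope` and `alpha` are illustrative parameter names (PReLU's alpha would be learned during training, not passed in by hand):

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # Fixed small slope for negative inputs keeps the gradient nonzero.
    return np.where(x >= 0, x, negative_slope * x)

def prelu(x, alpha):
    # Same shape as Leaky ReLU, but alpha is a learnable parameter
    # that would be updated by backpropagation during training.
    return np.where(x >= 0, x, alpha * x)

x = np.array([-4.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))       # small negative values survive
print(prelu(x, 0.25))      # steeper learned slope for negatives
```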
Sigmoid
The Sigmoid activation function is also called the logistic function. It is the same function used in the logistic regression classification algorithm.
The function takes any real value as input and outputs values in the range 0 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.
The sigmoid activation function is calculated as follows:

sigmoid(x) = 1 / (1 + e^(-x))

Where e is a mathematical constant, the base of the natural logarithm.
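A minimal sketch of the sigmoid, assuming NumPy, showing the squashing behaviour described above:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # exactly 0.5 at the midpoint
print(sigmoid(10.0))   # large positive inputs approach 1.0
print(sigmoid(-10.0))  # large negative inputs approach 0.0
```

For large |x| the curve flattens out, so the gradient approaches 0; this saturation is the source of the vanishing-gradient issue mentioned earlier.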
Tanh
The Hyperbolic Tangent activation function is also referred to simply as the Tanh (also “tanh” and “TanH“) function. It is very similar to the sigmoid activation function and even has the same S-shape.
The Tanh activation function is calculated as follows:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Where e is a mathematical constant, the base of the natural logarithm. Unlike the sigmoid, its outputs range from -1 to 1 and are centered on zero.
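A sketch of the formula above, assuming NumPy, checked against NumPy's built-in `np.tanh`:

```python
import numpy as np

def tanh_manual(x):
    # Direct translation of the definition; np.tanh is the
    # numerically robust version you would use in practice.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(tanh_manual(x))   # matches np.tanh(x); outputs lie in (-1, 1)
```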
Softmax
The Softmax function outputs a vector of values that sum to 1.0 that can be interpreted as probabilities of class membership.
It is related to the argmax function, which outputs 1 for the largest value and 0 for all others. Softmax is a "softer" version of argmax that produces probability-like outputs instead of a hard winner-take-all choice.
The Softmax function is calculated as follows:

softmax(x_i) = e^(x_i) / sum_j e^(x_j)

Here, x is a vector of outputs and e is a mathematical constant that is the base of the natural logarithm.
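A sketch of softmax, assuming NumPy. Subtracting the maximum before exponentiating is a standard numerical-stability trick; it does not change the result because softmax is invariant to shifting all inputs by a constant:

```python
import numpy as np

def softmax(x):
    # Shift by the max so np.exp never overflows for large inputs.
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # non-negative values that sum to 1.0
print(probs.sum())  # 1.0
```

The largest logit always receives the largest probability, which is what makes softmax a smooth stand-in for argmax.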
I hope this article provides you with a basic understanding of different Activation Functions.
If you have any questions or if you find anything misrepresented please let me know.
Thanks!