Activation Functions in NN
Activation functions are used to fit the output into a certain range. Different functions map it into different ranges, such as [0, 1] or [-1, 1]. In simple terms, an activation function "activates" the weighted sum of the inputs and biases, mapping it to an output value within that range.
There are linear and non-linear activation functions:
-> Linear Activation function (Identity function)
Also known as "No Activation". The activation is proportional to the input: the function does nothing to the weighted sum of the input, it simply passes the value through unchanged.
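A minimal sketch of the identity function (the function name is my own for illustration):

```python
def identity(x):
    # linear / "no activation": returns the weighted sum unchanged
    return x

print(identity(3.7))   # 3.7
```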
-> Non Linear Activation Functions :
1) Sigmoid function
This function looks like an S-shaped curve. Its range is (0, 1).
If the network or the model has to predict a binary outcome, i.e. the output is a Yes or No (1 or 0), we use the sigmoid activation function. The sigmoid is also differentiable everywhere, which means we can compute the slope of the curve at any point (this is what gradient-based training relies on).
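The formula behind the curve is sigmoid(x) = 1 / (1 + e^(-x)); a direct sketch in Python:

```python
import math

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))     # 0.5, the midpoint of the curve
print(sigmoid(10))    # close to 1
print(sigmoid(-10))   # close to 0
```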
2) Tanh activation function ( hyperbolic tangent)
This is also an S-shaped function, very similar to the sigmoid, but its range is (-1, 1).
It is used in cases where the output should distinguish three kinds of response: negative, neutral, and positive (values near -1, 0, and 1).
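The similarity to the sigmoid is exact: tanh is just a rescaled sigmoid, tanh(x) = 2·sigmoid(2x) − 1, which shifts the range from (0, 1) to (-1, 1). A sketch (function names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1: same S-curve, recentered at 0
    return 2.0 * sigmoid(2.0 * x) - 1.0

print(tanh_via_sigmoid(0.0))   # 0.0, the neutral midpoint
```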
3) ReLU
The rectified linear unit activation is pretty straightforward: f(x) = x for every x > 0, and f(x) = 0 for every x ≤ 0. The derivative f′(x) is 1 for x > 0 and 0 for x < 0; the function is differentiable at every point except x = 0.
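The piecewise definition and its derivative can be sketched as (taking the common convention f′(0) = 0 at the undefined point):

```python
def relu(x):
    # f(x) = x for x > 0, else 0
    return x if x > 0 else 0.0

def relu_grad(x):
    # f'(x) = 1 for x > 0, 0 for x < 0; undefined at x = 0,
    # where 0 is returned by convention
    return 1.0 if x > 0 else 0.0

print(relu(3.0), relu(-2.0))        # 3.0 0.0
print(relu_grad(3.0), relu_grad(-2.0))   # 1.0 0.0
```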
4) Leaky ReLU
It is an improved version of ReLU.
In ReLU, f(x) = 0 for all x < 0.
In leaky ReLU, f(x) = α · x for x < 0, where α is a small constant such as 0.1 or 0.01.
This is done so that inputs below zero still produce a really small non-zero output (and gradient) rather than just 0, addressing the disadvantage of normal ReLU.
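A sketch with the slope α as a parameter (defaulting here to 0.01, a common choice):

```python
def leaky_relu(x, alpha=0.01):
    # like ReLU for x > 0, but negative inputs keep a small slope
    # instead of being flattened to 0
    return x if x > 0 else alpha * x

print(leaky_relu(5.0))     # 5.0, unchanged for positive input
print(leaky_relu(-10.0))   # small negative value instead of 0
```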
5) Softmax activation function
It is related to the logistic (sigmoid) function; in effect it generalizes the sigmoid from one output to many. It is usually used in the final layer.
It is used for multiclass classification, as it turns the layer's raw outputs into a probability distribution over the classes: non-negative values that sum to 1.
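A minimal sketch: each raw score is exponentiated and divided by the sum of all the exponentials, so the outputs are positive and sum to 1 (subtracting the maximum first is a standard trick to avoid overflow, not part of the mathematical definition):

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability (does not
    # change the result, since it cancels in the ratio)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # largest score gets the largest probability
print(sum(probs))   # sums to 1 (up to floating-point error)
```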