Different Types of Activation Functions: Sigmoid, tanh, ReLU & Leaky ReLU
Sigmoid
The sigmoid function's range is (0, 1): its output approaches but never reaches 0 or 1.
Advantages
It works well in the output layer for binary classification, since it maps any input to a value between 0 and 1 that can be interpreted as a probability.
Drawbacks
It is often avoided in hidden layers: as |x| becomes large, the curve flattens and the gradient becomes very small, which can slow the learning of our model (the vanishing gradient problem).
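The saturation described above can be seen directly from the sigmoid derivative, σ(x)·(1 − σ(x)). A minimal NumPy sketch (function names and the demo values are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Squash inputs into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative sigma(x) * (1 - sigma(x)); it peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient shrinks rapidly as |x| grows, illustrating saturation.
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(10.0))  # ~4.5e-05 -- almost no learning signal
```

In a deep network these small factors are multiplied layer by layer during backpropagation, which is why the slowdown compounds.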
tanh
The tanh function's range is (-1, 1).
Advantages
This is better than the sigmoid function for hidden layers because it is steeper around zero, so for small inputs the gradient is larger and learning is faster.
Drawbacks
Because its output ranges between -1 and 1, it is not suited as the activation of the last layer for binary classification, where an output between 0 and 1 is needed.
Moreover, just like the sigmoid function, the gradient becomes very small for large input values, which again slows learning.
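The comparison with sigmoid can be checked numerically: the derivative of tanh is 1 − tanh²(x), so its maximum slope is 1.0 (versus 0.25 for sigmoid), but it still saturates for large |x|. A short sketch (the test values are arbitrary):

```python
import numpy as np

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, equal to 1 at x = 0."""
    return 1.0 - np.tanh(x) ** 2

# Steeper near zero than sigmoid (max slope 1.0 vs 0.25),
# but the gradient still vanishes for large |x|.
print(tanh_grad(0.0))  # 1.0
print(tanh_grad(5.0))  # ~1.8e-04
```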
ReLU
Advantages
Nowadays, this is the most widely used activation: for positive inputs the gradient is a constant 1, so it does not shrink even for large values.
Drawbacks
For negative values, the gradient drops to 0, so those units stop learning entirely (the "dying ReLU" problem) and training can slow significantly. It is therefore best suited to layers where most input values are positive.
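ReLU is simply max(0, x), so its gradient is 1 for any positive input and 0 for any negative one. A minimal sketch of both behaviors (names are illustrative):

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied elementwise."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

def relu_grad(x):
    """Gradient: 1 for positive inputs (regardless of magnitude), 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, 0.0)

print(relu_grad(100.0))  # 1.0 -- no saturation, even for large positive values
print(relu_grad(-3.0))   # 0.0 -- the unit is "dead" for negative inputs
```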
Leaky ReLU
Advantages
This is a modified version of the ReLU function.
For negative values, it has a small slope instead of zero, which ensures the gradient never drops to zero and the model's learning is not stalled.
Drawbacks
This is not a major drawback, but the slope in the negative region is an extra hyperparameter that sometimes needs to be fine-tuned.