Analysis of Softmax Function
Softmax is the most popular activation function for multi-class classification. Most people rely on libraries to implement multi-class classification, so they never run into the problem discussed below. This article is for those who want to understand how overflow and underflow occur in softmax, why softmax needs to be stabilised, and what to take care of when coding your own multi-class classifier.
Mathematically, the softmax function is represented as:
softmax(x(i)) = exp(x(i)) / (sum over j of exp(x(j)))
The function outputs the probability assigned to element x(i), and the probabilities over all x(i) sum to 1.
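As a quick illustration of the definition, here is a minimal sketch in plain Python. The function name softmax and the input values are chosen only for this example; it uses the standard math module rather than any particular library.

```python
import math

def softmax(x):
    """Naive softmax: exp(x_i) divided by the sum of exp(x_j) over all j."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three probabilities, largest for the largest input
print(sum(probs))  # the probabilities add up to 1 (up to rounding)
```

For moderate inputs like these, the naive formula works fine.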
The problem arises when the values x(i) are very large in magnitude. Suppose every x(i) is a very large negative number (for example, -1000). Then each exp(x(i)) is so close to 0 that it rounds to 0 in floating point, the denominator of the softmax function becomes 0, and the result is undefined. This is called underflow. If x(i) is a very large positive number, exp(x(i)) is extremely large and may exceed the largest value that floating point can represent. This is called overflow.
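Both failures are easy to reproduce. The sketch below uses plain Python, where math.exp raises OverflowError for very large arguments and a zero denominator raises ZeroDivisionError. The helper try_softmax and the input values are only for this demonstration. With NumPy arrays the same inputs would typically produce inf and nan values with warnings instead of exceptions.

```python
import math

def naive_softmax(x):
    """Softmax computed directly from the definition, with no stabilisation."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def try_softmax(x):
    """Return the softmax result, or the name of the exception it raised."""
    try:
        return naive_softmax(x)
    except (OverflowError, ZeroDivisionError) as err:
        return type(err).__name__

# Underflow: every exp(x_i) rounds to 0.0, so the denominator is 0.
underflow_result = try_softmax([-1000.0, -1001.0, -1002.0])
# Overflow: exp(1000) is larger than the largest representable float.
overflow_result = try_softmax([1000.0, 1001.0, 1002.0])

print(underflow_result)  # ZeroDivisionError
print(overflow_result)   # OverflowError
```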
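To solve this problem, we subtract max(x(i)) from every x(i). This has no effect on the output. Writing m = max(x(i)), we have exp(x(i) - m) / (sum over j of exp(x(j) - m)) = [exp(-m) exp(x(i))] / [exp(-m) (sum over j of exp(x(j)))], and the common factor exp(-m) cancels, leaving the original softmax.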
This technique solves both problems. After subtracting max(x(i)), the largest value in the array x becomes 0 and all the other values are less than or equal to 0, so every exponential is at most 1 and there is no overflow. Because at least one element now has value 0 and exp(0) = 1, at least one term in the denominator is 1, so the denominator can never be 0 and underflow can no longer make the result undefined. This is how softmax is stabilised.
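The stabilised version can be sketched in a few lines of plain Python. The name stable_softmax and the test inputs are assumptions made for this example; a production classifier would usually apply the same idea to arrays using a numerical library.

```python
import math

def stable_softmax(x):
    """Softmax with the maximum subtracted from every input first."""
    m = max(x)
    # After the shift, the largest exponent is 0, so exp() never overflows.
    exps = [math.exp(v - m) for v in x]
    # The sum includes exp(0) = 1, so it is at least 1 and never 0.
    total = sum(exps)
    return [e / total for e in exps]

# Inputs that break the naive formula now work.
small = stable_softmax([-1000.0, -1001.0, -1002.0])
large = stable_softmax([1000.0, 1001.0, 1002.0])
# Shifting every input by the same amount gives the same probabilities.
reference = stable_softmax([0.0, -1.0, -2.0])

print(small)
print(large)
print(reference)
```

Here small and reference should agree, since the two inputs differ only by a constant shift of 1000, and large should contain the same probabilities in reverse order.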