Softmax is one of the most common activation functions. It is usually used in the output layer of a neural network for classification tasks. What softmax actually does is turn the raw outputs (logits) into probabilities.
Now, before we jump into the main problem with softmax, let’s first discuss how it actually works and understand what logits are.
Logits: these are the raw scores predicted by the last layer of the neural network, i.e. the values we get before any activation function is applied to them.
So, let's take this example:
1. Import numpy or math to perform the exp operation.
2. Create a Python list of values, assuming these are the outputs (logits).
3. Take the exponential of every value in the logits array.
4. Apply the softmax function: divide each exponential by the sum of all the exponentials.
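The steps above can be sketched in a few lines of numpy. The logit values here are made up for illustration; they stand in for whatever the last layer of a real network would output.

```python
import numpy as np

# Step 2: hypothetical logits, standing in for the raw outputs of the last layer
logits = [2.0, 1.0, 0.1]

# Step 3: take the exponential of every logit
exps = np.exp(logits)

# Step 4: divide each exponential by the sum of all the exponentials
probs = exps / np.sum(exps)

print(probs)        # roughly [0.659 0.242 0.099]
print(probs.sum())  # 1.0, up to floating-point rounding
```

For very large logits `np.exp` can overflow. Subtracting the largest logit from every value before exponentiating is a common way around this, and it does not change the resulting probabilities.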
If we decode the above concept into a formula, it simply says: the softmax of a logit is its exponential divided by the sum of the exponentials of all the logits.
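In symbols, assuming the formula meant here is the standard softmax definition, it can be written as follows. The names z (the vector of logits) and K (the number of classes) are chosen here for illustration.

```latex
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```

Every term is positive and the denominator is the sum of all the numerators, so the K outputs always add up to one.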
Now, let’s talk about the main issue.
Softmax converts all logits into probabilities, and the sum of all probabilities will always be one. This means that (in the case of image classification) even if the image doesn’t belong to any of the classes, the network will still give a result.
Let’s understand this better with an image classification example.
Say you have trained your neural network to classify MNIST images. What if you accidentally give it an image that does not belong to MNIST? In such cases, the neural network should tell us that the image doesn’t belong to MNIST, but the output layer of our NN has 10 classes, which means that no matter what image we give it, it will try to classify it. For example, if you give it an image of an elephant, it will assign it to whichever of the 10 digit classes gets the highest probability.
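Here is a small sketch of that behaviour. No real model is involved: the random numbers below are a stand-in (an assumption for illustration) for the 10 logits a digit classifier might produce for a non-digit image. Whatever the logits are, softmax returns 10 probabilities that sum to one, and taking the argmax always picks one of the 10 digits.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max avoids overflow and leaves the result unchanged
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical logits for an out-of-distribution image (e.g. an elephant).
# Random values stand in for a real network's output.
rng = np.random.default_rng(0)
ood_logits = rng.normal(size=10)

probs = softmax(ood_logits)
predicted_class = int(np.argmax(probs))

print(probs.sum())       # 1.0, up to floating-point rounding
print(predicted_class)   # always some digit between 0 and 9
```

There is no output that means "none of the above", which is exactly the problem.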
What issues can it cause?
Imagine you trained a CNN classifier for, say, 100 classes. During tests it gave good results, so you deployed it. After some time you start getting complaints that it is misclassifying many images. Upon looking, you find out that the misclassified images belong to classes the model does not support.
What are the solutions?
I’m not an expert, but I think adding an extra class trained on random images will help. If you don’t want to add a new class, then checking the score (confidence) after softmax is applied might help, i.e. rejecting predictions whose highest probability is below a certain threshold. There might be other methods to evaluate the output as well.
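The threshold idea could look something like the sketch below. The function name `predict_or_reject` and the default threshold of 0.9 are hypothetical choices for illustration, not values from any library. A sensible threshold would have to be tuned on your own validation data.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max avoids overflow and leaves the result unchanged
    shifted = np.asarray(logits, dtype=float) - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def predict_or_reject(logits, threshold=0.9):
    """Return the predicted class index, or None if the top probability is below threshold."""
    probs = softmax(logits)
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return None  # treat as "does not belong to any known class"
    return best

# Hypothetical logits: one clearly dominant class, and one near-tie
confident = predict_or_reject([8.0, 0.5, 0.1])
unsure = predict_or_reject([1.0, 0.9, 0.8])

print(confident)  # 0
print(unsure)     # None
```

Keep in mind that a network can still be confidently wrong on unfamiliar inputs, so a threshold like this will likely filter out some unsupported images rather than all of them.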
Next time you are doing an image classification task, think of the case when the image doesn’t belong to any of the classes, and you’ll make your model ready for the real world, because it’s tough out there.