Improving Prediction Rates with Softmax | Towards AI

The Problem with Softmax Activation Function

How to avoid incorrect predictions from softmax

Gagandeep Singh
Oct 10 · 3 min read

Softmax is one of the most commonly used activation functions. It usually appears in the output layer of a neural network for classification tasks. What softmax actually does is turn the raw outputs (logits) into probabilities.

Now, before we jump into the main problem with softmax, let’s first discuss how it actually works and what logits are.

Logits: these are the raw scores produced by the last layer of the neural network, i.e. the values we get before any activation function is applied to them.

So, let's take this example:

1. Import the package.

2. Create a Python list, assuming its values are the outputs (logits) of the network.

3. Take the exponent of every value in the logits array.

4. Apply the softmax function: divide each exponent by the sum of all the exponents.
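The four steps above can be sketched as follows. The original code is not preserved here, so this is only an illustration: the choice of Python's standard `math` module and the three logit values are assumptions, not the author's.

```python
import math  # Step 1: import the package (assumed here: the standard math module)

# Step 2: a Python list standing in for the network's raw outputs.
# These values are made up for illustration.
logits = [2.0, 1.0, 0.1]

# Step 3: take the exponent of every value in the logits list.
exps = [math.exp(z) for z in logits]

# Step 4: apply softmax by dividing each exponent by the sum of all exponents.
total = sum(exps)
probs = [e / total for e in exps]

print(probs)       # one probability per logit; the largest logit gets the largest share
print(sum(probs))  # the probabilities add up to 1 (up to floating-point rounding)
```

Each value in `probs` lies between 0 and 1, and the ordering of the logits is preserved.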

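If we turn the above steps into a formula, it looks like this, where z_i is the logit for class i and K is the number of classes:

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$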

Now, let’s talk about the main issue.

Softmax converts all logits into probabilities, and the sum of all probabilities will always be one. This means (in the case of image classification) that even if the image doesn’t belong to any of the classes, the network will still give a result.

Let’s understand this better by the example of Image Classification.

Say you have trained your neural network to classify MNIST images. What if you accidentally give it an image that does not belong to MNIST? In such a case, the network should report that the image doesn’t belong to MNIST. But the output layer of our NN has 10 classes, which means that no matter what image we give it, it will try to classify it. For example, if you give it an image of an elephant, it will assign it to whichever MNIST class scores highest.
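To make this concrete, here is a minimal sketch. The ten logit values are invented for illustration (they are not the output of a real MNIST model), and `softmax` is a small helper written for this example.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits that a 10-class digit classifier might produce
# for an image of an elephant. The numbers are made up.
elephant_logits = [0.3, -1.2, 0.8, 2.9, -0.5, 0.1, 1.4, -0.7, 0.6, 0.2]

probs = softmax(elephant_logits)

# argmax: the index of the highest probability becomes the predicted digit.
predicted_digit = max(range(len(probs)), key=probs.__getitem__)

print(predicted_digit)         # a digit is always returned; there is no "none of these" option
print(probs[predicted_digit])  # the probability assigned to that digit
```

Whatever the logits are, the probabilities still sum to one and one of the ten digits is always picked.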

What issues can it cause?

Imagine you built a CNN classifier for, say, 100 classes. During testing it gave good results, so you deployed it. After some time you start getting complaints that it is misclassifying many images. On investigating, you find that the images it is being asked to classify belong to classes it does not support.

What are the solutions?

I’m not an expert, but I think adding an extra class with random images will help. If you don’t want to add a new class, then checking the score (confidence) after softmax is applied might help, i.e. rejecting predictions whose top probability is below a certain threshold. There may be other methods to evaluate the output as well.
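The thresholding idea could look something like the sketch below. The function name `predict_with_threshold`, the default threshold of 0.9, and the example logits are all assumptions made for illustration. A suitable threshold would need to be tuned on real data.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_threshold(logits, threshold=0.9):
    """Return the predicted class index, or None if the top
    softmax probability is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # not confident enough: treat as "unknown"
    return best

# Made-up logits: one clear winner versus four nearly equal scores.
confident = [0.1, 0.2, 9.0, 0.3]
uncertain = [1.0, 1.1, 0.9, 1.2]

print(predict_with_threshold(confident))  # 2
print(predict_with_threshold(uncertain))  # None
```

This is only a partial safeguard, since an unsupported image can still receive a high softmax score for some class.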

Conclusion

Next time you are doing an image classification task, think about the case where the image doesn’t belong to any of the classes. That will help make your model ready for the real world, because it’s tough out there.

Towards AI

Towards AI, is the world’s fastest-growing AI community for learning, programming, building and implementing AI.

Gagandeep Singh

Written by

Data Scientist at Zykrr. Geeky — https://www.linkedin.com/in/gaganmanku96/
