Welcome! If you are learning about the Softmax function for the first time, please read our beginner friendly article Understand Softmax in Minutes. If you are a machine learning professional or a data scientist, you will likely want to go further: Softmax beyond the basics, Softmax use cases, Softmax "in the wild". This article covers what experts care about, beyond the basics.
Post under construction — Updated Weekly
How should you use this article? Read our disclaimer here. This article should not be used for production, implementation, or commercial purposes; it is for your personal reading only.
Softmax Formula | Softmax Activation Function
Softmax is used as the activation function for multi-class classification tasks, usually in the last layer. We talked about its role in transforming numbers (aka logits) into probabilities that sum to one. Let's not forget it is also an activation function, which means it helps our model achieve non-linearity. Linear combinations of linear combinations will always be linear, but adding an activation function gives our model the ability to handle non-linear data.
The outputs of other activation functions, such as sigmoid, do not necessarily sum to one. Having outputs that sum to one is what makes the Softmax function great for probability analysis.
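As a quick sanity check, here is a minimal numpy sketch (the logit values are made up for illustration) showing that element-wise sigmoid outputs do not sum to one, while Softmax outputs do:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])              # made-up logits for three classes
sigmoid = 1.0 / (1.0 + np.exp(-logits))          # element-wise sigmoid
softmax = np.exp(logits) / np.exp(logits).sum()  # Softmax normalizes by the sum of exponentials

print(sigmoid.sum())   # roughly 2.14, not a probability distribution
print(softmax.sum())   # 1.0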
The important assumption is that the classes are mutually exclusive. That is to say, each sample of data can belong to exactly one class. For example, an image cannot be both a cat and a dog at the same time; its true label can only belong to one class. For images that contain multiple objects and therefore multiple true labels (multi-label classification), use a per-class sigmoid output instead of Softmax. For the two-class case, 0 or 1, true or false, positive or negative, plain logistic regression is enough.
Why is Softmax function called Softmax?
The name Softmax is a bit confusing and does not describe exactly what the function does. Other names for Softmax include … Here's the best explanation on the internet of why Softmax is called Softmax; check out this answer on Quora. In short, it is a smooth / soft approximation of the max function, which kind of looks like a ReLU as well. The smooth and soft part is the key: that's what makes this function differentiable.
This Quora post by Mr. Abhishek Patnia (Staff Machine Learning Engineer at Tinder) is the best! He also provided this amazing image showing that Softmax nicely approximates max(0, input), especially when input is not near zero.

Textbook Softmax
Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function. — Ian Goodfellow, Yoshua Bengio and Aaron Courville Deep Learning textbook
Softmax Formula

In the basic version of this article, Understand Softmax in Minutes, we illustrated how to implement this formula using a Python list comprehension. That basic implementation is a teaching aid, not production-ready code.
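As a refresher, a minimal list-comprehension sketch along those lines (not the exact code from the basic article, and numerically naive) might look like this:

import math

def softmax(logits):
    # exponentiate each logit, then normalize by the sum of all the exponentials
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))   # ~[0.659, 0.242, 0.099], sums to 1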
Softmax on StackOverflow

Softmax on Wikipedia: Before and After Softmax
Softmax is often used in neural networks, to map the non-normalized output of a network to a probability distribution over predicted output classes. — Wikipedia

Softmax, aka softargmax or the normalized exponential function (which literally describes what it does), is a function that takes a vector as input and normalizes it into a probability distribution of the same dimension as the input vector. Prior to applying softmax, some vector components could be negative or greater than one, and they might not sum to 1. After applying softmax, each component will be in the range between 0 and 1 and all the components will add up to 1, so they can be interpreted as probabilities. Larger components will correspond to larger probabilities. Often, softmax is used in neural networks to map the non-normalized output of a network to a probability distribution over predicted output classes. (Source: Wikipedia. Paraphrased)
As explained in our beginner friendly article, Understand Softmax in Minutes, the non-normalized input to a Softmax function is called the logits. This input is usually the output of a neural network. The Softmax function takes in logits and outputs probabilities that sum to one. Why? Because e is raised to the power of each logit, and each of these exponentials is then divided by the sum of all the exponentials. Hence the entire collection of outputs always sums to one.
Cross Entropy Loss Best Buddy of Softmax
Read more about cross entropy loss in our tutorial
Graphing the Softmax function
Coming soon
Where does Softmax fit in the deep learning workflow?
Often Softmax is the last layer of a multi-class classification architecture. A great example is the popular model VGG16, used in computer vision image classification tasks.
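As a sketch of what that looks like in practice (using torchvision's VGG16 with random weights, purely for illustration), the network outputs raw logits and Softmax is applied as the final step:

import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16()                      # VGG16's classifier head outputs 1000 raw logits
model.eval()

image_batch = torch.randn(1, 3, 224, 224)   # a dummy image batch for illustration
with torch.no_grad():
    logits = model(image_batch)             # shape (1, 1000), not yet probabilities
    probs = F.softmax(logits, dim=1)        # Softmax turns the logits into class probabilities

print(probs.sum(dim=1))                     # tensor([1.0000])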

Softmax Implementation in Production
There are quite a few flavors of Softmax. Choosing the best option is a matter of computational efficiency and accuracy. Here we dive into the API code and source code of Pytorch and Tensorflow to see how Facebook and Google implement the function for production and research use.
Pytorch Implementation of Softmax
Coming soon
Below is how the Softmax module is defined in Pytorch. It's helpful to read through the docstring and the function signatures. Here are some important highlights:
1. The input is rescaled "so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1".
2. The LaTeX formula of Softmax is \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}, which renders as the screenshot below.

class Softmax(Module):
    r"""Applies the Softmax function to an n-dimensional input Tensor
    rescaling them so that the elements of the n-dimensional output Tensor
    lie in the range [0,1] and sum to 1.

    Softmax is defined as:

    .. math::
        \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}

    Shape:
        - Input: :math:`(*)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(*)`, same shape as the input

    Returns:
        a Tensor of the same dimension and shape as the input with
        values in the range [0, 1]

    Arguments:
        dim (int): A dimension along which Softmax will be computed (so every slice
            along dim will sum to 1).

    .. note::
        This module doesn't work directly with NLLLoss,
        which expects the Log to be computed between the Softmax and itself.
        Use `LogSoftmax` instead (it's faster and has better numerical properties).

    Examples::

        >>> m = nn.Softmax(dim=1)
        >>> input = torch.randn(2, 3)
        >>> output = m(input)
    """
    __constants__ = ['dim']

    def __init__(self, dim=None):
        super(Softmax, self).__init__()
        self.dim = dim

    def __setstate__(self, state):
        self.__dict__.update(state)
        if not hasattr(self, 'dim'):
            self.dim = None

    @weak_script_method
    def forward(self, input):
        return F.softmax(input, self.dim, _stacklevel=5)

    def extra_repr(self):
        return 'dim={dim}'.format(dim=self.dim)
The Softmax transformation can be summarized with the pattern F.softmax(logits, dim=1).
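For example, a minimal usage sketch (the tensor values are arbitrary):

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])     # a batch of two samples, three classes each
probs = F.softmax(logits, dim=1)             # normalize along the class dimension
print(probs.sum(dim=1))                      # tensor([1., 1.]); each row sums to one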
Tip for using Softmax result in Pytorch:
Choosing the best Softmax result: in multi-class classification, the Softmax activation function is often used, and Pytorch has a dedicated function to extract the top results, i.e. the most likely classes, from the Softmax output. torch.topk(input, k, dim) returns the k largest probabilities and their indices. See the pytorch.topk documentation excerpt and the short example after it.
torch.topk(input, k, dim=None, largest=True, sorted=True, out=None) -> (Tensor, LongTensor)
Returns the k largest elements of the given input tensor along a given dimension.
If dim is not given, the last dimension of the input is chosen.
If largest is False then the k smallest elements are returned.
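A quick sketch of picking the single most likely class (k=1) from Softmax probabilities; the logit values are made up:

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.5, 2.0, 0.1]])         # one sample, three classes
probs = F.softmax(logits, dim=1)
top_prob, top_class = torch.topk(probs, k=1, dim=1)
print(top_prob)    # the largest probability, roughly 0.73
print(top_class)   # tensor([[1]]); index of the most likely class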
Implement Softmax from Scratch in Pytorch
import torch

def softmax(x):
    return torch.exp(x) / torch.sum(torch.exp(x), dim=1).view(-1, 1)
The numerator is the exponential of each element of x. The denominator also needs exponentials, so torch.exp(x) appears again. dim=1 tells torch.sum() to sum each row across all the columns. .view(-1, 1) reshapes the result so the division broadcasts correctly: if we don't reshape the denominator, the numerator (a matrix with one row per image) would be divided by torch.sum(torch.exp(x), dim=1), which is a flat vector, and the division would not line up row by row. What we really want is for each row of the numerator to be divided by its own single sum, so we reshape the vector into a column. If the numerator is batch_size by 10 in the MNIST example, we want the denominator to be batch_size by 1 so that the result is a batch_size by 10 tensor. Very tricky. Use the API for ease of use and numerical accuracy; use the customized formula when you want a lightweight, transparent implementation.
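A quick sanity check (random inputs, purely illustrative) that the from-scratch version agrees with the built-in F.softmax:

import torch
import torch.nn.functional as F

def softmax(x):
    # repeated from above so the snippet is self-contained
    return torch.exp(x) / torch.sum(torch.exp(x), dim=1).view(-1, 1)

x = torch.randn(4, 10)                                    # e.g. a batch of 4 MNIST logit vectors
print(torch.allclose(softmax(x), F.softmax(x, dim=1)))    # True
print(softmax(x).sum(dim=1))                              # each row sums to 1

Note that this naive version can overflow for very large logits; production implementations typically subtract the maximum logit before exponentiating for numerical stability.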
Tensorflow Implementation of Softmax
Post under construction
According to the official documentation of TensorFlow Core 1.13, tf.nn.softmax performs the equivalent of softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)
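A small sketch comparing the API call with the manual formula; the logit values are made up, and keepdims=True is added so the division broadcasts per row:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 0.5, 3.0]])
probs_api = tf.nn.softmax(logits, axis=-1)
probs_manual = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis=-1, keepdims=True)
# both rows sum to 1 and the two results match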
Similarly, developers can use the keras API equivalent tf.keras.activations.softmax
tf.sparse.softmax applies softmax to a batched N-D SparseTensor.
There’s also tf.nn.softmax_cross_entropy_with_logits_v2, which computes softmax cross entropy between logits and labels (deprecated arguments). Warning: this op expects unscaled logits, since it performs a softmax on the logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results. The corresponding cross entropy API in older versions of TensorFlow is softmax_cross_entropy_with_logits.
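A minimal sketch of the intended usage under TensorFlow 1.x (in TensorFlow 2.x the op is named tf.nn.softmax_cross_entropy_with_logits); the key point is that raw logits, not Softmax output, are passed in:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw, unscaled logits straight from the network
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot true label
# correct: pass the logits; the op applies softmax internally
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
# incorrect: passing tf.nn.softmax(logits) here would apply softmax twice and give wrong results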
Implementing Softmax from Scratch
On Kaggle, Rachael Tatman did a live Softmax implementation from scratch.
Select the Maximum Softmax Output
After the Softmax layer, in Pytorch, we still need to select the maximum value index as the top label. One from-scratch way to do it is to use torch.argmax.
Here's an example demonstrating it:
x = torch.FloatTensor([[0.2, 0.1, 0.7],[0.6, 0.2, 0.2],[0.1, 0.8, 0.1]])
y = torch.argmax(x, dim=1)
y
# tensor([2, 0, 1])
y = torch.argmax(x, dim=0)
y
# tensor([1, 2, 0])
Notice that argmax outputs the position of the max value, not the max value itself, whether along each row (dim=1) or each column (dim=0).
Softmax vs Sigmoid
Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes. In logistic regression we assumed that the labels were binary: y(i)∈{0,1}. We used such a classifier to distinguish between two kinds of hand-written digits. Softmax regression allows us to handle y(i)∈{1,…,K} where K is the number of classes. — Stanford Deep Learning Tutorial
Softmax is also known as Multi-Class Logistic Regression (citation needed). Using sigmoid activation for a classification task, we want to turn all logits into 2 classes (0, 1). The output should be rounded to either 0 or 1. The number of classes k = 2.
Sigmoid is a special case of Softmax where the possible outcomes are just 0 or 1.
(Under construction) To understand why the two functions are equivalent, it helps to remember that in binary classification the ground truth can only take two forms, 0 or 1, and the predicted label is a number between 0 and 1. Softmax calculates P(class = c | logits), explained verbosely: the probability that the class is c given an array of logits as input. Remember the logits come from the near-final layer of a neural network, so in pseudo code they are matrix_multiplication_of(inputs, weights), where the weights are learned. With only two classes we can fix the logit of class 0 at 0 (a reference class) and let the network learn a single logit z for class 1 (for class 0 we just calculate one minus P(class=1|logits)). For P(class = 1 | logits), the Softmax function calculates the e exponent of the current class divided by the sum of the e exponents of all classes: np.exp(z) divided by np.exp(0) + np.exp(z). Since e to the power of zero is one, the formula simplifies to np.exp(z) / (1 + np.exp(z)), which is exactly the sigmoid of z. It is more of a short proof than an obvious substitution. It took our staff writer a few tries to understand why Softmax and Sigmoid are equivalent for two-class classification when Udacity's Luis Serrano mentioned it. If you like reading formulas, this Quora post explains it best.
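A tiny numpy sketch of that argument (the logit value is arbitrary): with class 0's logit fixed at 0, the two-class Softmax probability of class 1 equals the sigmoid of class 1's logit.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(v):
    e = np.exp(v)
    return e / e.sum()

z = 1.7                                                  # an arbitrary logit for class 1
p_class1_softmax = softmax(np.array([0.0, z]))[1]        # two-class Softmax, class 0's logit is 0
p_class1_sigmoid = sigmoid(z)
print(np.isclose(p_class1_softmax, p_class1_sigmoid))    # True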
Softmax in Technical Interviews
Further Reading
Difference between LogSoftmax and Softmax in Pytorch
class Softmax(Module): … This module doesn’t work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use `LogSoftmax` instead (it’s faster and has better numerical properties) — Pytorch Softmax documentation
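A small sketch of the recommended pairing (random values for illustration): feed LogSoftmax output to NLLLoss rather than taking the log of Softmax yourself.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                   # a batch of 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])         # true class indices

log_probs = F.log_softmax(logits, dim=1)     # LogSoftmax: log of the Softmax, computed stably
loss = F.nll_loss(log_probs, targets)        # NLLLoss expects log-probabilities

# mathematically the same, but less numerically stable:
loss_naive = F.nll_loss(torch.log(F.softmax(logits, dim=1)), targets)
print(torch.isclose(loss, loss_naive))       # tensor(True)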
Other Flavors of Softmax
Full softmax versus candidate sampling softmax.