Softmax Function Explained Clearly and in Depth | Deep Learning Fundamentals

Suetsugu
6 min read · Jul 24, 2022


The softmax function is widely used in deep learning models. However, papers and summary sites often just say “softmax,” as in “softmax the result of ~,” without explaining what softmax actually is.
(Softmax is a crucial function used well beyond natural language processing.)

“What is softmax? What does it do?” To answer these questions, we will explain the softmax function in an easy-to-understand way!

Chapter 1: Explaining SOFTMAX Functions in an Easy-to-Understand Way with Illustrations

In this chapter, “What is the softmax function? How does it work, and why is it used?” and “Where is it used in actual natural language models?” will be explained in an easy-to-understand way using illustrations!

1.1 What is the SOFTMAX function?

How it works and why it is used.
In one sentence, the softmax function is “a function that converts input values to values between 0 and 1 that sum to 1.” This alone may not be easy to understand, so let’s see how the numbers are converted.

As shown in the illustration, let us consider the case where [5, 4, -1] is the input. The softmax function converts the input values into output values that are “between 0 and 1 and sum to 1.” In this case, we see that the input [5, 4, -1] is converted to [0.730, 0.268, 0.002]: each value now lies between 0 and 1, and the values sum to 1, like probabilities.
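As a sanity check, the conversion above can be reproduced in a few lines of Python (a minimal sketch, not code from the article):

```python
import math

def softmax(xs):
    # Apply the exponential to each input, then divide by the sum
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([5, 4, -1])
print([round(p, 3) for p in probs])  # → [0.73, 0.268, 0.002]
```

The rounded values match the illustration’s [0.730, 0.268, 0.002], and the unrounded values sum to 1.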

1.2 Why the softmax function is used

Next, we will explain why softmax is used. Why do we need to “convert to values between 0 and 1 such that the sum is 1”?

Simply put, the reason is that it is difficult for humans to understand the AI output values as they are, and it is also difficult to process them afterward.

For example, consider the task of translating the Spanish word “gato” into English. Gato could mean “cat,” “servant,” “person from Madrid,” and so on.
The AI produces a score for each of these possible answers. However, the problem is that the raw output values are difficult to understand. Let’s look at an illustration.

As shown in the illustration, AI produces output values such as [cat: 5, servant: 4, a person from Madrid: -1].

If softmax is not used, it is difficult for humans to understand “how much is the probability?”. If someone says, “The probability of the English answer is 5 for the cat and -1 for a person from Madrid!” you would be confused, wouldn’t you?

On the other hand, if you use softmax, the scores are converted to “values between 0 and 1 such that the sum is 1,” which is very intuitive and easy to understand. As shown in the illustration, it is very easy to understand if you are told, “The probability of a cat is 73%, and the probability of a person from Madrid is 0.2%.”
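The translation example can be sketched in Python by attaching softmax probabilities to each candidate word (the scores and label set simply follow the article’s illustration):

```python
import math

# Raw scores ("logits") for translating "gato", from the article's example
scores = {"cat": 5.0, "servant": 4.0, "person from Madrid": -1.0}

# Softmax: exponentiate each score, then divide by the total
exps = {word: math.exp(s) for word, s in scores.items()}
total = sum(exps.values())
probs = {word: e / total for word, e in exps.items()}

for word, p in probs.items():
    print(f"{word}: {p:.1%}")
# → cat: 73.0%, servant: 26.8%, person from Madrid: 0.2%
```

Printed as percentages, the numbers become immediately readable, which is exactly the “interpreter” role described next.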

In other words, softmax is like an “interpreter between AI and people.” Softmax functions are mostly used in the “final layer,” where the output values produced by the AI are finally converted.

If you hear someone say “softmax the result of ~” in a paper or on a summary website, understand it as “convert the result of ~ into a probability value that is easy to understand.”

1.3 Scenarios in which Softmax is used in actual models of natural languages

Softmax is used in many situations as the “final layer of a classification problem.” In fact, it is used as the final layer in RNNLM and Attention. Specifically, RNNLM uses it to generate the final probability values when “predicting the next word,” and Attention-based models use it in “translation tasks.”

RNNLM and Attention are explained in depth in the following pages. Both models are essential in the history of natural language, so if you are interested, please read on.

Attention in Natural Language Processing (NLP) is explained clearly and in-depth.

RNNLM in Natural Language Processing (NLP) is explained in depth and in an easy-to-understand manner

Summary of Chapter 1

  • The softmax function converts input values to 0–1 values that sum to 1
  • When someone says “softmax the result of ~,” you should understand it as “convert the result of ~ to an easy-to-understand probability.”

Chapter 2: Explaining SOFTMAX in Depth with Formulas

At this point, you should have an overview of the softmax function. From here on, we will explain the softmax function using mathematical formulas. This chapter is a bit more difficult, but we will explain it as simply as possible, so no special mathematical knowledge is required.

If you can read and understand this chapter thoroughly, you should be able to understand and explain the formulas of the softmax function.

Overall Process Flow

First, we will explain the overall process flow, using a more detailed version of the diagram from the beginning of this article.

As shown in the figure, the softmax function takes an input value and converts it to an output value in two steps.

1. Apply the exponential function (e^x), which converts every value to a positive number. (In the example, you can see that -1 is converted to 0.4, a positive value.)
2. Divide by the total. Divide each number produced in step 1 by the sum of all of them. This makes the values sum to 1.

Following these two simple steps, we have converted the input values to “values between 0 and 1 whose sum is 1.” This is the entire process of the softmax function.
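The two steps above can be traced explicitly in Python, showing the intermediate values from the figure (a minimal sketch using the article’s example input):

```python
import math

xs = [5, 4, -1]

# Step 1: apply the exponential so every value becomes positive
exps = [math.exp(x) for x in xs]
print([round(e, 1) for e in exps])  # → [148.4, 54.6, 0.4]

# Step 2: divide each value by the total so the results sum to 1
total = sum(exps)
probs = [e / total for e in exps]
print([round(p, 3) for p in probs])  # → [0.73, 0.268, 0.002]
```

Note that -1 becomes 0.4 after step 1, just as in the figure, and step 2 yields the familiar [0.730, 0.268, 0.002].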

Detailed Explanation with Formulas

Now that you understand the flow, let’s also look at the formula. It may have seemed inaccessible at first, but once you understand the flow of the process, it should look like a simple formula.

You will see that it is a very simple formula: take the exponential of each input x_i to get e^(x_i), and divide it by the sum of the exponentials of all the inputs: softmax(x_i) = e^(x_i) / Σ_j e^(x_j).
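One practical note (an addition not covered in the article): implemented naively, e^(x_i) overflows for large inputs. A common trick is to subtract the maximum input before exponentiating; since this multiplies the numerator and denominator by the same constant, the result is mathematically identical:

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating to avoid overflow
    # for large inputs; the probabilities are unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Works even where math.exp(1000) alone would overflow
print([round(p, 3) for p in softmax([1000, 999, 994])])  # → [0.73, 0.268, 0.002]
```

The differences between the inputs (0, -1, -6) are the same as in the article’s [5, 4, -1] example, so the probabilities come out identical.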

Summary of Chapter 2

  • Softmax is a simple system of (1) taking an exponent and (2) dividing by the total.
  • The formula is also straightforward if you understand the flow of the process.

Summary

Chapter 1

  • The softmax function converts the input value to a value between 0 and 1, where the sum is 1.
  • When someone says, “softmax the result of ~,” it should be understood as “convert the result of ~ into an easy-to-understand probability.”

Chapter 2

  • Softmax is a simple mechanism that (1) takes an exponent and (2) divides it by the sum.
  • The formula is also very simple if you understand the flow of the process.



Suetsugu

TUM Data Science. I write about NLP and Deep Learning with illustrations. Blog URL: http://nlpillustration.tech/