
This article is a continuation of a series that focuses on understanding the basic building blocks of Deep Learning. In case you need to catch up, some of the previous articles are:

Machine Learning is not just classifying whether an image contains a dog or not. Nor is it just for predicting house prices in Boston. When one sees how many applications this vast field really has, one is usually stunned!

Back to the point, a model is not limited to predicting a single yes-or-no probability for a given input. For example, what if you want to check whether an image contains a dog, a cat, or something else entirely? You see? This is where multi-class classification comes in.

# Multi-class Classification

Multi-class classification refers to those tasks where examples are assigned exactly one of more than two classes. Binary Classification: classification tasks with two classes. Multi-class Classification: classification tasks with more than two classes. [1]

## Softmax Regression

We have seen many examples of how to classify between two classes, i.e. Binary Classification. Now, we will discuss what to do when we want to classify among more than two classes.

Suppose we have four classes we want to classify among. Then, our model's last layer must have four nodes, each responsible for giving out the probability of the input belonging to that class.


So, the output layer will be of dimension (4, 1) because it gives out a probability for each of the four classes. Also, the probabilities must sum to 1.

## Softmax Layer

To get the probabilities among different classes, we change the activation function.

The Softmax function is different from the activation functions we've been studying: softmax outputs, for each class, roughly the probability of that class out of all the classes.

## Implementation

```
t = e^Z[l]               # Z[l] is the pre-activation (logits) of the last layer
a[l] = t / Sum(t)        # normalize so the outputs sum to 1
a[l][i] = t[i] / Sum(t)  # element-wise: the probability of class i
```

The main difference between Softmax and the other activation functions is that the others take in a single number and output a single number, whereas Softmax takes in a vector of numbers and returns a vector as well.
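The formula above can be sketched in NumPy. This is a minimal sketch, not a full layer implementation; the logits `z` are made-up example values, and subtracting the max before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores (logits) into probabilities."""
    # Shift by the max for numerical stability; the output is unchanged.
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

z = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical logits for 4 classes
a = softmax(z)
print(a)        # four positive probabilities
print(a.sum())  # they sum to 1
```

Note that, unlike sigmoid or ReLU, the whole vector is needed at once: each output depends on every input through the normalizing sum.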

Softmax regression generalizes logistic regression to N classes.

And if N == 2:

Softmax regression essentially reduces to logistic regression.
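The reduction can be checked numerically: with two classes, the softmax probability of the first class equals the sigmoid of the difference of the two scores. The scores below are made-up values chosen just for the check.

```python
import numpy as np

def softmax(z):
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([2.0, 0.5])           # hypothetical scores for the two classes
p_class0 = softmax(z)[0]           # softmax probability of class 0
p_logistic = sigmoid(z[0] - z[1])  # logistic regression on the score difference
print(np.isclose(p_class0, p_logistic))  # True
```

Algebraically, e^z1 / (e^z1 + e^z2) = 1 / (1 + e^(z2 - z1)), which is exactly the sigmoid applied to z1 - z2.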

## Loss Function

Suppose the true label is **y** = [0, 1, 0, 0] and the prediction **ŷ** = [0.1, 0.2, 0.4, 0.3].

So, the loss function is:

`L(y, ŷ) = -Sum(y * log(ŷ))`
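Worked through for the example values above: because y is one-hot, every term of the sum vanishes except the one for the true class, so the loss collapses to the negative log of the probability assigned to that class.

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0, 0.0])      # one-hot true label
y_hat = np.array([0.1, 0.2, 0.4, 0.3])  # predicted probabilities
loss = -np.sum(y * np.log(y_hat))       # only the true-class term survives
print(loss)  # -log(0.2) ≈ 1.609
```

The closer ŷ gets to 1 on the true class, the closer the loss gets to 0.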

## Cost Function

The cost function becomes the average of the losses over all m training examples:

`J(parameters) = (1 / m) * Sum(L(ŷ, y))`
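A minimal sketch of the cost over a tiny batch, assuming made-up labels and predictions for two examples: compute the per-example cross-entropy losses, then average them.

```python
import numpy as np

# Two hypothetical training examples: one-hot labels and predicted probabilities.
Y = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])
Y_hat = np.array([[0.1, 0.2, 0.4, 0.3],
                  [0.7, 0.1, 0.1, 0.1]])

m = Y.shape[0]
losses = -np.sum(Y * np.log(Y_hat), axis=1)  # per-example cross-entropy
J = np.sum(losses) / m                       # average over the batch
print(J)
```

Vectorizing over the batch like this (rather than looping) is the usual NumPy idiom.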

# Conclusion

In this article, we have just stepped into what it takes for our models to predict more than a single yes or no. We will continue this discussion with more concepts next time! Follow for updates!

# Further Readings

# Contacts

If you want to keep updated with my latest articles and projects, follow me on Medium. These are some of my contacts details:

Happy Learning. :)