Loss Functions — Multiclass SVM Loss and Cross Entropy Loss
Why are loss functions so important in machine learning applications? In my last article we discussed parameterized learning: models of this kind take data points and class labels as input and learn a function that maps the input to predicted class labels by tuning a set of parameters, namely weights and biases.
First, let's see what a loss curve looks like and build some intuition before getting into the SVM and cross-entropy loss functions.
The graph above shows the loss curves of two different models trained on the CIFAR-10 dataset. What can we observe from these two lines?
- Model 1 (red line): as the model trains longer, the loss keeps dropping.
- Model 2 (blue line): the loss starts decreasing initially, but after 10 epochs it stops decreasing.
The better classifier will have the lower loss (though not always; overfitting is another concept, which we can discuss later).
I have been talking about loss, and while some readers may already know the term, here is a definition for those who haven't come across it: loss is a score calculated from your true class labels and your predicted class labels. Say you have trained a model to classify dog and cat images, and for an input cat image it predicts cat with a probability of 0.97 and dog with a probability of 0.03; the gap between the true label and the predicted probability is the loss.
The model is then trained based on this loss: the parameters (weights and biases) are updated and optimized so that the model predicts the data better.
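To make this concrete, here is a minimal sketch in Python. It scores the cat/dog prediction above with a negative log-likelihood, which is one common choice of loss (and the one we will meet again as cross-entropy below):

```python
import numpy as np

# The cat/dog example from above: the true label is "cat" and the model
# predicts cat with probability 0.97 and dog with probability 0.03.
predicted_probs = {"cat": 0.97, "dog": 0.03}
true_label = "cat"

# One common loss is the negative log-likelihood of the true class:
# the more confident and correct the prediction, the smaller the loss.
loss = -np.log(predicted_probs[true_label])
print(loss)  # ~0.03, a small loss since the model is confident and correct
```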
Multiclass SVM Loss
Multiclass SVM loss (as the name suggests) is inspired by (linear) Support Vector Machines (SVMs). It uses a scoring function f to map our data points to a numerical score for each class label.
Let's try to understand multiclass SVM loss with an example.
In this example, we have three images: a dog, a cat, and a panda. Each image is scored against each of the three classes, so we have three values per image.
Formula: loss = sum over every incorrect class j of max(0, s_j - s_true + 1), where s_j is the predicted score for class j and s_true is the predicted score for the true class.
For the first image, the true label is dog, and the model predicts a score of 4.26 for dog, 1.33 for cat, and -1.01 for panda.
image_1 = max(0, 1.33 - 4.26 + 1) + max(0, -1.01 - 4.26 + 1) = 0 + 0 = 0
Similarly, we calculate the loss for the remaining images and then average the per-image values. That average is the multiclass SVM loss (also called the hinge loss).
image_2 = max(0, 3.76 - (-1.20) + 1) + max(0, -3.81 - (-1.20) + 1) = 5.96
image_3 = max(0, -2.37 - (-2.27) + 1) + max(0, 1.03 - (-2.27) + 1) = 0.90 + 4.30 = 5.20
loss = (image_1 + image_2 + image_3) / 3.0 = (0 + 5.96 + 5.20) / 3.0 = 3.72
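Here is a short NumPy sketch of the same calculation; the class order [dog, cat, panda] is an assumption for illustration, and the scores are the example values above:

```python
import numpy as np

# Example scores from above, assuming the class order [dog, cat, panda]
scores = np.array([
    [ 4.26,  1.33, -1.01],  # image 1, true class: dog   (index 0)
    [ 3.76, -1.20, -3.81],  # image 2, true class: cat   (index 1)
    [-2.37,  1.03, -2.27],  # image 3, true class: panda (index 2)
])
true_idx = np.array([0, 1, 2])

# Hinge loss per image: sum over incorrect classes of max(0, s_j - s_true + 1)
correct = scores[np.arange(len(scores)), true_idx][:, np.newaxis]
margins = np.maximum(0, scores - correct + 1)
margins[np.arange(len(scores)), true_idx] = 0  # the true class contributes nothing
per_image = margins.sum(axis=1)

print(per_image)         # [0.   5.96 5.2 ]
print(per_image.mean())  # 3.72, the multiclass SVM loss over the three images
```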
Cross-Entropy Loss
Our goal here is to classify our input image (a panda) as dog, cat, or panda. This involves four steps.
Step 1: Get a score for each of the three classes from the scoring function, just as we did for the multiclass SVM loss.
Step 2: Exponentiate the scores to obtain unnormalized probabilities.
Step 3: Sum the unnormalized probabilities and divide each value by that sum to obtain normalized probabilities (the softmax layer).
Step 4: Step 3 already tells us which class the input image most likely belongs to, but to assess how good that prediction is, we take the negative natural logarithm of the normalized probability of the true class.
We can then repeat this process for all images in our training set, take the average, and obtain the overall cross-entropy loss for the training set. This lets us quantify how well a given set of parameters is performing on our training data.
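Here is a sketch of the four steps in NumPy. It reuses image_3's scores from the SVM example above for the panda input (again assuming the class order [dog, cat, panda]); note that with these particular scores the model favors "cat", so the loss comes out large:

```python
import numpy as np

# Step 1: scores from the scoring function (reusing image_3's example scores,
# assumed class order [dog, cat, panda]; the true class is panda, index 2)
scores = np.array([-2.37, 1.03, -2.27])
true_idx = 2

# Step 2: exponentiate to obtain unnormalized probabilities
exp_scores = np.exp(scores)

# Step 3: divide by the sum to obtain normalized probabilities (softmax)
probs = exp_scores / exp_scores.sum()

# Step 4: negative natural logarithm of the true class's probability
loss = -np.log(probs[true_idx])

print(probs)  # ~[0.031 0.934 0.034]
print(loss)   # ~3.37, a large loss since the model wrongly favors "cat"
```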
This should give you clarity on how the multiclass SVM and cross-entropy losses are calculated, through a practical example. I hope this helps.
Keep Reading!!!
Reference:
Deep Learning for Computer Vision with Python by Adrian Rosebrock (Starter Bundle)