Playing with Loss Functions in Deep Learning

Michael Avendi · Published in How to AI · 5 min read · May 1, 2018

In this post, we are going to develop custom loss functions for deep learning applications such as semantic segmentation. We use Python 2.7 and Keras 2.x for the implementation.

Standard Loss Function

Loss Functions are at the heart of any learning-based algorithm. We convert the learning problem into an optimization problem, define a loss function and then optimize the algorithm to minimize the loss function.

Source: Deep Learning with Python, François Chollet

Consider semantic segmentation of C objects, i.e., an image contains C objects that need to be segmented. We are given a set of images and corresponding annotations for training and developing the algorithm. For simplicity, let us assume that there are C=3 objects: an ellipse, a rectangle, and a circle. We can use simple code such as the snippet below to generate masks for the three objects.
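The post's original snippet isn't reproduced here; below is a minimal NumPy sketch of the idea. The shape positions, sizes, and the helper name `generate_masks` are illustrative assumptions, not the article's exact code.

```python
import numpy as np

def generate_masks(height=128, width=128):
    # One-hot masks of shape (height, width, 4):
    # channel 0 = background, 1 = ellipse, 2 = rectangle, 3 = circle.
    # Positions/sizes below are arbitrary choices for illustration.
    yy, xx = np.mgrid[0:height, 0:width]
    masks = np.zeros((height, width, 4), dtype=np.float32)
    masks[..., 1] = ((xx - 32) / 20.0) ** 2 + ((yy - 32) / 12.0) ** 2 <= 1.0  # ellipse
    masks[..., 2] = (xx >= 70) & (xx <= 110) & (yy >= 20) & (yy <= 45)        # rectangle
    masks[..., 3] = (xx - 64) ** 2 + (yy - 90) ** 2 <= 15 ** 2                # circle
    masks[..., 0] = 1.0 - masks[..., 1:].max(axis=-1)  # background = no object
    return masks

masks = generate_masks()
```

Each pixel belongs to exactly one channel, so the masks can be used directly as one-hot ground truth.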

Typical ground truth masks for the objects would look like below:

Typical ground truth for objects.

Also assume that we develop a deep learning model, which predicts the following outputs:

Typical predictions

First, we are going to use the standard loss function for semantic segmentation, i.e., the categorical cross-entropy as written below:

Standard categorical cross entropy
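The equation image is not reproduced here; written out in LaTeX, with the average taken over all N pixels across the batch, it is:

```latex
\mathrm{loss} = -\frac{1}{N}\sum_{\text{pixels}}\;\sum_{i=0}^{C} y_i \log(p_i)
```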

Here C is the number of objects, y_i is the ground truth, and p_i is the predicted probability per pixel. y_i is one if the pixel belongs to class i and zero otherwise; i=0 corresponds to the background. The loss is calculated for all pixels in each image and for all images in the batch, and the average of all these values is reported as a single scalar loss. In the case of categorical cross-entropy, the ideal loss is zero!

To be able to easily debug and compare results, we develop two loss functions, one using Numpy as:
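The article's NumPy implementation isn't reproduced here; a minimal sketch, assuming `y_true` and `y_pred` are arrays of shape (batch, num_pixels, C+1):

```python
import numpy as np

def cce_loss_numpy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot ground truth, shape (batch, num_pixels, C+1)
    # y_pred: softmax probabilities, same shape
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    # cross-entropy per pixel, then mean over pixels and batch
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))
```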

And its equivalent using tensor functions of the Keras backends as:
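A matching sketch with backend tensor ops (imported here via `tensorflow.keras` for convenience; the standalone `from keras import backend as K` of Keras 2.x works the same way):

```python
from tensorflow.keras import backend as K

def cce_loss_tensor(y_true, y_pred):
    # identical math to the NumPy version, but on tensors
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    return -K.mean(K.sum(y_true * K.log(y_pred), axis=-1))
```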

As you can see, the two loss functions differ only in using the backend instead of NumPy. If we try 8 random annotations and predictions, we obtain loss_numpy=0.256108245968 and loss_tensor=0.256108 from the NumPy and the tensor functions, respectively. Practically the same values!

Custom Loss Function

Now we are going to develop our own custom loss function. This customization may be needed due to issues in the quality of data and annotations. Let us see a concrete example.

In our case study, let us assume that for some reason there is missing ground truth. For instance, in the below figure there is no ground truth for object 3 (circle) while the deep learning model provides a prediction.

Ground truth is missing for the circle.
Prediction output

In another example, the ground truth is missing for the first object (ellipse):

Ground truth is missing for the ellipse.
Prediction outputs.

In these scenarios, if we still use the standard loss function, we may penalize the AI model incorrectly. The reason is that the pixels that belong to the missing ground truth are treated as background and multiplied by -log(p_i), where p_i is the small predicted probability, so -log(p_i) becomes a large number. Note that this rests on our assumption that a ground truth should exist but, for whatever reason, the annotators missed it.

Again, if we try 8 annotations and predictions, this time with two random missing annotations, the standard loss value is 0.493853. Clearly, this is a higher loss than when all the ground truths were available.

One easy solution would be to remove the images with missing ground truth. This means that if even one object out of C objects has missing ground truth, we have to remove that image from the training data. However, that means less data for training!

Instead, we may be able to develop a smart loss function that avoids such penalization in case of missing ground truth. In this case, we write the loss function as:

Customized categorical cross entropy
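The equation image is not reproduced here; in LaTeX, the weighted version of the standard loss above is:

```latex
\mathrm{loss} = -\frac{1}{N}\sum_{\text{pixels}}\;\sum_{i=0}^{C} w_i\, y_i \log(p_i)
```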

where w_i is the smart weight. With w_i=1 this reduces to the standard loss function. We know that if the ground truth for an object is missing, its pixels are assigned to the background. So if we set w_0=0 for the pixels that are predicted as an object with no ground truth, we remove the background's contribution to the loss value for those pixels. In other words, the custom loss function can be written as below:

Custom loss function.
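The equation image is not reproduced here; the weight rule it expresses can be written in LaTeX as:

```latex
w_0 =
\begin{cases}
0 & \text{if the ground truth for object } c \text{ is missing and the pixel is predicted as } c,\\
1 & \text{otherwise,}
\end{cases}
\qquad w_i = 1 \ \text{ for } i \geq 1
```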

To this end, we consider two conditions. First, we find images with missing ground truth. This is possible using:

K.any(y_true,axis=1,keepdims=True)

Next, we find the predicted classes per pixel for all images using:

pred=K.argmax(y_pred,axis=-1)

Then, we check if the predicted output is, in fact, equal to the missing object. This is also possible using:

K.equal(pred,cls)

Note, in the actual implementation, we use:

K.not_equal(pred,cls)

since the weight should be zeroed only when both conditions are False, i.e., when their logical-OR is False.

If these two conditions are satisfied, we set the background weight to zero. This guarantees that if an object has missing ground truth (and its pixels are thus mistakenly labeled as background), the contribution of the background to the loss function is zero for those pixels. The final custom loss function is here:
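The full implementation isn't reproduced here; below is a sketch that combines the `K.any`, `K.argmax`, and `K.not_equal` checks above, again assuming the (batch, num_pixels, C+1) layout. The constant `NUM_CLASSES` and the exact tensor manipulations are assumptions, not the article's verbatim code.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

NUM_CLASSES = 3  # ellipse, rectangle, circle; channel 0 is the background

def custom_cce_loss(y_true, y_pred):
    # y_true: one-hot ground truth, (batch, num_pixels, NUM_CLASSES + 1)
    # y_pred: softmax probabilities, same shape
    y_pred_clipped = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    # per image and class: is there any ground truth for this class?
    has_gt = K.any(y_true, axis=1, keepdims=True)   # (batch, 1, C+1), bool
    pred = K.argmax(y_pred, axis=-1)                # (batch, num_pixels)
    # background weight w_0 starts at 1 for every pixel
    w0 = K.ones_like(K.cast(pred, 'float32'))
    for cls in range(1, NUM_CLASSES + 1):
        # keep the weight unless ground truth for `cls` is missing
        # AND the pixel is predicted as `cls` (both conditions False -> 0)
        keep = tf.logical_or(has_gt[:, :, cls], K.not_equal(pred, cls))
        w0 = w0 * K.cast(keep, 'float32')
    ones = K.ones_like(w0)
    # weights: w_0 on the background channel, 1 on all object channels
    w = K.stack([w0] + [ones] * NUM_CLASSES, axis=-1)
    return -K.mean(K.sum(w * y_true * K.log(y_pred_clipped), axis=-1))
```

For a pixel the model predicts as an object whose annotation is absent, the background term -log(p_0) is multiplied by zero instead of inflating the loss.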

If we calculate the loss for 8 annotations with two random missing objects, we get custom_loss=0.191179. This shows that we do not penalize the AI model for providing a correct output just because the ground truth does not exist. In practice, this technique leads to better overall performance for the objects with missing ground truth.

Summary

We can always use the standard loss functions, and they work fine for most cases. However, if you encounter special cases and would like better performance, you can customize the loss function based on your needs.
