Loss Functions for Medical Image Segmentation: A Taxonomy

JunMa
Aug 22, 2019 · 4 min read


Figure: Loss taxonomy

Loss functions are an essential ingredient in deep learning-based medical image segmentation methods. In the past four years, more than 20 loss functions have been proposed for various segmentation tasks. Most of them can be used in any segmentation task in a plug-and-play way. In this blog,

  • We present a systematic taxonomy to sort existing loss functions into four meaningful categories. This helps to reveal links and fundamental similarities between them.
  • Moreover, we implement all the loss functions in PyTorch. The code and references are publicly available here.

For simplicity, we only present the main idea of each loss and omit the mathematical formulations in this post. If you are interested in the exact formulations, please visit the Google Slides.

Distribution-based loss

  • Cross entropy (CE) is derived from the Kullback-Leibler (KL) divergence, which measures the dissimilarity between two distributions. For common machine learning tasks, the data distribution p is given by the training set, so its entropy H(p) is a constant. Thus, minimizing CE is equivalent to minimizing the KL divergence.

  • Weighted cross entropy is an extension of CE, which assigns a different weight to each class. In general, under-represented classes are assigned larger weights.
  • TopK loss aims to force networks to focus on hard samples during training.
  • Focal loss adapts the standard CE to deal with extreme foreground-background class imbalance, reducing the loss assigned to well-classified examples (a minimal PyTorch sketch is given after this list).
  • Distance penalized CE loss weights the cross entropy by a distance map derived from the ground truth mask. It aims to guide the network’s focus towards hard-to-segment boundary regions.
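
As a concrete illustration, here is a minimal PyTorch sketch of the focal loss idea in its gamma-only form (without the optional class-weighting alpha term). The function name and tensor shapes are illustrative assumptions, not the reference implementation from the SegLoss repository.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Minimal multi-class focal loss sketch.

    logits: (N, C, H, W) raw network outputs
    target: (N, H, W) integer class labels
    gamma:  focusing parameter; gamma = 0 recovers plain cross entropy
    """
    # per-pixel cross entropy, kept unreduced so it can be re-weighted
    ce = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    pt = torch.exp(-ce)                                     # probability of the true class
    # down-weight well-classified pixels (pt close to 1)
    return ((1.0 - pt) ** gamma * ce).mean()
```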

Region-based loss

Region-based loss functions aim to minimize the mismatch or maximize the overlap between the ground truth and the predicted segmentation.

  • Sensitivity-Specificity (SS) loss is the weighted sum of the mean squared differences of sensitivity and specificity. To address the imbalance problem, SS weights the specificity term more heavily.
  • Dice loss directly optimizes the Dice coefficient, which is the most commonly used segmentation evaluation metric (a minimal sketch is given after this list).
  • IoU loss (also called Jaccard loss), similar to Dice loss, is also used to directly optimize the segmentation metric.
  • Tversky loss assigns different weights to false negatives (FN) and false positives (FP), unlike Dice loss, which weights FN and FP equally.
  • Generalized Dice loss is the multi-class extension of Dice loss where the weight of each class is inversely proportional to the square of label frequencies.
  • Focal Tversky loss applies the concept of focal loss to focus on hard cases with low probabilities.
  • Penalty loss penalizes false negatives and false positives in generalized Dice loss.
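
For illustration, here is a minimal PyTorch sketch of the soft Dice loss, assuming softmax probabilities and one-hot labels as inputs; the function name and shapes are illustrative, not the repository implementation.

```python
import torch

def soft_dice_loss(probs, target_onehot, eps=1e-6):
    """Minimal soft Dice loss sketch.

    probs:         (N, C, H, W) softmax probabilities
    target_onehot: (N, C, H, W) one-hot encoded ground truth
    """
    dims = (0, 2, 3)  # sum over batch and spatial dimensions, keep per-class values
    intersection = torch.sum(probs * target_onehot, dims)
    cardinality = torch.sum(probs + target_onehot, dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()  # average over classes
```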

Boundary-based loss

Boundary-based loss, a recent type of loss function, aims to minimize the distance between the boundaries of the ground truth and the predicted segmentation. Usually, to make training more robust, boundary-based losses are used together with a region-based loss.

  • Boundary loss computes a differentiable surrogate of the distance between the two boundaries by integrating the network’s softmax output over a signed distance map of the ground truth (a minimal sketch follows this list).
  • Hausdorff distance (HD) loss aims to estimate HD from the CNN output probability so as to learn to reduce HD directly. Specifically, HD can be estimated from the distance transforms of the ground truth and the segmentation.
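
To make the idea concrete, here is a minimal sketch of the boundary-loss formulation, assuming the signed distance maps of the ground truth are precomputed offline (for example with scipy.ndimage.distance_transform_edt); names and shapes are illustrative assumptions rather than the official implementation.

```python
import torch

def boundary_loss(probs, dist_maps):
    """Minimal boundary-loss sketch.

    probs:     (N, C, H, W) softmax probabilities for the foreground classes
    dist_maps: (N, C, H, W) precomputed signed distance maps of the ground truth
               boundaries (negative inside the object, positive outside)
    """
    # integrating the probabilities over the signed distance map gives a
    # differentiable surrogate of the boundary distance
    return torch.mean(probs * dist_maps)
```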

Compound loss

By summing different types of loss functions, we can obtain several compound loss functions, such as Dice+CE, Dice+TopK, Dice+Focal, and so on.
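
As a usage example, a compound Dice+CE loss can simply add the two terms. The sketch below reuses the soft_dice_loss function from the region-based section above; the relative weight is an assumed hyperparameter, not a value prescribed by the original papers.

```python
import torch.nn.functional as F

def dice_ce_loss(logits, target, target_onehot, dice_weight=1.0):
    """Minimal compound Dice + CE loss sketch.

    logits:        (N, C, H, W) raw network outputs
    target:        (N, H, W)    integer labels for the CE term
    target_onehot: (N, C, H, W) one-hot labels for the Dice term
    dice_weight:   relative weight of the Dice term (assumed hyperparameter)
    """
    ce = F.cross_entropy(logits, target)
    dice = soft_dice_loss(F.softmax(logits, dim=1), target_onehot)
    return ce + dice_weight * dice
```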

All the loss functions mentioned above can be used in a plug-and-play way. The code has been released on GitHub: https://github.com/JunMa11/SegLoss.

Any suggestions would be greatly appreciated!
