Everything about Focal Loss

Let’s understand Focal loss and its contribution to the imbalance problem.

Pooja Tambe
Geek Culture
Aug 5, 2022

Facebook AI Research (FAIR) proposed the focal loss to handle class imbalance and demonstrated its effectiveness with their one-stage detector, RetinaNet. Before getting to the main topic, let's go through some prerequisites.

  • Negative examples: samples with low overlap with the ground truth (low IoU), i.e., mostly background.
  • Positive examples: samples with high overlap with the ground truth (high IoU), i.e., containing foreground objects.
  • Easy examples: correctly classified samples.
  • Hard examples: misclassified samples.

Class imbalance

Class imbalance occurs when one class has far more samples than another in a dataset or mini-batch during training. From the object detection perspective, this can occur in two ways:

  • In foreground-background class imbalance, the over-represented class is the background class and the under-represented class is the foreground class.
  • In foreground-foreground class imbalance, the over-represented and the under-represented classes are both foreground classes.

The foreground-background imbalance occurs during the training of dense detectors: the easily classified background samples (easy negatives) contribute little useful learning signal, yet they dominate the total loss, making training inefficient and potentially leading to a degenerate model.
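
To see why this matters, here is a minimal sketch of how the summed loss of many easy negatives can swamp the loss of a handful of hard positives. The anchor counts and probabilities below are illustrative assumptions, not numbers from any particular detector:

import math

# Illustrative: a dense detector scores on the order of 100,000 anchors
# per image, almost all of which are easy background.
num_easy_negatives = 99_900   # background anchors, confidently classified
num_hard_positives = 100      # foreground anchors, still uncertain

# Easy negative: predicted foreground probability p = 0.01 -> tiny CE loss each.
ce_easy_neg = -math.log(1 - 0.01)   # ~0.01
# Hard positive: predicted foreground probability p = 0.5 -> large CE loss each.
ce_hard_pos = -math.log(0.5)        # ~0.693

total_neg = num_easy_negatives * ce_easy_neg   # ~1004
total_pos = num_hard_positives * ce_hard_pos   # ~69

print(f"easy negatives contribute {total_neg:.0f}, hard positives {total_pos:.0f}")
# The near-correct background anchors dominate the total loss by more
# than 10x, even though each one is individually almost ignorable.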

Focal Loss

We will start understanding the focal loss function from its base, the cross-entropy loss for binary classification.

  • Cross-entropy Loss

The cross-entropy loss is mathematically represented as:

$$CE(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1-p) & \text{otherwise} \end{cases} \tag{Eq-1}$$

where y ∈ {0, 1} is the ground-truth class and p is the model's estimated probability for the class with label y = 1.

For simplicity, we can rewrite cross-entropy loss as:

$$CE(p, y) = CE(p_t) = -\log(p_t) \tag{Eq-2}$$

where,

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1-p & \text{otherwise} \end{cases} \tag{Eq-3}$$

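For intuition: if y = 1 and the model predicts p = 0.9, then p_t = 0.9 and CE = -log(0.9) ≈ 0.105; if instead y = 0 with the same prediction p = 0.9, then p_t = 0.1 and CE = -log(0.1) ≈ 2.303. In other words, p_t measures how confidently correct a prediction is, and the loss grows sharply as p_t → 0.
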
  • Balanced Cross-entropy

To address class imbalance, the cross-entropy loss is weighted by a parameter alpha (α):

$$CE(p_t) = -\alpha_t \log(p_t) \tag{Eq-4}$$

where α ∈ [0, 1], with α_t = α for class y = 1 and α_t = 1 − α for class y = 0.

The parameter α balances the contribution of positive and negative examples during training.
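
As a quick worked example (α = 0.75 here is just an illustrative value, weighting the rarer foreground class higher): with α = 0.75, a positive example's loss is scaled by 0.75 while a negative's is scaled by 0.25, so each foreground sample counts three times as much as each background sample. This rebalances the classes, but it still treats easy and hard examples identically.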

  • Focal Loss

The focal loss is an improved version of the cross-entropy loss function. It handles the class imbalance faced by one-stage detectors. While balanced cross-entropy only distinguishes between positive and negative examples, focal loss further differentiates easy and hard examples.

The focal loss adds a modulating factor, (1 − p_t)^γ, to the cross-entropy loss, with a tunable focusing hyperparameter γ ≥ 0. The focal loss is visualized below for γ values from 0 to 5.

$$FL(p_t) = -(1-p_t)^{\gamma} \log(p_t) \tag{Eq-5}$$

Fig: Focal loss and cross-entropy loss comparison [1]

The above graph shows:

  1. For easy examples (p_t ≥ 0.5), the focal loss value is much smaller than the cross-entropy loss.
  2. As γ increases, the modulating factor further reduces the loss for well-classified examples, so misclassified examples receive relatively higher weight during training.
  3. Increasing γ therefore concentrates the loss on hard samples and focuses training attention on hard negatives, as the sketch below illustrates.
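
A minimal sketch of that effect (the probabilities below are arbitrary illustrative values): the modulating factor (1 − p_t)^γ collapses toward zero for easy examples as γ grows, while staying close to 1 for hard ones.

# How the modulating factor (1 - p_t)**gamma scales the loss for an
# easy example (p_t = 0.9) versus a hard example (p_t = 0.1).
for gamma in (0, 0.5, 1, 2, 5):
    easy = (1 - 0.9) ** gamma   # well-classified: factor shrinks fast
    hard = (1 - 0.1) ** gamma   # misclassified: factor stays near 1
    print(f"gamma={gamma}: easy factor={easy:.5f}, hard factor={hard:.5f}")

# At gamma=2 the easy example's loss is scaled by 0.01 (a 100x down-weight),
# while the hard example's loss is scaled only by 0.81.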

Let's compare focal loss and cross-entropy loss on easy and hard samples with concrete examples. The loss values are calculated using the following code, which implements the α-balanced focal loss, $FL(p_t) = -\alpha_t (1-p_t)^{\gamma} \log(p_t)$, from the paper [1]:

import math

# Calculate the binary cross-entropy loss for a single prediction.
def cross_entropy(p, label):
    if (p >= 0.5 and label == 1) or (p < 0.5 and label == 0):
        print("Correctly classified example")
    else:
        print("Misclassified example")
    loss = -(label * math.log(p)) - ((1 - label) * math.log(1 - p))
    return loss

# Calculate the alpha-balanced focal loss for a single prediction.
def focal_loss(p, label, alpha, gamma):
    if (p >= 0.5 and label == 1) or (p < 0.5 and label == 0):
        print("Correctly classified example")
    else:
        print("Misclassified example")
    # For label 0, the modulating factor (1 - (1 - p))**gamma simplifies to p**gamma.
    loss = -(alpha * (1 - p) ** gamma * label * math.log(p)) \
        - ((1 - alpha) * p ** gamma * (1 - label) * math.log(1 - p))
    return loss
Comparison between cross-entropy and focal loss with examples.
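
To reproduce numbers like those in the comparison, the functions above can be called as follows. The probabilities p = 0.9 / p = 0.1 are my assumed inputs, and α = 0.25, γ = 2 are the defaults from the paper [1]; the computed ratios differ slightly from the ones quoted below because the original comparison rounds the loss values first.

alpha, gamma = 0.25, 2  # defaults reported in the RetinaNet paper [1]

cases = {
    "easy positive": (0.9, 1),  # confident and correct foreground
    "easy negative": (0.1, 0),  # confident and correct background
    "hard positive": (0.1, 1),  # confidently wrong on a foreground sample
    "hard negative": (0.9, 0),  # confidently wrong on a background sample
}

for name, (p, label) in cases.items():
    ce = cross_entropy(p, label)
    fl = focal_loss(p, label, alpha, gamma)
    print(f"{name}: CE={ce:.5f}, FL={fl:.5f}, ratio={ce / fl:.2f}")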

From the above comparison, we can conclude that:

  1. The ratio of cross-entropy loss to focal loss is approximately: easy positive ≈ 405.23, easy negative ≈ 133.36, hard positive ≈ 4.938, hard negative ≈ 1.646.
  2. The focal loss values are consistently smaller than the cross-entropy values.
  3. The ratios confirm the defining behaviour of focal loss: the contribution of easy samples decays sharply, so training concentrates on correcting misclassified, hard examples.
  4. Hard negatives are down-weighted the least, so focal loss keeps the most focus on them.

I hope this blog gives you a better idea of focal loss and the effect of changing γ.

Thank you!

References

  1. Lin et al., "Focal Loss for Dense Object Detection". https://arxiv.org/pdf/1708.02002.pdf
  2. Oksuz et al., "Imbalance Problems in Object Detection: A Review". https://arxiv.org/pdf/1909.00169.pdf
