Focal Loss — What, Why, and How?
Focal Loss explained in simple words: what it is, why it is required, and how it is useful — with both an intuitive and a mathematical formulation.
Binary Cross Entropy Loss
Most object detector models use the Cross-Entropy Loss function for their learning. The idea is to have a loss function that pushes the model to predict a high probability for a positive example and a low probability for a negative example, so that with a standard threshold, say 0.5, we can easily differentiate between the two classes. I will start by explaining the Binary Cross Entropy Loss (for 2 classes) and later generalize it to the standard Cross Entropy Loss (for n classes).
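As a minimal sketch of the idea above, here is a plain-Python binary cross-entropy for a single example (the `eps` clamp is an illustrative choice to avoid `log(0)`, not part of the formula itself):

```python
import math

def binary_cross_entropy(p, y):
    """Binary cross-entropy for one example.
    p: predicted probability of the positive class; y: true label (0 or 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident, correct prediction gives a small loss...
print(binary_cross_entropy(0.9, 1))  # ≈ 0.105
# ...while a confident, wrong prediction is penalized heavily.
print(binary_cross_entropy(0.1, 1))  # ≈ 2.303
```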
Let’s understand the plot above. On the x-axis is the predicted probability for the true class, and on the y-axis is the corresponding loss. I have broken the Binary Cross Entropy Loss into 2 parts:
- loss = -log(p) when the true label Y = 1
  - Point A: If the predicted probability p is low (closer to 0), the loss is high — a heavy penalty.
  - Point B: If the predicted probability p is high (closer to 1), the loss is low — a light penalty.
- loss = -log(1 - p) when the true label Y = 0
  - Point C: If the predicted probability p is low (closer to 0), the loss is low — a light penalty.
  - Point D: If the predicted probability p is high (closer to 1), the loss is high — a heavy penalty.
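The four points above can be checked numerically. This small sketch evaluates the same binary cross-entropy at a low and a high predicted probability for each label (the specific probabilities 0.05 and 0.95 are illustrative choices):

```python
import math

def bce(p, y):
    # Binary cross-entropy for one example, clamped to avoid log(0).
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Point A: Y = 1, low p  -> large loss
# Point B: Y = 1, high p -> small loss
# Point C: Y = 0, low p  -> small loss
# Point D: Y = 0, high p -> large loss
for y, p in [(1, 0.05), (1, 0.95), (0, 0.05), (0, 0.95)]:
    print(f"Y={y}, p={p}: loss={bce(p, y):.3f}")
```

Running this prints large losses (≈ 3.0) for the two wrong-and-confident cases (A and D) and small losses (≈ 0.05) for the two correct-and-confident cases (B and C).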