Focal Loss — What, Why, and How?

Lavanya Gupta · Published in The Startup · Jan 28, 2021 · 9 min read


Focal Loss explained in simple words: what it is, why it is required, and how it is useful, in both an intuitive and a mathematical formulation.

Binary Cross Entropy Loss

Most object detector models use the Cross-Entropy Loss function for their learning. The idea is to train the model to output a high probability for a positive example and a low probability for a negative example, so that with a standard threshold, say 0.5, we can easily differentiate between the two classes. I am going to start by explaining the Binary Cross Entropy Loss (for 2 classes) and later generalize it to the standard Cross Entropy Loss (for n classes).
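It can help to see the two cases as a single formula: loss = -[Y·log(p) + (1 - Y)·log(1 - p)], which reduces to -log(p) when Y = 1 and to -log(1 - p) when Y = 0. Below is a minimal NumPy sketch of this per-example loss; the function name and example values are my own illustration, not from the original post.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Per-example BCE: -[y*log(p) + (1-y)*log(1-p)]."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # guard against log(0)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1.0, 1.0, 0.0, 0.0])  # true labels
p = np.array([0.9, 0.1, 0.1, 0.9])  # predicted probabilities for the positive class
print(binary_cross_entropy(y, p))
# Confident-correct predictions get a small loss (~0.105);
# confident-wrong predictions get a large loss (~2.303).
```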

[Figure: Binary Cross Entropy Loss, plotting -log(p) and -log(1 - p) against the predicted probability for the true class]

Let's understand the figure above. The x-axis shows the predicted probability for the true class, and the y-axis shows the corresponding loss. I have broken down the Binary Cross Entropy Loss into 2 parts:

  1. loss = -log(p) when the true label Y = 1
    Point A: If the predicted probability p is low (closer to 0), the loss is high, so the prediction is penalized heavily.
    Point B: If the predicted probability p is high (closer to 1), the loss is low, so the prediction is barely penalized.
  2. loss = -log(1 - p) when the true label Y = 0
    Point C: If the predicted probability p is low (closer to 0), the loss is low, so the prediction is barely penalized.
    Point D: If the predicted probability p is high (closer to 1), the loss is high, so the prediction is penalized heavily.
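To put rough numbers on Points A through D, here is a small follow-up sketch; the probability values 0.05 and 0.95 are picked by me purely for illustration.

```python
import numpy as np

# Y = 1 branch: loss = -log(p)
for p in (0.05, 0.95):
    # p=0.05 -> ~3.0 (Point A, heavy penalty); p=0.95 -> ~0.05 (Point B, barely penalized)
    print(f"Y=1, p={p}: loss = {-np.log(p):.3f}")

# Y = 0 branch: loss = -log(1 - p)
for p in (0.05, 0.95):
    # p=0.05 -> ~0.05 (Point C, barely penalized); p=0.95 -> ~3.0 (Point D, heavy penalty)
    print(f"Y=0, p={p}: loss = {-np.log(1 - p):.3f}")
```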
