Fifteen Minutes with FiftyOne: YOLOv4

A closer look at one of the fastest object detectors yet

Eric Hofesmann
Voxel51
Aug 29, 2020

This is the first post in a series where I will be taking a look at state-of-the-art computer vision models and datasets through the new tool FiftyOne. FiftyOne lets you easily visualize datasets and labels to find interesting artifacts in your model predictions or annotations.

YOLOv4

YOLOv4 [1] is a recent installment of the single-shot YOLO object detection model that came out earlier this year. There has been some controversy around YOLOv5 but that is a story for a different time [2]. This iteration of YOLO comes loaded with architectural optimizations and loads and loads of new tricks and methods.

YOLOv4 performance compared to other models (Source)

YOLOv4 has a high mAP on the MS COCO dataset at speeds of 70 to 120 FPS and is designed to be trained and used on a single GPU!

The guts of YOLOv4

The improvements behind YOLOv4 fall under three categories: architectural updates, Bags of Freebies, and Bags of Specials.

Architecture

The architecture of YOLOv4 consists of a backbone (like a ResNet trained on ImageNet), a head that uses the backbone's feature maps to detect objects (like YOLO and SSD), and a neck that connects the two (like Feature Pyramid Networks).

In particular, the selected architectures, as listed in the YOLOv4 paper [1], are:

  • Backbone: CSPDarknet53
  • Neck: SPP and PAN
  • Head: YOLOv3

Bags of Freebies (BoF)

Bags of Freebies are training methods that improve performance but are “free” at inference time. Different BoFs are used for training the backbone and the detector. Methods or augmentations that are novel to YOLOv4 are marked with “NEW”.

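Backbone BoF highlights, as listed in the paper [1], include:

  • CutMix and Mosaic data augmentation
  • DropBlock regularization
  • Class label smoothing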

Detector BoF highlights include:

  • Complete IoU loss
  • Cross mini-Batch Normalization (NEW: A modified Cross-Iteration Batch Normalization)
  • DropBlock regularization
  • Mosaic data augmentation
  • Self-Adversarial Training (NEW: A new data augmentation where the network first alters the image in an adversarial attack on itself before training to detect an object on this modified image.)
  • Eliminate grid sensitivity
  • Using multiple anchors for a single ground truth
  • Cosine annealing scheduler
  • Optimal hyperparameters
  • Random training shapes
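To make the first item in the list above concrete, here is a minimal sketch of the Complete IoU (CIoU) loss for a single pair of boxes, following the commonly published formulation: one minus IoU, plus a normalized center-distance penalty, plus an aspect-ratio penalty. The function name, the (x_min, y_min, x_max, y_max) box format, and the `eps` guard are assumptions made for illustration; this is not taken from the YOLOv4 codebase.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x_min, y_min, x_max, y_max)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU term
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers
    center_dist_sq = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + (
        (py1 + py2) / 2 - (ty1 + ty2) / 2
    ) ** 2

    # Squared diagonal of the smallest box enclosing both boxes
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    diag_sq = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + center_dist_sq / diag_sq + alpha * v


box = (10.0, 10.0, 110.0, 110.0)
print(ciou_loss(box, box))                           # close to 0 for a perfect match
print(ciou_loss((0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)))  # above 1 when boxes do not overlap
```

Unlike a plain IoU loss, the center-distance term still provides a useful signal when the predicted and ground-truth boxes do not overlap at all.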

Bags of Specials (BoS)

Bags of Specials are plugin modules and post-processing methods that add a small amount of inference cost but significantly improve accuracy. Like BoFs, different BoSs are used for the backbone and the detector.

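Backbone BoS highlights, as listed in the paper [1], include:

  • Mish activation
  • Cross-stage partial connections (CSP)
  • Multi-input weighted residual connections (MiWRC)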

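Detector BoS highlights, as listed in the paper [1], include:

  • Mish activation
  • SPP-block
  • SAM-block
  • PAN path-aggregation block
  • DIoU-NMS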

Digging in with FiftyOne

Let’s see how YOLOv4 compares with YOLOv2 by loading up MS COCO validation in FiftyOne.
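As a rough sketch of the loading step, the snippet below pulls part of the COCO-2017 validation split from the FiftyOne Dataset Zoo and opens it in the App. The helper names and the `max_samples` default are illustrative choices, and the `max_samples` argument assumes a reasonably recent FiftyOne release. Model predictions are not included in the zoo dataset and would have to be added to the samples separately.

```python
def load_coco_validation(max_samples=100):
    """Load a slice of the COCO-2017 validation split from the FiftyOne Dataset Zoo."""
    # Imported lazily so this sketch can be imported without FiftyOne installed.
    import fiftyone.zoo as foz

    # Downloads the data on first use; later calls reuse the local copy.
    return foz.load_zoo_dataset(
        "coco-2017",
        split="validation",
        max_samples=max_samples,
    )


def open_in_app(dataset):
    """Launch the FiftyOne App on the given dataset and return the session."""
    import fiftyone as fo

    return fo.launch_app(dataset)
```

Calling `open_in_app(load_coco_validation())` in a Python session should open the App with the ground-truth labels ready to browse.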

Tighter Boxes

With YOLOv4 comes a significant improvement in the tightness of bounding boxes.

Predictions visualized in FiftyOne | Blue: YOLOv2 | Green: YOLOv4
YOLOv4 has more predictions with an IoU > 0.8 than YOLOv2
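For reference, "tightness" here is measured with intersection over union (IoU). A minimal sketch of the computation is below; the box format (x_min, y_min, x_max, y_max) and the example coordinates are made up for illustration.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (10, 10, 110, 110)
loose_box = (0, 0, 120, 120)    # hypothetical loose prediction
tight_box = (12, 12, 110, 110)  # hypothetical tight prediction

print(round(box_iou(ground_truth, loose_box), 3))  # 0.694
print(round(box_iou(ground_truth, tight_box), 3))  # 0.96
```

Both hypothetical predictions cover the object, but only the tight one clears an IoU threshold of 0.8.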

YOLOv4 has more True Positives

After thresholding the YOLOv4 and YOLOv2 predictions so that each set contains roughly 29,000 detections, YOLOv4 had 8,000 more true positives than YOLOv2 at an IoU of 0.75. Because both models have about the same total number of detections, YOLOv4 also had significantly fewer false positives.
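One common way to count true and false positives at a fixed IoU threshold is a greedy, confidence-ordered matching per image, sketched below. This is a simplified illustration and not necessarily the exact procedure used for the numbers above; the tuple layouts and example boxes are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp(predictions, ground_truths, iou_thresh=0.75):
    """Count true and false positives for one image.

    predictions: list of (label, confidence, box)
    ground_truths: list of (label, box)
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    # Higher-confidence predictions get first pick of the ground truth
    for label, _conf, box in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, (gt_label, gt_box) in enumerate(ground_truths):
            if idx in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_thresh:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    return tp, fp


ground_truths = [("car", (10, 10, 110, 110)), ("person", (200, 50, 260, 200))]
predictions = [
    ("car", 0.9, (12, 12, 110, 110)),      # tight box: true positive
    ("car", 0.6, (0, 0, 120, 120)),        # duplicate of a matched object: false positive
    ("person", 0.8, (205, 55, 260, 200)),  # IoU of about 0.89: true positive
]
print(count_tp_fp(predictions, ground_truths))  # (2, 1)
```

With the number of detections held fixed, every prediction is either a true positive or a false positive, which is why a gain in one implies a drop in the other.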

Predictions filtered in FiftyOne

False Positives by class

While YOLOv4 has significantly fewer false positives overall than YOLOv2, the percentage of false positives by class varies. Even though YOLOv4 produced fewer false-positive “car” detections in total, “car” made up a larger share of its false positives than it did for YOLOv2.

False-positive class distribution displayed in FiftyOne
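The difference between a raw count and a per-class share is easy to miss, so here is a small sketch with made-up numbers (they are not the measurements from this post) showing how a model can have fewer false positives for a class while that class takes up a larger percentage of its false positives.

```python
from collections import Counter


def false_positive_share(fp_labels):
    """Return each class's percentage of all false positives."""
    counts = Counter(fp_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical false-positive labels for two models
older_model_fps = ["car"] * 30 + ["person"] * 50 + ["chair"] * 20
newer_model_fps = ["car"] * 20 + ["person"] * 20 + ["chair"] * 10

older_share = false_positive_share(older_model_fps)
newer_share = false_positive_share(newer_model_fps)

print(older_share["car"])  # 30.0
print(newer_share["car"])  # 40.0
```

In this toy example the newer model has 20 false-positive cars instead of 30, yet cars account for 40% of its false positives instead of 30%.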

Conclusion

The improvements in YOLOv4 produced a high-performing model that is designed to be friendly to users with limited computing resources. An analysis of the results with FiftyOne shows that the distribution of false positives by class appears similar to that of YOLOv2, which suggests that the mAP increases are likely due to tighter bounding boxes.

If you want to look through the outputs of YOLOv4 yourself, you can load them up here: https://github.com/voxel51/fiftyone-examples/blob/master/examples/comparing_YOLO_and_EfficientDet.ipynb

References

[1] Alexey Bochkovskiy et al., YOLOv4: Optimal Speed and Accuracy of Object Detection (2020)

[2] Ritesh Kanjee, YOLOv5 Controversy — Is YOLOv5 Real? (2020)

About Me

My name is Eric Hofesmann. I received my master’s in Computer Science, specializing in computer vision, at the University of Michigan. During my graduate studies, I realized that it was incredibly difficult to thoroughly analyze a new model or method without serious scripting to visualize and search through outputs. Working at the computer vision startup Voxel51, I helped develop FiftyOne to help researchers and myself quickly load up and start looking through datasets and model results. This series of posts goes through state-of-the-art computer vision models and datasets and analyzes them with FiftyOne.

Machine learning engineer at Voxel51, Master’s in Computer Science from the University of Michigan. https://www.linkedin.com/in/eric-hofesmann/