YOLOv4: Speed & Accuracy

Susant Achary
Published in Analytics Vidhya · 5 min read · May 23, 2020

YOLO (You Only Look Once), but even sharper!

YOLOv4

Object detection has been maturing ever since R-CNN was released a few years ago, and the competition remains cut-throat. YOLOv4 again claims state-of-the-art (SOTA) accuracy while maintaining a high processing frame rate: it achieves 43.5% AP (65.7% AP₅₀) on MS COCO at approximately 65 FPS on a Tesla V100, as per the graph below. Higher accuracy and precision are only a few of the many things we want from an object detector. We also want the model to run smoothly on edge devices like the Raspberry Pi, Jetson Nano, and Intel boards. Processing streaming real-time video on such low-power, low-cost hardware is both important and challenging, and it is exactly what robotics, business applications, and much more need. (Code is shared at the end with a video walk-through.)

YOLOv4 is twice as fast as EfficientDet with comparable performance.

The YOLOv4 release lists three authors: Alexey Bochkovskiy, the Russian developer who built the YOLO Windows version, Chien-Yao Wang, and Hong-Yuan Mark Liao. (Unfortunately, the creator of YOLO, Joseph Redmon, announced he was no longer pursuing computer vision due to the negative impact of his work.)

https://arxiv.org/abs/2004.10934

As per the authors:

Compared with the previous YOLOv3, YOLOv4 has the following advantages:

It is an efficient and powerful object detection model that enables anyone with a 1080 Ti or 2080 Ti GPU to train a super fast and accurate object detector.

The influence of state-of-the-art “Bag-of-Freebies” and “Bag-of-Specials” object detection methods during detector training has been verified.

The modified state-of-the-art methods, including CBN (Cross-iteration batch normalization), PAN (Path aggregation network), etc., are now more efficient and suitable for single GPU training.

Pluggable Architecture

Bag of Freebies (BoF) & Bag of Specials (BoS)

Improvements can be made in the training process (data augmentation, class-imbalance handling, cost functions, soft labeling, etc.) to advance accuracy. These improvements have no impact on inference speed and are called the "bag of freebies". Then there is the "bag of specials": techniques that increase inference time slightly but give a good return in performance. These include enlarging the receptive field, attention, feature integration like skip connections and FPN, and post-processing like non-maximum suppression. In this article, we will discuss how the feature extractor and the neck are designed, as well as all these BoF and BoS goodies.
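As a concrete example of one "bag of freebies" trick, class label smoothing softens hard one-hot targets so the model becomes less over-confident, at zero inference cost. A minimal NumPy sketch (the smoothing factor eps=0.1 is an illustrative assumption, not a value from the paper):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Soften one-hot class targets: a 'bag of freebies' trick that
    costs nothing at inference time. eps=0.1 is an illustrative choice."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / num_classes

# Example: a 3-class one-hot target [0, 1, 0] becomes roughly [0.033, 0.933, 0.033]
targets = np.array([[0.0, 1.0, 0.0]])
print(smooth_labels(targets))
```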

Methodology for real-time neural network speed in production and optimization for parallel computing:

  • For GPUs: a small number of groups (1–8) in convolutional layers, e.g. CSPResNeXt50 / CSPDarknet53
  • For VPUs: grouped convolutions, but refrain from using Squeeze-and-Excitation (SE) blocks; this specifically includes models such as EfficientNet-lite / MixNet / GhostNet / MobileNetV3 (a minimal grouped-convolution sketch follows this list)
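To make the grouped-convolution idea concrete, here is a minimal PyTorch sketch; the channel counts and group number below are illustrative assumptions, not values taken from CSPResNeXt50 or the paper:

```python
import torch
import torch.nn as nn

# A standard 3x3 convolution mixes all 64 input channels for every output channel.
standard = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)

# A grouped 3x3 convolution (groups=8) splits the channels into 8 independent
# groups of 8, cutting parameters and FLOPs roughly by the group count.
grouped = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1, groups=8)

x = torch.randn(1, 64, 56, 56)
print(standard(x).shape, grouped(x).shape)             # both: torch.Size([1, 64, 56, 56])
print(sum(p.numel() for p in standard.parameters()),   # ~36.9k weights
      sum(p.numel() for p in grouped.parameters()))    # ~4.7k weights
```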

Selection of BoF and BoS in a General Sense

To improve object detection training, a typical CNN usually chooses among the following:

  • Activations: ReLU, leaky-ReLU, parametric-ReLU, ReLU6, SELU, Swish, or Mish (a minimal Mish sketch follows this list)
  • Bounding box regression loss: MSE, IoU, GIoU, CIoU, DIoU
  • Data augmentation: CutOut, MixUp, CutMix
  • Regularization method: DropOut, DropPath, Spatial DropOut, or DropBlock
  • Normalization of the network activations by their mean and variance: Batch Normalization (BN), Cross-GPU Batch Normalization (CGBN or SyncBN), Filter Response Normalization (FRN), or Cross-Iteration Batch Normalization (CBN)
  • Skip-connections: Residual connections, Weighted residual connections, Multi-input weighted residual connections, or Cross stage partial connections (CSP)
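Of these activations, Mish is the one YOLOv4 adopts in its backbone; it is defined as x · tanh(softplus(x)). A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Smooth and non-monotonic,
    unlike ReLU, which helps gradient flow in deep backbones."""
    return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(mish(x))  # negative inputs are damped smoothly, not zeroed out as with ReLU
```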
Mosaic Data Augmentation
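Mosaic augmentation (shown in the figure above) stitches four training images into a single image around a randomly placed centre, so the detector sees objects at varied scales and outside their usual context, and batch normalization computes statistics over four images at once. A minimal NumPy sketch (the 608-pixel output size is an illustrative assumption, and bounding-box remapping is omitted):

```python
import numpy as np

def mosaic(imgs, out_size=608):
    """Combine 4 images (H, W, 3) into one out_size x out_size mosaic.
    The split point is sampled at random; box remapping is omitted here."""
    assert len(imgs) == 4
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    cx = np.random.randint(out_size // 4, 3 * out_size // 4)  # random centre x
    cy = np.random.randint(out_size // 4, 3 * out_size // 4)  # random centre y
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(imgs, regions):
        h, w = y2 - y1, x2 - x1
        # naive nearest-neighbour resize, to keep the sketch dependency-free
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas

tiles = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic(tiles).shape)  # (608, 608, 3)
```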

Details of YOLOv4

  • Backbone: CSPDarknet53 (CSP + Darknet53)
  • Neck: SPP (Spatial Pyramid Pooling layer), PAN (Path Aggregation Network, https://arxiv.org/abs/1803.01534); a minimal SPP sketch follows this list
  • Head: YOLOv3
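The SPP block in the neck applies max-pooling with several large kernels over the same feature map (the reference yolov4.cfg uses kernels 5, 9, and 13 at stride 1) and concatenates the results with the input, enlarging the receptive field without reducing spatial resolution. A minimal PyTorch sketch (feature-map sizes below are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPBlock(nn.Module):
    """Spatial Pyramid Pooling as used in the YOLOv4 neck: parallel max-pools
    with kernels 5, 9, 13 (stride 1, 'same' padding), concatenated with the input."""
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.kernels = kernels

    def forward(self, x):
        pools = [F.max_pool2d(x, k, stride=1, padding=k // 2) for k in self.kernels]
        return torch.cat([x] + pools, dim=1)  # channels grow 4x; H and W are unchanged

x = torch.randn(1, 512, 19, 19)   # a typical 19x19 feature map for a 608x608 input
print(SPPBlock()(x).shape)        # torch.Size([1, 2048, 19, 19])
```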

YOLO v4 uses:

  • Bag of Freebies (BoF) for backbone: CutMix and Mosaic data augmentation, DropBlock regularization, Class label smoothing
  • Bag of Specials (BoS) for backbone: Mish activation, Cross-stage partial connections (CSP), Multi-input weighted residual connections (MiWRC)
  • Bag of Freebies (BoF) for detector: CIoU-loss, CmBN, DropBlock regularization, Mosaic data augmentation, Self-Adversarial Training, Eliminate grid sensitivity, Using multiple anchors for a single ground truth, Cosine annealing scheduler, Optimal hyper-parameters, Random training shapes
  • Bag of Specials (BoS) for detector: Mish activation, SPP-block, SAM-block, PAN path-aggregation block, DIoU-NMS (a minimal DIoU-NMS sketch follows this list)
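DIoU-NMS, listed above for the detector, suppresses a candidate box only when it both overlaps a higher-scoring box heavily and has its centre close to that box's centre, which helps keep genuinely distinct objects that happen to overlap. A minimal NumPy sketch (the 0.45 threshold is an illustrative assumption):

```python
import numpy as np

def diou_nms(boxes, scores, thresh=0.45):
    """DIoU-NMS sketch: a box is suppressed only when its IoU with a
    higher-scoring box, minus a normalised centre-distance penalty,
    exceeds `thresh`. boxes: (N, 4) array as [x1, y1, x2, y2]."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        # plain IoU between box i and every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        # squared centre distance, normalised by the enclosing-box diagonal
        ci = (boxes[i, :2] + boxes[i, 2:]) / 2
        cr = (boxes[rest, :2] + boxes[rest, 2:]) / 2
        d2 = ((ci - cr) ** 2).sum(axis=1)
        c2 = ((np.maximum(boxes[i, 2:], boxes[rest, 2:]) -
               np.minimum(boxes[i, :2], boxes[rest, :2])) ** 2).sum(axis=1) + 1e-9
        order = rest[(iou - d2 / c2) <= thresh]
    return keep

# Two heavily overlapping boxes with nearby centres: the weaker one is suppressed.
boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(diou_nms(boxes, scores))  # -> [0, 2]
```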

Curious to deep-dive into each of the above components and hyper-parameters? Please read through https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61 (you will love it).

Comparison of YOLOv4 on Different NVIDIA GPU Architectures (Maxwell, Pascal, Volta)

Final Thoughts from Authors:

A state-of-the-art detector that is faster (FPS) and more accurate (MS COCO AP50…95 and AP50) than all available alternative detectors. The detector described can be trained and used on a conventional GPU with 8–16 GB of VRAM, which makes its broad use possible.

Code and Walk-through (fork the code):

https://www.youtube.com/watch?v=mKAEGSxwOAY (credits to the video's author for the code and walk-through)
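For a quick way to try the released model yourself, here is a minimal inference sketch using OpenCV's DNN module. It assumes you have downloaded yolov4.cfg and yolov4.weights from the official AlexeyAB/darknet repository, have a local test.jpg (file names here are placeholders), and are running OpenCV 4.4 or later, which added support for YOLOv4's Mish layers:

```python
import cv2

# Load the Darknet config and weights (paths are placeholders).
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1.0 / 255, swapRB=True)

image = cv2.imread("test.jpg")
class_ids, confidences, boxes = model.detect(image, confThreshold=0.25, nmsThreshold=0.45)
for cid, conf, box in zip(class_ids.flatten(), confidences.flatten(), boxes):
    x, y, w, h = map(int, box)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    print(int(cid), float(conf), (x, y, w, h))
cv2.imwrite("result.jpg", image)
```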

Keep Learning !!!

References:

  1. YOLOv4 paper: https://arxiv.org/abs/2004.10934
  2. Jonathan Hui's YOLOv4 deep dive: https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61
  3. Official and maintained code: https://github.com/AlexeyAB/darknet
