YOLOv4 vs YOLOv5

Ildar Idrisov, PhD
Deelvin Machine Learning
7 min read · Jun 30, 2020

Hi,
Today we’ll talk about the well-known YOLO architecture, and in particular about its recent releases. Why releases? Because, after a long pause, two major versions of YOLO, 4 and 5, were released within about a month of each other. How did that happen? Let’s try to figure it out.

YOLOv3

YOLO was created by Joseph Redmon and is built on the darknet neural network framework. After the third version, Joseph Redmon stopped supporting the repository and tweeted:

His work was continued by Alexey Bochkovskiy in a fork of the main repository, which, like the original, is also called darknet. Like Joseph, Alexey kept an open and free license for his work.

YOLOv4

And then, on April 23, 2020, a paper by the research group of Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao was published, called YOLOv4: Optimal Speed and Accuracy of Object Detection, marking the appearance of a new major version. The darknet repository contains a proper description, documentation, comparisons, and a model zoo. Everyone is happy, including myself. At the time, the fourth version of YOLO was considered the fastest real-time model for object detection.

YOLOv5

On May 27th, YOLOv5 came out unexpectedly, from unknown authors, under the GNU General Public License v3.0 and with a link to a commercial site. That was … weird?

For a while I was simply glad that Joseph’s creation had found followers, but then I decided to compare the two versions, because on the YOLOv5 website I did not find a convincing comparison or a sufficient description. Let’s see what came of it.

YOLOv4

Authorship

The author of the original framework is Joseph Redmon. The main author of the fork and of YOLOv4 is Alexey Bochkovskiy. The README of Joseph’s original repository links to Alexey’s repository and his paper. Joseph also tweeted:

Articles

The paper describes in detail the review that was carried out, the tests, and the comparisons with previous results.

Variety of models

YOLOv4 can be built and run on Linux and on Windows.

Through the efforts of the co-authors and the YOLOv4 community, you can run it under a number of frameworks, such as TensorFlow, OpenCV, OpenVINO, PyTorch, TensorRT, ONNX, CoreML, etc.
https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

Pre-trained model

YOLOv4 config and weights.

YOLOv4-tiny also came out, on June 25.
Config and weights.

Other models:
https://github.com/AlexeyAB/darknet/wiki/YOLOv4-model-zoo

Documentation and description of the training process

In the repository, everything is described in detail.

License

Completely open.

Architecture

The architecture of the model is quite flexible and has many variations, as one would expect from darknet. The main YOLOv4 model uses the following:

Backbone — CSPDarknet53
Neck — Spatial pyramid pooling and Path Aggregation Network
Head — Class subnet and Box subnet, as in YOLOv3
As activations, Mish is used for the full model and Leaky ReLU for the tiny one; a quick sketch of Mish is shown below.
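
Mish is a smooth, non-monotonic activation, x * tanh(softplus(x)). Darknet implements it in C; purely as an illustration of the formula, here is a minimal PyTorch sketch (PyTorch and the class name are my choice, not part of darknet):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x))
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(Mish()(x))  # smooth curve, slightly negative for small negative inputs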

Launch

First you need to set up the environment; the requirements can be found here. How to compile the code is described here.

We clone the repository and compile the code.

git clone https://github.com/AlexeyAB/darknet.git
cd darknet
./build.sh

Note: To build via CMake, you may need to update it to version 3.12 or higher; the Ubuntu 18.04 repositories only provide CMake 3.10.

Download weights:

wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

And run the test:

./darknet detector test ./cfg/coco.data ./cfg/yolov4.cfg ./yolov4.weights

After entering the line above, darknet asks for the path to the image to be tested. We provide it and get the result. I ran the test on an image with a resolution of 1920x1080.

Predicted in 20.483000 milli-seconds.

For a more reliable test, I created a list of 1000 iterations over the same image. The resulting time ranged from 20 to 21 milliseconds per image.
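
For reference, darknet can read image paths from stdin when none is given on the command line, printing a “Predicted in ... milli-seconds.” line per image, so such a loop is easy to script. Below is my own rough sketch, not something from the repository (the image path is just an example):

import re
import subprocess

N = 1000
image_path = "./../tests/street.jpg"  # example path; use your own test image

cmd = ["./darknet", "detector", "test",
       "./cfg/coco.data", "./cfg/yolov4.cfg", "./yolov4.weights", "-dont_show"]
# Feed the same path N times via stdin and collect the timing lines darknet prints.
stdin_data = ("%s\n" % image_path) * N
out = subprocess.run(cmd, input=stdin_data, capture_output=True, text=True).stdout
times = [float(t) for t in re.findall(r"Predicted in ([\d.]+) milli-seconds", out)]
print(len(times), "runs:", min(times), "-", max(times), "ms")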

I also ran a video with a resolution of 1920x1080:

./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights ./../tests/liverpool.mp4 -dont_show

The results were:

FPS: 58.1     AVG_FPS: 57.6
YOLOv4 — Result

Now let’s try to run YOLOv4-tiny on the same test cases.

Note: An update with a new yolov4-tiny.cfg configuration appeared recently, so if you downloaded darknet earlier, update your local repository with the git pull command.

#git pull  # uncomment if you last updated the repository before June 25
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights

Test on the image:

Predicted in 2.569000 milli-seconds

Video test:

FPS: 209.1    AVG_FPS: 187.4
YOLOv4-tiny — Result
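
For reference, based on the numbers above, the tiny model is roughly 8 times faster on the single image (20.5 ms vs 2.6 ms) and about 3.3 times faster on the video (187.4 vs 57.6 average FPS).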

For the tests I took images and videos with a large number of people in each frame, and I did not use any additional optimizations or accelerations. Everything at its defaults.

YOLOv5

Authorship

Glenn Jocher is considered the author of YOLOv5, but all the code lives in the repository of Ultralytics LLC. To be fair, these are the people who have been supporting YOLOv3 and porting it to iOS, and who now maintain version 5. Note that the Ultralytics YOLOv3 and YOLOv5 repositories are not forks of the original, unlike Alexey’s repository.

Article

There are many posts about it on Medium and in other blogs:
https://towardsdatascience.com/yolo-v5-is-here-b668ce2a4908
https://blog.roboflow.ai/yolov5-is-here/

There is no article at https://arxiv.org.

Variety of models

There are 4 different models in the repository: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The first is the smallest and least accurate; the last is the largest and the most accurate. All models run on PyTorch.

Pre-trained Model

All four variations of the models described above are publicly available.
https://github.com/ultralytics/yolov5#pretrained-checkpoints

Documentation and description of the training process

The documentation is weak, and the comparison with other models is done poorly.

A description of the training process is available on GitHub and in Google Colab.

License

GNU General Public License v3.0.
https://github.com/ultralytics/yolov5/blob/master/LICENSE

At some point the project was updated and its completely free license was replaced with the current one.

Architecture

The largest version of the model, YOLOv5x, and the smallest, YOLOv5s, do not differ in their layers. The difference between them, as well as between the other versions, lies in the scaling multipliers for the width and depth of the network, similar to what is described in EfficientNet.

depth_multiple: 0.67 # model depth multiple
width_multiple: 0.75 # layer channel multiple
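
As far as I can tell from the model-building code in the repository, the depth multiple scales how many times each block is repeated, and the width multiple scales the number of output channels, rounded up to a multiple of 8. A simplified sketch of that idea, with function names of my own:

import math

def scale_depth(n_repeats, depth_multiple):
    # Number of repeats of a block after applying the depth multiple.
    return max(round(n_repeats * depth_multiple), 1)

def scale_width(channels, width_multiple, divisor=8):
    # Number of output channels after applying the width multiple,
    # kept divisible by `divisor`.
    return int(math.ceil(channels * width_multiple / divisor) * divisor)

# With the multipliers from the YAML fragment above:
print(scale_depth(9, 0.67))     # 6 repeats instead of 9
print(scale_width(1024, 0.75))  # 768 channels instead of 1024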

The compound scaling method is mentioned in the repository, but I did not find evidence of its use, and there are no additional publications on the process. Descriptions of the architectures can be seen here:
https://github.com/ultralytics/yolov5/blob/master/models/yolov5s.yaml
https://github.com/ultralytics/yolov5/blob/master/models/yolov5x.yaml

Launch

Download the repository and install all the modules needed to run it.

git clone https://github.com/ultralytics/yolov5
cd yolov5
#pip3 install numpy==1.17 # Strange but without it I had a problem
pip3 install -r requirements.txt

Note: Among the modules in the requirements, three have version constraints: numpy==1.17, torch>=1.4, PyYAML>=5.3.

Download weights:

sh ./weights/download_weights.sh

Next, run the test for the largest version, YOLOv5x.

python3 detect.py --weights ./weights/yolov5x.pt --source ./../tests/street.jpg

Result for image:

Done. (0.023s)

Running on a video uses the same command.

python3 detect.py --weights ./weights/yolov5x.pt --source ./../tests/liverpool.mp4

Result for video:

video 1/1 (755/755) ../tests/liverpool.mp4: 384x640 18 persons, 2 backpacks, 3 handbags, Done. (0.016s)
...
Done. (27.870s)

Average FPS: 27.09

YOLOv5x — Result
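
The average FPS here is simply the number of frames divided by the total time reported by detect.py; the average itself is my own calculation:

frames = 755          # frames in liverpool.mp4, from the detect.py output above
total_seconds = 27.870
print("Average FPS: %.2f" % (frames / total_seconds))  # 27.09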

And now a test for the smallest version, YOLOv5s.

python3 detect.py --weights ./weights/yolov5s.pt --source ./../tests/street.jpg

Result for image:

Done. (0.008s)

Run video test:

python3 detect.py --weights ./weights/yolov5s.pt --source ./../tests/liverpool.mp4

Result for video:

video 1/1 (755/755) ../test/liverpool.mp4: 384x640 18 persons, 1 backpacks, Done. (0.007s)
...
Done. (21.311s)

Average FPS: 35.42

YOLOv5s — Result

Again, as above, I did not use any additional optimizations or accelerations. Everything at its defaults.

Comparison

In the YOLOv5 repository, I expected to see a comparison with the previous version, YOLOv4. That would be at least logical: if you release a major update and, moreover, claim that it is better than the previous versions, do provide a comparison. But unfortunately, it is not there. I found a comparison between the different variants of the version 5 architecture, a comparison with EfficientDet, and a comparison with their own YOLOv3 implementation for PyTorch, but I did not find a comparison with YOLOv4.

Also, after digging around a bit on the Internet, I found two interesting comparisons.

The first is from the same blog that published the article “YOLOv5 is Here: State-of-the-Art Object Detection at 140 FPS”, by the same authors, Joseph Nelson and Jacob Solawetz. You can find it here.

The authors declare:

If you’re a developer looking to incorporate near realtime object detection into your project quickly, YOLOv5 is a great choice. If you’re a computer vision engineer in pursuit of state-of-the-art and not afraid of a little more custom configuration, YOLOv4 in Darknet continues to be most accurate.

They also indicate that YOLOv5 is faster.

YOLOv5s inferencing in 20 ms with batch size of 1
YOLOv4 inferencing in 22 ms with batch size of 1

The second comparison was made by WongKinYiu.
https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-640887979
https://github.com/ultralytics/yolov5/issues/6#issuecomment-647069454

He shows in different tests that YOLOv4 is faster and more accurate.

CSPDarknet53s-YOSPP gets 12.5% faster model inference speed and 0.1% higher AP than YOLOv3-SPP.
CSPDarknet53s-YOSPP gets 19.5% faster model inference speed and 1.3% higher AP than YOLOv5l.

Judge for yourself.

Conclusion

In general, YOLOv5 turned out to be a pretty good model, but it still has a long way to go to justify being called the 5th major version. In this regard, I like YOLOv4 more: a gigantic amount of work was done there and all the nuances were taken into account.

I find the new name unsuitable. You might have other thoughts on the matter.

In conclusion, I can only say one thing: I am ready to release YOLOv6 :)

Note: I ran all tests on the following configuration.
Processor (CPU) — Intel Core i9-9900K
Video Card (GPU) — NVIDIA RTX 2080Ti
Ubuntu 18.04.3 LTS
NVIDIA CUDA 10.0 + cuDNN 7.6.5
