Using the state-of-the-art YOLOv4 with the COCO Dataset to Detect Objects

buzzind99
5 min read · Jul 15, 2020


Object detection using YOLOv4

YOLO (You Only Look Once) is a real-time object detection algorithm developed by Joseph Redmon in 2015, which was state-of-the-art at the time (and still holds up!). Prior to YOLO, detection algorithms used classifiers to perform detection: a classifier was applied to an image at multiple locations and scales, and the high-scoring regions were kept as detections. This process is computationally expensive and cannot run in real time on ‘your average laptop’. YOLO takes a different approach to object detection. According to the original paper:

“we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.”

Fast forward to 2020: there have been three versions of YOLO authored by Joseph Redmon (up to version 3), and two more versions by two different authors that pick up where the original author left off with version 3.

During the quarantine I had nothing to do, so I decided to take some Udemy online courses, and one of them covered using YOLOv3 to detect objects in images and video. After finishing that course I wanted to do my own projects, and that is when I came across YOLOv4. I didn’t know it existed! To my knowledge, the original author of YOLO (Joseph Redmon) stopped working on the project due to privacy concerns and his fear that his work would be used for military purposes. Yet here we have YOLOv4, and there is even a YOLOv5 (which proved to be controversial), though neither is authored by the original author. YOLOv4 was authored by Alexey Bochkovskiy, who improved YOLO by adding a handful of new features that he said would improve the accuracy of the Convolutional Neural Network (CNN) and achieve state-of-the-art results.

Enough about the history, now let me show you how to actually use YOLOv4 to detect objects!

First off, you’re going to need to clone this git repository. If you have git installed on your Linux machine, you can clone the repo by typing the following into your terminal:

git clone https://github.com/AlexeyAB/darknet

As stated in the repository, to make this work you will need:

  • Windows or Linux
  • CMake >= 3.12
  • CUDA 10.0
  • OpenCV >= 2.4
  • cuDNN >= 7.0 for CUDA 10.0
  • GPU with CC >= 3.0
  • on Linux GCC or Clang, on Windows MSVC 2015/2017/2019
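If you want to sanity-check your setup before building, a few quick commands usually do the trick on Linux (exact package names and paths vary between distributions, so treat this as a rough sketch, not an exact recipe):

cmake --version                  # should report 3.12 or newer
nvcc --version                   # CUDA compiler, should report release 10.0
gcc --version                    # or clang --version
pkg-config --modversion opencv4  # or 'opencv' for older 2.x/3.x packages
nvidia-smi                       # confirms the driver can see your GPU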

Make sure you have all the requirements above installed before continuing. Once you’ve done that, we need to make a few changes to the Makefile to enable GPU acceleration (CUDA and cuDNN) and to include OpenCV in the build. Type the following into your terminal:

cd darknet
sed -i 's/OPENCV=0/OPENCV=1/' Makefile
sed -i 's/GPU=0/GPU=1/' Makefile
sed -i 's/CUDNN=0/CUDNN=1/' Makefile
sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
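To double-check that the sed commands actually changed the flags, you can grep the Makefile; each of the four flags should now read =1. (CUDNN_HALF mainly helps on GPUs with Tensor Cores, so you can leave it at 0 on older cards.)

grep -E '^(GPU|CUDNN|CUDNN_HALF|OPENCV)=' Makefile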

After that, go ahead and build darknet by typing the following:

make
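Compilation can take a while. If your machine has several CPU cores, a parallel build is noticeably faster, and afterwards you should find a darknet executable in the repository root:

make -j"$(nproc)"    # optional: build in parallel
ls -lh ./darknet     # the compiled binary should show up here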

Luckily, YOLOv4 comes with weights pre-trained on the COCO (Common Objects in Context) dataset, which covers 80 classes that it can predict. We will use these pre-trained weights so that we can run YOLOv4 without spending time training the model ourselves. You can download the .weights file using this command:

wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
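The weights file is fairly large (roughly 250 MB), so it’s worth confirming the download completed before moving on:

ls -lh yolov4.weights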

At this point we are now ready to use YOLOv4 to detect objects!

To detect objects in an image, you can use this general command:

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights <path to image>
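For example, using the sample image that ships with the repository (data/dog.jpg) and an explicit confidence threshold (0.25 is the default, so the flag is optional):

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg -thresh 0.25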

After running the command, you should see a predictions.jpg file in your darknet folder.

Here’s the original image that we want to identify:

The original image

Here’s what YOLOv4 trained on the COCO dataset can identify:

YOLOv4

That’s insanely cool, is it not!?

But that’s not all. We can even identify objects in a video!

To do that, we can use this general command:

./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -dont_show <path to input video> -i 0 -out_filename <path to output video>
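As a concrete example (the file names below are just placeholders for your own input and output paths), -i 0 selects the first GPU and -dont_show suppresses the preview window, which is handy on headless machines:

./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -dont_show test_video.mp4 -i 0 -out_filename results.avi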

Here’s the original video that we want to run detection on:

Here’s what YOLOv4 trained on the COCO dataset can identify:

We are only scratching the surface of what YOLOv4 can actually do! In this demonstration, all we did was run the built-in detector that darknet provides to identify objects in an image and in a video. There is a lot more you can do with YOLOv4, such as car tracking and speed estimation (that project is based on YOLOv2, so imagine it on YOLOv4!), and many more cool things!

I’m no expert (yet) in the field of Deep Learning for Computer Vision. I’m still learning, and I hope this demonstration inspires the reader to explore Deep Learning and Computer Vision. It certainly inspired me!

That’s all for now, thanks for reading!
