YOLOv4 implementation to detect custom objects using Google Colab

Srikar · Published in Analytics Vidhya · 7 min read · May 23, 2020
Demo

I tested the detector on this video clip from an episode of Brooklyn Nine-Nine where Holt and Peralta catch the mumps. Big fan of the show; you should definitely watch it!

So, in this article, I am going to explain how to implement YOLOv4 to detect custom objects. When I planned to write about this, I intended to use a toy dataset, but in these unfortunate times when all of us have to wear masks, I thought this particular application would be more useful. Therefore, I decided to build an object detector that recognizes whether someone is wearing a mask or not.

A brief intro to YOLO…

You Only Look Once (YOLO) is a real-time object detection system that can precisely detect multiple objects in a single frame. It is extremely fast: according to the official YOLO website, it is 1000x faster than R-CNN and 100x faster than Fast R-CNN.

The latest version 4 was published last month by Alexey Bochkovskiy (famously known as AlexeyAB on GitHub), Chien-Yao Wang, and Hong-Yuan Mark Liao. The original authors have quit developing YOLO: Joseph Redmon tweeted, “I stopped doing CV research because I saw the impact my work was having. I loved the work but the military applications and privacy concerns eventually became impossible to ignore.” Ali Farhadi, on the other hand, founded the company xnor.ai, which has since been acquired by Apple.

From the paper, you can understand that the main goal was to design a fast and accurate object detector that can be trained and tested on any conventional GPU to achieve real-time, high-quality, and convincing results. I am not going into the detailed workings of YOLO, as this article is focused on the implementation.

Comparison of YOLOv4 to state-of-the-art object detectors [source]

As shown in the figure, YOLOv4 is 2x faster than EfficientDet (developed by Google) with comparable performance. It obtained an AP of 43.5% (65.7% AP₅₀) on the COCO dataset and achieved a real-time inference speed of ~65 FPS on a Tesla V100. In comparison to YOLOv3, AP and FPS improved by 10% and 12%, respectively.

Let’s get into action!

Prerequisites:

  • CMake >= 3.12
  • CUDA 10.0
  • OpenCV >= 2.4
  • cuDNN >= 7.0 for CUDA 10.0
  • Windows or Linux

If you don’t want the hassle of setting up CUDA and cuDNN, use Colab; it also has a pretty good GPU (NVIDIA Tesla K80). I have used Google Colab for this example. Change the paths in the commands below accordingly if you are running on a local machine.

Tip: If you encounter any path-related issues, remember that the paths should always be relative to the darknet executable.

Face mask data: I started scraping data from Google and other sites but came across the Real-World Masked Face Dataset (RMFD). It contains a lot of images of people both wearing and not wearing a mask, of which I took a fraction. The final dataset I used contains 600 images: 300 (mask) and 300 (no_mask).

Annotations: Now we need to annotate the images. For YOLOv4, each image needs a corresponding .txt file in the same directory with the same name. Each text file contains the class id and the bounding-box coordinates for every object, as shown below.

<object-class> <x_center> <y_center> <width> <height>
  • <object-class> is an integer ranging from 0 to classes-1. In this case there are two classes: 0: mask, 1: no_mask
  • <x_center> <y_center> <width> <height> are float values relative to the width and height of the image, in the range (0.0, 1.0]
Example of Labelling an Image
Sample of a ‘.txt’ file
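For illustration, the conversion from a pixel-space box to this normalized format can be sketched as follows (to_yolo_bbox is a hypothetical helper for this sketch, not part of darknet or labelImg):

```python
def to_yolo_bbox(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to normalized YOLO format."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height

# A 100x200 px box with top-left corner at (50, 100) in a 416x416 image,
# labeled with class 0 (mask):
print("0 %.6f %.6f %.6f %.6f" % to_yolo_bbox(50, 100, 150, 300, 416, 416))
# → 0 0.240385 0.480769 0.240385 0.480769
```

labelImg writes exactly this kind of line for you when you pick the YOLO format, so this is only to show what the numbers mean.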

For any labeling purpose I always use labelImg; it supports both the YOLO and PASCAL VOC formats, and its GUI has very good shortcuts. At the end of labeling, make sure you have one text file per image with the same name but a different extension. For example, I have 600 image files and 600 text files.
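The one-label-file-per-image check can be automated with a few lines of Python (check_pairs is a hypothetical helper for this sketch; it assumes .jpg images and .txt labels sitting in the same directory):

```python
import os

def check_pairs(data_dir):
    """Return the .jpg files in data_dir that lack a matching .txt label file."""
    missing = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith('.jpg'):
            label = os.path.splitext(name)[0] + '.txt'
            if not os.path.exists(os.path.join(data_dir, label)):
                missing.append(name)
    return missing
```

Running something like check_pairs('build/darknet/x64/data/obj') should return an empty list when every image has been labeled.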

Steps to Train the data:

  1. Darknet build — I will use Darknet, an open-source neural network framework. Let’s download and install it. You can either download or clone it.
git clone https://github.com/AlexeyAB/darknet.git

After that, open the Makefile in the darknet directory and set GPU, CUDNN, and OPENCV to 1; this accelerates training by using the GPU. Now run the following commands to build Darknet. If you are using Windows, follow these steps to compile. (Alternate source for installing Darknet on Windows)

%cd darknet
!make

To verify the darknet build, first download the YOLOv4 weights and place them in the darknet root directory, then run the command below.

./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/person.jpg -dont_show

You can check the prediction, which is saved as predictions.jpg in the root directory. If you are running on Windows, use darknet.exe instead of ./darknet.

If this runs successfully, the build is working. Since the downloaded weights were obtained by training on the COCO dataset, the detector works on all 80 COCO classes.

2. Configurations — Based on your requirements, select a YOLOv4 config file. I selected yolov4-custom.cfg; copy the contents of cfg/yolov4-custom.cfg to a new file cfg/yolo-obj.cfg. Adjust parameters like batch, subdivisions, steps, and max_batches accordingly. For more info on the parameters, refer to this link.

Tip: set max_batches to a minimum of 2000 × classes, and steps to 80%–90% of max_batches. Also, the width and height parameters should be multiples of 32 (I have used 416x416).

In yolo-obj.cfg, update the classes parameter in the 3 [yolo] layers (lines 970, 1058, 1146) to 2, since we have only 2 classes (mask, no_mask). Similarly, update the filters parameter to filters=(classes + 5) x 3 in the 3 [convolutional] layers just before each [yolo] layer. In this case classes = 2, so set filters to 21 on lines 963, 1051, 1139. (Don’t write filters = (classes + 5) x 3 literally in the .cfg file.)
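Putting the tip and the filters formula together for this two-class case (plain arithmetic, not a script from the post):

```python
classes = 2                                  # mask, no_mask
filters = (classes + 5) * 3                  # value to write before each [yolo] layer
max_batches = 2000 * classes                 # tip: a minimum of 2000 x classes
steps = (int(0.8 * max_batches),             # 80% of max_batches
         int(0.9 * max_batches))             # 90% of max_batches

print(filters, max_batches, steps)           # → 21 4000 (3200, 3600)
```

These are the literal numbers (filters=21, max_batches=4000, steps=3200,3600) that go into yolo-obj.cfg for the mask/no_mask detector.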

In order to start training, we need to create the following files :

Create obj.names in the directory build\darknet\x64\data\ with the class names, in this case mask and no_mask.

obj.names file sample

Now we have to create the train.txt and test.txt files in the directory build\darknet\x64\data\, which contain the paths to the images. Use the following code to create the files.
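A minimal split script could look like this (the write_splits name, the fixed seed, and the 10% test fraction are assumptions of this sketch, not from the post):

```python
import glob
import os
import random

def write_splits(obj_dir, out_dir, test_fraction=0.1, seed=0):
    """Shuffle all .jpg paths in obj_dir and split them into train.txt / test.txt."""
    images = sorted(glob.glob(os.path.join(obj_dir, '*.jpg')))
    random.Random(seed).shuffle(images)
    n_test = int(len(images) * test_fraction)
    with open(os.path.join(out_dir, 'test.txt'), 'w') as f:
        f.write('\n'.join(images[:n_test]) + '\n')
    with open(os.path.join(out_dir, 'train.txt'), 'w') as f:
        f.write('\n'.join(images[n_test:]) + '\n')
    return len(images) - n_test, n_test
```

On Colab this would be called with the Drive paths used throughout this article, e.g. write_splits('/content/gdrive/My Drive/darknet/build/darknet/x64/data/obj', '/content/gdrive/My Drive/darknet/build/darknet/x64/data').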

In the end, both the files should look like this :

/content/gdrive/My Drive/darknet/build/darknet/x64/data/obj/273.jpg
/content/gdrive/My Drive/darknet/build/darknet/x64/data/obj/294.jpg
/content/gdrive/My Drive/darknet/build/darknet/x64/data/obj/15.jpg

Put all the image files (.jpg) and label files (.txt) in the same directory, namely build\darknet\x64\data\obj.

After that, make the obj.data file in the directory build\darknet\x64\data\; it should contain the number of classes and the paths to train.txt, test.txt, obj.names, and the backup folder for weights.

classes= 2
train = build/darknet/x64/data/train.txt
valid = build/darknet/x64/data/test.txt
names = build/darknet/x64/data/obj.names
backup = build/darknet/x64/backup/

The last step: download the pre-trained weights for the convolutional layers and put them in the directory build\darknet\x64. We are using yolov4-custom.cfg, so download yolov4.conv.137; for other configurations or versions of YOLO, download the weights accordingly.

Tip: Say you want to detect only 4 classes (person, laptop, fridge, tv) that are present in the COCO data. First edit the configurations accordingly, then train using the pre-trained weights. Since those weights were obtained by training on the MS COCO dataset, which already contains your 4 classes, you don’t need to collect much data: just annotate maybe 10 images per class and train. Voila, object detector ready!

3. Training — Now that we have all the files let’s start training!

./darknet detector train build/darknet/x64/data/obj.data build/darknet/x64/cfg/yolo-obj.cfg build/darknet/x64/yolov4.conv.137 -dont_show

During training, weights are saved as yolo-obj_last.weights every 100 iterations and as yolo-obj_xxxx.weights every 1000 iterations in the directory build\darknet\x64\backup.

Continue the training until the loss reaches a certain threshold. I stopped training after 1000 iterations.

Tip: In the yolo-obj.cfg file, set the random flag to 1; it will increase precision by training YOLO at different resolutions. You can also increase the network resolution to achieve better performance. (If you encounter an out-of-memory error, just increase the subdivisions parameter to a higher value.)

4. Detection — You can run the detection on either a video file or an image.

./darknet detector demo build/darknet/x64/data/obj.data build/darknet/x64/cfg/yolo-obj.cfg build/darknet/x64/backup/yolo-obj_1000.weights build/darknet/x64/data/peralta_holt_mumps.mp4 -out_filename result.avi -ext_output -dont_show

The above command performs object detection on a video, which will be saved as result.avi. To test on an image, run the command below. You can see the result here.

./darknet detector test build/darknet/x64/data/obj.data build/darknet/x64/cfg/yolo-obj.cfg build/darknet/x64/backup/yolo-obj_1000.weights test.jpg -dont_show

You can also run this detection on a real-time feed:

./darknet detector demo build/darknet/x64/data/obj.data build/darknet/x64/cfg/yolo-obj.cfg build/darknet/x64/backup/yolo-obj_1000.weights -c 0

Tip: After training, to boost performance, increase the network resolution in yolo-obj.cfg to 608x608 or 832x832; this makes it possible to detect smaller objects.

This project can help organizations monitor whether people are wearing a mask at entrances. In the future, I plan to improve this idea further by implementing a face recognition system that works even when a mask is worn.

I am happy to contribute to society during these unfortunate times caused by COVID-19. If you face any errors or have any problem with gathering data reach out to me, I am happy to help.

References

If you want to know more in depth about what’s new in YOLOv4, refer to the following articles.

  1. https://github.com/AlexeyAB/darknet
  2. https://arxiv.org/abs/2004.10934
  3. https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61
  4. https://towardsdatascience.com/yolo-v4-optimal-speed-accuracy-for-object-detection-79896ed47b50

About me:

I am a graduate student at the National University of Singapore studying Artificial intelligence and robotics. I love to use data to solve everyday problems!

Computer Vision Engineer based in Singapore. Passionate about technology!