Improving YOLOv4 accuracy on detecting common objects

Daniel Schwalm
Published in Analytics Vidhya · 9 min read · May 29, 2021

YOLOv4 comes with 80 built-in object classes that it is able to detect. I have been trying to detect people on a public webcam with YoloV4 and Darknet. My experience was that the pre-trained model cannot detect people reliably when they are far from the camera, when the light conditions are unusual, or when a person is standing in front of an unusual background.
To improve people detection accuracy, I decided to build my own model on top of YoloV4.

YOLO and Darknet

To understand how YOLOv4 works under the hood, we have to talk a little bit about Darknet.
YOLO is an object detection algorithm that has several implementations, including PyTorch, Keras and Darknet.
Darknet itself is, according to its own description, “an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.”

Building OpenCV and Darknet from source

To perform custom object detection with Darknet we have to train our own model. For that we have to build Darknet and OpenCV from source.
On a Windows 10 machine like mine, this means producing the necessary .dll and .exe files.

A word of warning about this chapter: it is the most boring part of the process, but it has to be done :)
You can skip it if you are interested only in the object detection part.

Building OpenCV from source

I wrote in detail in my previous article how you can build OpenCV from source in a Windows 10 environment.

For building OpenCV from source you have to follow the same steps, except that you have to set one extra property in the CMake GUI before generating the project files: BUILD_opencv_world. You can tick this property and go on with the subsequent steps following the tutorial in the article above.
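If you prefer configuring the build from the command line instead of the CMake GUI, the same property can be passed as a flag; a hypothetical invocation (your paths and other options will differ) could look like this:

cmake -DBUILD_opencv_world=ON <other options> <path to opencv source>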

Building Darknet from source

I followed this amazing step-by-step tutorial by the TheCodingBug channel to build Darknet from source. I am just summarizing the steps below; for further details, check out the video linked above.

Download the source code from Darknet’s github page as .zip file and extract it to a new folder darknet.

Once you have extracted the .zip file, copy the respective cuDNN dll from your CUDA directory into the darknet/build/darknet/x64 folder. For me it was “c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin\cudnn64_8.dll”, but it may vary depending on your CUDA version and the supported architecture of your GPU.

Go to the OpenCV build folder and copy install/x64/vc16/bin/opencv_world*.dll into the darknet/build/darknet/x64 folder.

Open the darknet/build/darknet/darknet.vcxproj file for editing and set your CUDA version in two places: $(VCTargetsPath)\BuildCustomizations\CUDA 11.2.props and $(VCTargetsPath)\BuildCustomizations\CUDA 11.2.targets.
Mine was 11.2 as seen above but yours may be different.

Also, do the same for darknet/build/darknet/yolo_cpp_dll.vcxproj.
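For reference, the lines to look for in both .vcxproj files are XML imports similar to the ones below; the exact version string depends on your CUDA installation, and their location in the file may vary with the Darknet revision:

<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 11.2.props" />
...
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 11.2.targets" />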

Note: for me the .props and .targets files were missing from $(VCTargetsPath)\BuildCustomizations, which resolves to the Visual Studio folder c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations\, so I had to copy them over from the CUDA directory c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\extras\visual_studio_integration\MSBuildExtensions\
This workaround may not be the recommended solution, but it worked for me.

Now open darknet/build/darknet/yolo_cpp_dll.vcxproj with Visual Studio. First, change the solution configuration from Debug to Release/X64 in the menu bar. Then in the Solution Explorer, right click on yolo_cpp_dll and hit Build.

Once the build completes successfully, open darknet/build/darknet/darknet.sln with Visual Studio. You will have to set up a couple of properties before building darknet.
Right click on darknet and select Properties.
Now set the following properties:

  • add the OpenCV include folder to the C/C++/General/Additional Include Directories property, which contains a list of paths. For me the include path was <opencv root>\build\install\include\;
  • add the OpenCV lib folder to the Linker/General/Additional Library Directories property, which also contains a list of paths. For me the lib path was <opencv root>\build\install\x64\vc16\lib\
  • remove CUDNN_HALF from the list found in C/C++/Preprocessor/Preprocessor Definitions
  • remove compute_75,sm_75 from CUDA C/C++/Device/Code Generation

Once this all is done, right click on darknet in Solution Explorer and hit Build. When the build completes, you will find darknet.exe in darknet/build/darknet/x64 folder.

Testing the Darknet build

To test the Darknet build we have just made, download the yolov4.weights file from the Darknet github page into the darknet/build/darknet/x64 folder.
Now issue the following command in an Anaconda prompt:

darknet.exe detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights

After printing information about the Darknet neural network layers to the console, Darknet prompts for the name of an image to detect objects on. You can provide dog.jpg, a default test image present in the x64 folder, or any arbitrary image. If you built Darknet successfully, you should get output similar to this:

Darknet build test image
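As an alternative to typing the image name at the prompt, Darknet also accepts the image path as the last command line argument:

darknet.exe detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights dog.jpg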

Now we have the pre-trained model available, but this time we want a custom model with improved capabilities for recognizing people and cars compared to the original pre-trained model.
Let’s build this custom model then.

Annotating images with Roboflow

Collecting training images from webcam

Any custom model requires images to train on. As it is a street scene, I decided to detect people, cars and buses only, so my model will contain merely 3 classes.
For training purposes I collected hundreds of images from the Pula public webcam I used in the previous article.

For the image collection I used a script that captured a frame from the webcam stream every 30 seconds. I ran it for a few hours, including day and night, which gave me hundreds of images. I also built in a mechanism to capture a frame whenever I pressed a key; I used it when something unusual happened on the screen that might confuse the model.
Then I sorted out the frames that contained people, cars or buses and used around 200 images for training.
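For illustration, here is a minimal sketch of such a capture script using OpenCV. The stream URL, output folder and key bindings are assumptions for the example, not the exact script I used:

import os
import time
import cv2

# hypothetical stream address -- substitute the actual webcam stream URL
STREAM_URL = "https://example.com/pula/stream.m3u8"
CAPTURE_INTERVAL = 30  # seconds between automatic captures

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture(STREAM_URL)
last_capture = 0.0
frame_id = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("webcam", frame)
    key = cv2.waitKey(1) & 0xFF
    # save a frame every 30 seconds, or immediately when 's' is pressed
    if time.time() - last_capture >= CAPTURE_INTERVAL or key == ord('s'):
        cv2.imwrite("frames/timed_%09d.jpg" % frame_id, frame)
        frame_id += 1
        last_capture = time.time()
    if key == ord('q'):  # quit
        break

cap.release()
cv2.destroyAllWindows()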

Sample training image from street webcam

Utilizing Roboflow

Previously I used the open source LabelImg tool for image annotation, but this time I wanted to try something different, something that may come with more features than LabelImg.
I decided to try out Roboflow, which I had heard about earlier. With the free version of Roboflow you can annotate 1000 images for your project. Without going into the details of Roboflow, let me list the features most attractive to me:

  • Label Assist: it helps annotate known objects using the COCO v1 model at various confidence levels. It reduces annotation time significantly.
  • Zooming: you can zoom up to 4000% into the original image, which helps with labeling small objects
  • Mark null image: with a click of a button you can mark images that contain no objects to detect. These images help the model learn to recognize the absence of objects
  • Move images easily between train, test and validation sets: images that are difficult to interpret may be used for validation
  • Dataset health check: you can check your dataset for missing annotations or imbalanced classes, i.e. whether any of your classes is over- or underrepresented
  • Export dataset in many formats: it was just a click of a button to export the dataset in YOLO Darknet format, but most of the major image annotation formats are supported, including Pascal VOC, TensorFlow TFRecord and the YOLO formats for Keras and PyTorch
Roboflow annotation tool in action

Training Darknet for custom object detection

Preparing dataset for training

When I finished the annotation I exported my dataset from Roboflow in Yolo Darknet format. For training purposes I selected an image size of 416 x 416 pixels, which is one of the supported sizes of Yolov4.

I chose this lower resolution to speed up training. To increase accuracy slightly you can use a higher image size of 512x512 or 608x608, but that will come with a longer training time.
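As a side note, instead of downloading the export manually as a .zip, Roboflow also provides a Python package that can pull a dataset version directly; the workspace and project names below are placeholders, and the exact API may differ between package versions:

# pip install roboflow
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project")
dataset = project.version(1).download("darknet")  # YOLO Darknet export format
print(dataset.location)  # local folder with train/valid/test splits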

After downloading the dataset I had a folder structure similar to this:

- train
|--- file1.jpg
|--- file1.txt
|--- file2.jpg
|--- file2.txt
- valid
|--- file1.jpg
|--- file1.txt
|--- file2.jpg
|--- file2.txt
- test
|--- file1.jpg
|--- file1.txt
|--- file2.jpg
|--- file2.txt

So I had a train, valid and test folder containing the images and the corresponding text files that mark the classes and their bounding boxes on the images.
Then I created a new folder ‘pula’ and copied the whole dataset under that folder in darknet\build\darknet\x64\data\pula

For training purposes we will need a list of all image file paths in a text file.
For that I prepared a small Python script that produces such a text file for both the training and validation sets (I ignored the test set for now):

import glob

for dataset in ['train', 'valid']:
    # collect all .jpg files of the given split
    imglist = glob.glob("data/pula/%s/*.jpg" % dataset, recursive=False)

    # write one image path per line, using forward slashes
    with open("data/pula/%s.txt" % dataset, 'w', encoding='utf-8') as f:
        for img in imglist:
            img = img.replace("\\", "/")
            f.write(img + '\n')

The contents of a file would look something like this:

data/pula/train/timed_000000259_jpg.rf.24168a1d896b1c57c717ee217e967582.jpg
data/pula/train/timed_000000262_jpg.rf.c98c33d813c019b4e4d64ebb076b94f1.jpg
data/pula/train/timed_000000263_jpg.rf.3c3b031aa46817f2248aad0f65041d1c.jpg
data/pula/train/timed_000000534_jpg.rf.4b77175b4b83fa2cff5b9714e6f70353.jpg
data/pula/train/timed_000000823_jpg.rf.aa55800b7d72d13e0e9e3ec1e64845e9.jpg
...

Then copy coco.names and coco.data under darknet\build\darknet\x64\data with new names. I used pula.names and pula.data, and their contents were as follows:

pula.data:

classes = 3
train = data/pula/train.txt
valid = data/pula/valid.txt
names = data/pula.names
backup = backup/

The .data file describes how many classes you have, where the training and validation sets can be found, what the class names are, and in which backup folder the intermediate training weights have to be stored.

pula.names:

bus
car
person

The .names file contains nothing but the class names, in the same order as they appear in the _darknet.labels files of the Roboflow export.

Preparing Darknet for training

After preparing the image dataset we have to add some configuration to Darknet to be able to train on our own data.

First, copy darknet\build\darknet\x64\cfg\yolov4-custom.cfg to a new name like yolov4-pula.cfg.
I changed only the following parameters, but you are free to change any of them, as long as you know what you are doing :)

# refer to https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
subdivisions=32
# this should match your training image size
width=416
height=416
# this should be 2000 * number of classes
max_batches = 6000
# these should be 80% and 90% of max_batches respectively
steps=4800,5400
# for all [yolo] layers set the number of classes
classes=3
# for all [convolutional] layers right before the [yolo] layers change the number of filters to (number of classes + 5) * 3
filters=24
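With my 3 classes this works out to max_batches = 2000 * 3 = 6000, steps = 4800,5400 (80% and 90% of 6000) and filters = (3 + 5) * 3 = 24, matching the values above.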

Then download the pre-trained YoloV4 weight file into the darknet\build\darknet\x64 folder from here.

Now you can actually start training Darknet with the following command:

darknet.exe detector train data/pula.data cfg/yolov4-pula.cfg yolov4.conv.137

The training took about 5 hours for me on my NVIDIA GeForce RTX 2080 GPU. During training you can watch a chart of how the loss decreases as the batches go by.

YoloV4 loss chart during training
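Darknet periodically saves weight snapshots like yolov4-pula_last.weights into the backup folder configured in the .data file, so if the training gets interrupted you should be able to resume from the latest snapshot instead of starting again from the pre-trained weights:

darknet.exe detector train data/pula.data cfg/yolov4-pula.cfg backup/yolov4-pula_last.weights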

Using the model in OpenCV

To use this newly trained model in OpenCV for object detection, you only have to replace the .names, .cfg and .weights files in the detection script found in my github repository with the new versions of these files.
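For reference, here is a minimal sketch of what such a detection script can look like with OpenCV’s DNN module; the file and image names and the thresholds are assumptions for the example:

import cv2

# assumed file names -- point these at your own exported files
with open("pula.names") as f:
    classes = [line.strip() for line in f]

net = cv2.dnn.readNetFromDarknet("yolov4-pula.cfg", "backup/yolov4-pula_final.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("sample.jpg")
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for class_id, score, (x, y, w, h) in zip(class_ids, scores, boxes):
    # draw the box and the class label with its confidence
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    label = "%s %.2f" % (classes[int(class_id)], score)
    cv2.putText(frame, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

cv2.imshow("detections", frame)
cv2.waitKey(0)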

Evaluating the results

There are two ways to evaluate the results:

  • evaluate by model statistics
  • evaluate by actually observing the model during detection

As for the stats, you can make Darknet prepare some statistics for you, using the following command:

darknet.exe detector map data/pula.data cfg/yolov4-pula.cfg backup/yolov4-pula_final.weights

This should produce statistics similar to these:

YoloV4 performance statistics

By analyzing these stats you can see that the model excels at detecting cars, with an average precision of 93.38%. For people it stands around 80%, while for buses it is around 71%. The bus result is not so great, but it is definitely an improvement compared to the default model.

Let’s see the model in action: I uploaded a 6-minute video of it to Youtube.

Generally, I am more satisfied with the results than I was with the default model. However, some issues still occur:

  • people are not detected sometimes, especially in the background
  • sometimes a section of the bus stop is detected as a person
  • when two people walk closely together they are detected as a single person

These issues may be fixed with some more training images.

Conclusion

Thanks for reading my article. As you have seen, if you want to create your own custom YoloV4 model with Darknet, you need to do some boring boilerplate tasks like building OpenCV and Darknet from source. When annotating the images, things start getting more exciting: you find interesting situations you may want to prepare your model for, to make it more robust.
And after you finish the training and see your model actually working, it is quite satisfying :)
So I encourage you to start your own project, and let me know in the comments how it went.


Software engineer. Machine Learning and Deep Learning enthusiast. My opinion here does not necessarily reflect my employer's.