My Experiments with YOLOv5: Almost Everything You Want to Know About YOLOv5 - Series - Part 1

Manjusha Sithik
5 min read · Dec 26, 2023


Image by DALL·E

Do you know how to change the architecture of YOLOv5 to suit your use case? What is bounding box refinement in YOLOv5? What are background images, and can they improve YOLOv5's detections? What are the default fitness values of the YOLOv5 metrics, and how do you change them to improve your results? How do you change the default weights and parameters for augmentations in YOLOv5? And finally, how do you improve the performance of YOLOv5 by identifying the issues in your dataset?

I intend to share my experiences and experiments with YOLOv5 in this series.

We will approach the problem in the following steps:

Problem description

Dataset description

EDA before building the model

Set up the baseline and evaluate it

Identify potential steps to improve the solution

Implement them one by one and evaluate against baseline

Implementation details of training YOLOv5 on a custom dataset are not covered in this article, as the main focus here is to introduce unexplored features and functionalities of YOLOv5. The code is available at https://github.com/manjushasithik/thesis

Problem

We are going to use the Pictor-v3 dataset to detect With_Helmet and Without_Helmet cases. We then suppress the With_Helmet cases so that we can alert on non-compliance with safety practices, as sketched below.
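To make the alerting idea concrete, here is a minimal sketch of how the suppression step could look at inference time. The weights file, image path, and class names are placeholders for illustration, not part of the original repository:

```python
import torch

# Load custom-trained YOLOv5 weights via the Ultralytics hub
# ("best.pt" is a placeholder for your own trained weights).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

# Run inference on a single image (path is illustrative).
results = model("site_photo.jpg")

# Detections as a DataFrame with xmin, ymin, xmax, ymax, confidence, class, name.
detections = results.pandas().xyxy[0]

# Keep only the Without_Helmet cases and raise an alert for them.
violations = detections[detections["name"] == "Without_Helmet"]
if len(violations):
    print(f"ALERT: {len(violations)} person(s) detected without a helmet")
```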

Dataset

I am using the popular public dataset Pictor-v3 to demonstrate the experiments. It includes 685 images in total and is divided into train, valid, and test sets in a 70:20:10 ratio: the train set contains 478 images, valid has 139 images, and test has 68 images. Image dimensions were above 1000px but were resized to 640x640 to keep the image dimensions uniform throughout the dataset.
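For reference, a 70:20:10 split like this can be produced with a few lines of Python. The folder layout below is purely illustrative and would need to match your own YOLO-style directory structure:

```python
import random
import shutil
from pathlib import Path

random.seed(0)
images = sorted(Path("pictor_v3/images").glob("*.jpg"))  # illustrative source folder
random.shuffle(images)

n = len(images)  # 685 for Pictor-v3
splits = {
    "train": images[: int(0.7 * n)],
    "valid": images[int(0.7 * n): int(0.9 * n)],
    "test": images[int(0.9 * n):],
}

# Copy each image into its split folder (labels would be copied the same way).
for split, files in splits.items():
    out_dir = Path("pictor_v3") / split / "images"
    out_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, out_dir / img.name)
```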

EDA

It is important to understand the dimensions of the object instances and how those dimensions are distributed. Therefore, K-Means clustering with 3 clusters was performed on the dataset, as shown in Fig1, to understand how the instances are distributed in terms of dimensions. The heights and widths of the instances fall into three clusters: the small cluster is centered at around 24px width and 33px height, the medium cluster at around 87px width and 104px height, and the big cluster at 215px width and 253px height.

Fig1: Cluster Analysis on the Dataset
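Below is a minimal sketch of this kind of cluster analysis, assuming YOLO-format label files (class x_center y_center width height, normalized to 0-1) and 640x640 images. The label path and the class-index mapping are assumptions for illustration:

```python
import numpy as np
from pathlib import Path
from sklearn.cluster import KMeans

IMG_SIZE = 640  # images were resized to 640x640
widths_heights, classes = [], []

# Each YOLO-format label line: "class x_center y_center width height" (normalized).
for label_file in Path("pictor_v3/train/labels").glob("*.txt"):  # illustrative path
    for line in label_file.read_text().splitlines():
        cls, _, _, w, h = map(float, line.split())
        widths_heights.append([w * IMG_SIZE, h * IMG_SIZE])  # pixel width/height
        classes.append(int(cls))

widths_heights = np.array(widths_heights)
classes = np.array(classes)

# Group the instances into small / medium / big clusters by width and height,
# then report the class counts inside each cluster.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(widths_heights)
for k in range(3):
    mask = kmeans.labels_ == k
    w_mean, h_mean = widths_heights[mask].mean(axis=0)
    counts = np.bincount(classes[mask], minlength=2)  # assumes 0=With_Helmet, 1=Without_Helmet
    print(f"cluster {k}: ~{w_mean:.0f}x{h_mean:.0f}px, "
          f"With_Helmet={counts[0]}, Without_Helmet={counts[1]}")
```

The same per-cluster class counts also give the class-ratio analysis discussed next.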

A further analysis was performed to understand the ratio of With_Helmet and Without_Helmet classes in these three clusters, as given in Fig2. EDA results on the train data were also in line with this, as shown in Fig3. The majority of the instances fall in the small cluster, and this cluster is the main contributor to the class imbalance in the dataset, while the other clusters are class-balanced to an extent.

Fig2: Class ratio & mean analysis on three clusters

The mean analysis makes it very clear that small objects of approximate dimension 24px x 33px dominate the dataset, and that the ratio of With_Helmet to Without_Helmet is around 4:1 in the small-dimension cluster.

Fig3: EDA on Training Dataset

It is worth noting that Fig3, which gives insights similar to Fig1 and Fig2, was generated by YOLOv5 itself and saved as labels.png during training. YOLO provides other insights we can use to rectify or improve model performance for the dataset at hand; we will revisit this topic towards the end of this series.

The major points we identified here are :

  1. The majority of the target objects are small
  2. There exists a class imbalance problem in this dataset.

Now that we have identified the actual issues, we can think about how to resolve them.

We can try common performance enhancement techniques like:

Changing the scaling-related architecture (the neck) to improve small-object detection.

Setting a weighted cross-entropy loss to emphasize the Without_Helmet class.

Adding Without_Helmet instances of small dimensions to address the class imbalance.

Adjusting the initial values of the K-Means clustering for bounding boxes so that it focuses on small objects.

Adjusting the fitness values of the model to give importance to the metrics that suit our use case.

Tuning the Albumentations hyperparameters.

Adding background images to reduce false positives.

Now it's time to set up the baseline model.

Baseline Model

The baseline model is implemented using YOLOv5s with the default settings. The neck uses FPN and PANet. The class weight for the cross-entropy loss was set to 1, indicating that equal importance was given to both classes. The Albumentations hyperparameter values for Blur, MedianBlur, and CLAHE were set to 0.01, which is the default. The default initial values for the K-Means clustering of the bounding boxes were also left intact. The default metric fitness values were kept as they are, i.e., 0.1 for mAP@0.5, 0.9 for mAP@0.5:0.95, and zero for precision and recall.
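For context, these fitness weights come from YOLOv5's fitness function (in utils/metrics.py), which scores each epoch as a weighted sum of precision, recall, mAP@0.5, and mAP@0.5:0.95. A simplified sketch of the default behaviour, with purely illustrative metric values:

```python
import numpy as np

def fitness(x):
    # x holds rows of [precision, recall, mAP@0.5, mAP@0.5:0.95];
    # the default weights ignore P and R and favour mAP@0.5:0.95.
    w = [0.0, 0.0, 0.1, 0.9]
    return (x[:, :4] * w).sum(1)

# Illustrative values only: P=0.7, R=0.5, mAP50=0.5, mAP50-95=0.3
print(fitness(np.array([[0.7, 0.5, 0.5, 0.3]])))  # 0.1*0.5 + 0.9*0.3 = 0.32
```

Changing these weights is one of the levers we will experiment with later in the series.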

This baseline model achieved an overall precision of 73%, recall of 51%, and mAP@0.5 of 53%. Precision, recall, and mAP@0.5 are 72%, 47%, and 51% for With_Helmet, and 75%, 54%, and 54% for Without_Helmet, respectively. These values are compared against the corresponding values from each experiment to interpret its performance. The confusion matrix shows that 58% of the With_Helmet and Without_Helmet instances were correctly identified.

Now we are ready to explore the enhancement techniques, one by one. I can't wait to see you in the upcoming parts.
Part 2: https://medium.com/@manjusha.bs/my-experiments-with-yolov5-almost-everything-you-want-to-know-about-yolov5-series-part-2-6549abdb5b63
