Training YOLOv5 on a custom dataset with ease

Luiz doleron
6 min read · Jan 26, 2022


YOLOv5 is one of the highest-performing object detectors out there. It is fast, accurate, and remarkably easy to train.

In this story, we walk through training YOLOv5 models on custom datasets, using the Labeled Mask dataset as a case study.

Note that this article is a sequel to Detecting objects with YOLOv5, OpenCV, Python and C++. If you are new to machine learning, it is worth reading a gentle introduction to modeling concepts here.

YOLOv5 pre-built models

YOLOv5 ships with a set of models: YOLOv5n, YOLOv5s, YOLOv5m, and others. They are pre-trained on the MS COCO dataset.

Those YOLOv5 models can only classify objects into one of 80 classes (“person”, “car”, “bicycle”, “boat”, “bird”, etc.).

If these classes do not fit your application requirements, it is likely that you need to train YOLOv5 with a different set of classes and images. Luckily, training YOLOv5 with a custom dataset is surprisingly easy.

But what is a dataset?

In the context of object detection, a dataset is a set of images and their respective annotations. Consider the following example:

The image above and its annotation file on the right are part of tech zizou’s Labeled Mask dataset. This annotation file has 4 lines, each one referring to a specific face in the image. Let’s check the first line:

0 0.8024193548387096 0.5887096774193549 0.1596774193548387 0.2557603686635945

The first integer number (0) is the object class id. For this dataset, class id 0 refers to the class “using mask” and class id 1 to the class “without mask”. The four float numbers that follow are the xywh bounding box coordinates: the x and y of the box center, followed by the box width and height. As one can see, these coordinates are normalized to [0, 1).
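To make these numbers concrete, here is a minimal sketch of how the normalized values map back to pixel coordinates (the 720x480 image size and the function name are illustrative assumptions, not part of the dataset):

def xywh_to_xyxy(x, y, w, h, img_w, img_h):
    # (x, y) is the box center and (w, h) the box size, all normalized to [0, 1)
    x1 = (x - w / 2) * img_w
    y1 = (y - h / 2) * img_h
    x2 = (x + w / 2) * img_w
    y2 = (y + h / 2) * img_h
    return x1, y1, x2, y2

# the first annotation line above, applied to a hypothetical 720x480 image
print(xywh_to_xyxy(0.8024, 0.5887, 0.1597, 0.2558, 720, 480))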

We can combine an image and its annotations as follows:
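Since the overlay is nothing more than the boxes drawn over the picture, it can be reproduced with a short OpenCV sketch (the example.jpg/example.txt file names are hypothetical placeholders for any pair in the dataset):

import cv2

names = ["using mask", "without mask"]

# hypothetical pair; replace with any (image, annotation) pair from the dataset
image = cv2.imread("data/obj/example.jpg")
img_h, img_w = image.shape[:2]

with open("data/obj/example.txt") as f:
    for line in f:
        cls, x, y, w, h = line.split()
        x, y, w, h = map(float, (x, y, w, h))
        # normalized, center-based xywh -> pixel corner coordinates
        x1, y1 = int((x - w / 2) * img_w), int((y - h / 2) * img_h)
        x2, y2 = int((x + w / 2) * img_w), int((y + h / 2) * img_h)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(image, names[int(cls)], (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imwrite("overlay.jpg", image)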

Thus, a dataset, in this context, is basically a collection of (image, annotation file) pairs used to train and validate object detection models.

Training a model

We can train our model by following only 4 steps.

Step 1 — clone and install YOLOv5 and its dependencies

Create a workspace folder and clone YOLOv5 into it:

$ mkdir yolov5_ws 
$ cd yolov5_ws
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

P.S.: note that you may need to upgrade pip before the last command. pip can be upgraded using the following command: pip install --upgrade pip

After that, pip will install all the required dependencies. If you get an error like:

Could not find a version that satisfies the requirement torch (from versions: none)

It is likely that your Python version is not yet (or no longer) supported by PyTorch. In my case, I needed to downgrade from Python 3.10 to 3.9 in order to get PyTorch running on my machine.

Step 2 — Getting & preparing the data

Access https://www.kaggle.com/techzizou/labeled-mask-dataset-yolo-darknet and download the Labeled Mask dataset. Uncompress the archive.zip file into a data folder inside yolov5_ws. Checking the obj folder, we can confirm that the dataset is indeed a bunch of images and their respective annotation files:
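As a quick sanity check, we can count the files with a few lines of Python (a sketch, assuming the archive was extracted to data/obj and the snippet runs from the yolov5_ws folder):

import os

files = os.listdir("data/obj")
images = [f for f in files if f.endswith(".jpg")]
labels = [f for f in files if f.endswith(".txt")]
print(f"{len(images)} images, {len(labels)} annotation files")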

We need to split this data into two groups: training and validation. About 90% of the images must be copied to the folder yolov5_ws/data/images/training/. The remaining images (10% of the full data) must be saved in the folder yolov5_ws/data/images/validation/.

It is noteworthy that we must be careful to copy each paired annotation file to the respective folder yolov5_ws/data/labels/training/ or yolov5_ws/data/labels/validation/. To avoid any glitches when performing these copies, I recommend using the following Python script:
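(The sketch below is one way to write it: it assumes the archive was extracted to data/obj, uses an arbitrary random seed, and performs the 90/10 split described above.)

import os
import random
import shutil

SOURCE = "data/obj"
random.seed(42)  # arbitrary seed, so the split is reproducible

images = sorted(f for f in os.listdir(SOURCE) if f.endswith(".jpg"))
random.shuffle(images)

split = int(0.9 * len(images))  # 90% training, 10% validation
subsets = {"training": images[:split], "validation": images[split:]}

for subset, files in subsets.items():
    img_dir = os.path.join("data/images", subset)
    lbl_dir = os.path.join("data/labels", subset)
    os.makedirs(img_dir, exist_ok=True)
    os.makedirs(lbl_dir, exist_ok=True)
    for img in files:
        label = os.path.splitext(img)[0] + ".txt"
        shutil.copy(os.path.join(SOURCE, img), img_dir)
        # copy the paired annotation file to the matching labels folder
        shutil.copy(os.path.join(SOURCE, label), lbl_dir)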

Save this script with a name of your preference (here, split_data.py) and run it inside the yolov5_ws folder:

$ cd yolov5_ws 
$ python split_data.py

The YOLOv5 training process uses the training subset to actually learn how to detect objects. The validation subset is used to check the model’s performance during training.

Step 3 — preparing the training configuration file

We are almost there! The next step is creating a text file called dataset.yaml inside the folder yolov5_ws with the following content:

train: ../data/images/training/
val: ../data/images/validation/

# number of classes
nc: 2

# class names
names: ['using mask', 'without mask']

This is the final folder structure & files for the training:
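In text form, the layout should look roughly like this (a sketch based on the steps above):

yolov5_ws/
  data/
    images/
      training/      (90% of the images)
      validation/    (the remaining 10%)
    labels/
      training/      (the paired annotation files)
      validation/
    obj/             (the original uncompressed download)
  dataset.yaml
  split_data.py
  yolov5/            (the cloned repository)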

Step 4 — Running the train

Now that we are all set, it is time to actually run the training:

$ python train.py --img 640 --batch 16 --epochs 10 --data dataset.yaml --weights yolov5s.pt

Depending on your hardware, this training can take several hours or only a few minutes. During the training, the process outputs something like:

0/9        0G    0.1184    0.0347   0.03127        47       640:   4%|▎         | 3/85 [01:08<30:00, 21.95s/it]

Finally, at the end of the training, we have the following output:

10 epochs completed in 1.719 hours.
Optimizer stripped from runs/train/exp/weights/last.pt, 14.4MB
Optimizer stripped from runs/train/exp/weights/best.pt, 14.4MB

Validating runs/train/exp/weights/best.pt...
Fusing layers...
Model Summary: 213 layers, 7015519 parameters, 0 gradients, 15.8 GFLOPs
               Class  Images  Labels      P      R  mAP@.5  mAP@.5:.95: 100%|██████████| 5/5 [00:54<00:00, 10.90s/it]
                 all     151     283  0.973  0.847   0.95        0.606
          using mask     151     218  0.982  0.862   0.968       0.599
        without mask     151      65  0.964  0.831   0.932       0.613
Results saved to runs/train/exp

Now, confirm that you have a yolov5_ws/yolov5/runs/train/exp/weights/best.pt file.

You did it! best.pt is the model resulting from the training process. Now, the model is ready to make predictions!

Using the trained model

Now that we have our model trained with the Labeled Mask dataset, it is time to get some predictions. This can be easily done using an out-of-the-box YOLOv5 script specially designed for this purpose:

$ python detect.py --weights runs/train/exp/weights/best.pt --img 640 --conf 0.4 --source ../../downloads/mask-teens.jpg
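Alternatively, the trained weights can be loaded directly from Python through torch.hub (a sketch; the test image path is a hypothetical placeholder):

import torch

# load the custom weights through the ultralytics/yolov5 hub entry point
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='runs/train/exp/weights/best.pt')

results = model('mask-teens.jpg')  # hypothetical test image
results.print()  # summary of the detections
results.save()   # writes the annotated image under runs/detect/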

Before using best.pt in production, it is necessary to evaluate its performance. Model evaluation and selection is one of the most important topics in the design of machine learning models. However, covering this subject in proper detail is beyond the scope of this story.

In the meantime, we can check the output runs/train/exp/results.png, which shows the model’s performance indicators during the training:

In particular, we should pay attention to the val/ indicators, which show the model’s performance on the validation set. This allows us to identify modeling problems such as overfitting and underfitting.

Bonus: converting the model to a different format

This command converts best.pt to the TorchScript and ONNX formats, generating two new files: best.torchscript and best.onnx:

$ python3 export.py --weights runs/train/exp/weights/best.pt --img 640 --include torchscript onnx

You can check out the details in the GitHub repository description.

Now you can use the converted files with libraries that do not natively recognize the PyTorch model format.
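For instance, here is a minimal sketch of running best.onnx with onnxruntime (assuming pip install onnxruntime numpy; real preprocessing such as letterboxing, as well as the non-max suppression post-processing, is omitted for brevity):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("best.onnx")
input_name = session.get_inputs()[0].name

# dummy 1x3x640x640 input; a real pipeline would letterbox and normalize the image
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # raw predictions, before non-max suppression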

Conclusion

A few years ago, training computer vision models was an extremely daunting task, carried out only by specialists able to dig deep into the complex dynamics of weight updates. Nowadays, engines like YOLOv5 break this barrier, providing a set of tools that is high-performing but also easy to use.

Google Colab notebook

Check here
