Complete Step-by-Step Guide to Build a Custom Object Detection Model with YOLOv5 — Part 2

Yinchuang Sum
Published in Classifai
May 31, 2021 · 6 min read

In the last part, we discussed how to prepare a custom dataset for an object detection model. In this part, we will focus on model training, evaluation, and inference.

Codebase

The scripts for this project are in the Github repository. Everything is fully reproducible, so make sure you follow along!

Clone the repository

git clone https://github.com/CertifaiAI/classifai-blogs.git

The codebase is in the folder 0_Complete_Guide_To_Custom_Object_Detection_Model_With_Yolov5.

Prerequisites for this section

Environment Setup

The environment setup required to run the scripts was discussed in the last part. Feel free to read it for a more detailed explanation, or go to the Github repository for a quick setup.

Dataset

You can get the dataset from this link. Dataset preparation was discussed in the last part as well; if you want the details of custom dataset preparation, feel free to read it.

Model Training

The training script leverages the official YOLOv5 Github repository: https://github.com/ultralytics/yolov5

It is built to train in Google Colab, so expect some changes to the script if you train locally. With the free GPU allocated by Google Colab, the model trains much faster.

Setting Up Google Colab

1. Zip the dataset folder and rename it to dataset.zip.

2. Upload the YOLOv5_PyTorch.ipynb Jupyter notebook to your personal Google Drive.

3. In Google Drive, double-click the YOLOv5_PyTorch.ipynb file to open a Google Colab session.

4. Go to Files and upload dataset.zip.

5. Make sure the runtime type is set to “GPU”. You may follow the guide provided.

6. Now we are ready for model training.

Training

All the training is done in the notebook.

1. Unzip dataset.zip.

2. Clone the YOLOv5 repository and install all the dependencies in Google Colab.

3. Inspect data.yaml to make sure the number of classes is correct. Verify the class names too, because they cannot be modified after the model is trained. (A sketch of these first notebook cells appears after step 6.)

4. Select a suitable YOLOv5 model. There are a few variants, as shown in the diagram below:

YOLOv5s is the fastest model, whereas YOLOv5x has the highest mean average precision.

In this example, YOLOv5s is chosen for its computational speed. For more details on these models, please check out the YOLOv5 models.

5. Define the training arguments:

  • img: input image size
  • batch: batch size
  • epochs: number of training epochs (typically we train for more than 100 epochs)
  • data: path to your data.yaml file
  • weights: path to the initial weights;
    options: yolov5s.pt, yolov5m.pt, yolov5l.pt and yolov5x.pt
  • name: name of the training result folder
  • cache: cache images for faster training

In this project, the training arguments will be:

--img 416 
--batch 16
--epochs 100
--data '../data.yaml'
--weights yolov5s.pt
--name yolov5s_results
--cache

6. Run the training script. Training will take some time; using the configuration below, it takes approximately 20 minutes.

!python train.py --img 416 --batch 16 --epochs 100 --data '../data.yaml' --weights yolov5s.pt --name yolov5s_results --cache
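
For reference, the first few notebook cells (steps 1 to 3) might look like the sketch below. The paths are assumptions: dataset.zip is in the Colab session root, and data.yaml sits one level above the cloned yolov5 folder.

# Step 1: unzip the uploaded dataset (assumed to be in the session root)
!unzip -q dataset.zip -d .

# Step 2: clone YOLOv5 and install its dependencies
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -qr requirements.txt

# Step 3: inspect data.yaml — check the number of classes and the class names
import yaml
with open('../data.yaml') as f:
    data = yaml.safe_load(f)
print(data['nc'], data['names'])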

Model Evaluation

There are two ways to visualize the evaluation metrics and training losses of the model training, namely:

1. Tensorboard (see the invocation after this list)

2. Manual plotting
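
For the TensorBoard option, a typical invocation in a Colab notebook looks like this (the log directory is an assumption based on YOLOv5's default output path):

# Load the TensorBoard notebook extension and point it at the training logs
%load_ext tensorboard
%tensorboard --logdir runs/train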

Evaluation

Model evaluation is an integral part of the model development process. It helps us find the model that best represents our data and estimate how well that model will perform on future data.

Loss Functions and Evaluation Metrics of Model Training

Loss Functions

  1. Box: measures how well the predicted bounding box overlaps the ground-truth bounding box, based on Intersection over Union (IoU). (A minimal IoU sketch follows this list.)
  2. Objectness: measures whether a predicted bounding box actually contains an object.
  3. Classification: measures whether the predicted class is right or wrong.
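
To make the box term concrete, here is a minimal, illustrative IoU computation for two axis-aligned boxes in (x1, y1, x2, y2) format; YOLOv5's actual box loss implementation lives in its repository and is more elaborate:

def iou(box_a, box_b):
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Intersection area is zero when the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143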

Evaluation Metrics

  1. Precision: out of all the bounding boxes predicted, how many were correct.
  2. Recall: out of all the ground-truth bounding boxes, how many were detected. (A quick numeric illustration follows this list.)
  3. mAP: mean Average Precision; how correct the bounding box predictions are on average, summarized over all recall levels and classes.
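
As a quick illustration with made-up detection counts:

# Hypothetical counts, for illustration only
tp, fp, fn = 90, 10, 20     # true positives, false positives, false negatives

precision = tp / (tp + fp)  # of all predicted boxes, the fraction that were correct
recall = tp / (tp + fn)     # of all ground-truth boxes, the fraction that were found
print(precision, recall)    # 0.9 0.818...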

From the evaluation metrics, we can see that all the losses converge over time for both training and validation. The model achieves about 0.9 mAP, 0.9 precision, and 0.8 recall.

Inference

In the machine learning context, inference is the process of using a trained model to make predictions on unseen data.
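
As a side note, a custom-trained weight file can also be loaded for inference directly in Python via PyTorch Hub. This is a minimal sketch, assuming a recent ultralytics/yolov5 release and a local best.pt produced by training:

import torch

# Load the custom-trained weights through PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

results = model('img_0007.png')  # run prediction on an unseen image
results.print()                  # print a summary of the detections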

Run Inference on Google Colab

After training, the model weights are saved automatically. Inference can be run directly in Google Colab using the test dataset.

1. Predict on the test dataset with the trained model.

2. Visualize the output images. (A sketch of both steps follows this list.)
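
A minimal sketch of these two steps, where the weights and output paths are assumptions based on YOLOv5's default output layout:

# Step 1: run detect.py with the trained weights over the test images
!python detect.py --weights runs/train/yolov5s_results/weights/best.pt --img 416 --conf 0.5 --source ../test/images

# Step 2: display a few of the annotated output images
import glob
from IPython.display import Image, display

for path in glob.glob('runs/detect/exp/*.jpg')[:3]:
    display(Image(filename=path))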

Run Inference on Local

1. Download the weights of the trained model. They will be saved to your browser’s default download location.

2. Go to the folder 0_Complete_Guide_To_Custom_Object_Detection_Model_With_Yolov5/ModelTraining and copy the downloaded weights into the ./src/weights folder.

3. Select the source of data.
Inference can be run on multiple sources of data, such as a webcam, image, video, folder, glob of images, URL, or streaming protocol. The source is configured with the --source argument shown in the next step.

python ./src/detect.py --source 0  # webcam
file.jpg # image
file.mp4 # video
path/ # directory
path/*.jpg # glob
'https://youtu.be/NUsoVlDFqZg' # YouTube video
'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream

4. In the ModelTraining folder, run inference with the script:

python ./src/detect.py --source <<source>> --weights <<weights name>> --conf <<threshold>>

For example:

python ./src/detect.py --source img_0007.png --weights ./src/weights/best.pt --conf 0.5

5. After running the inference script, the folder where the output data is saved will be stated in the terminal.
In the example below, the output data is saved to ./runs/detect/exp.

Sample Output

Conclusion

Although YOLOv5s has the lowest mAP on the COCO dataset, it performs well on our custom dataset.

Going through the whole workflow, you may realize that the bottleneck of a machine learning project is getting a good-quality dataset, because a poorly labelled dataset will result in a poorly performing model.

Therefore, the keys to circumventing this mess are:

  • Make use of web scraping to acquire a plentiful amount of data easily
  • Utilize comprehensive open-source data annotation tools such as ClassifAI to streamline the procedures of data labelling
  • Ensure the quality of data labels is consistent throughout the annotation process by implementing cross-reviews of labels

Training a model with bad data is like making a dish out of rotten ingredients. Be mindful of your data: it can be the friend who accelerates model convergence or the enemy who burns the bridge between you and a good model.
