Computer Vision Blogs — Object Detection with YOLOv8: Train your own custom object detection model

Mohit Gupta
Aug 11, 2023


If you’ve found your way here, chances are you are struggling with the intricacies of object detection. I promise you that by the end of this blog, you will be absolutely ready to train a YOLOv8 model on your custom dataset. So, let’s get started.

Where should I get Yolo from?

You can get YOLO from Ultralytics, which makes our lives easier by offering a straightforward way to use this neural network. Execute this line in your (Anaconda) command prompt:

pip install ultralytics

Now you have YOLO on your system. We will see in some time how to access it, but before we proceed, let’s prepare the data that will be fed to the YOLO model.
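
If you want a quick sanity check that the install worked, the package ships a checks() helper:

# optional: confirm the package version, Python/torch info, and CUDA availability
import ultralytics
ultralytics.checks()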

How to prepare the data?

I assume that you have your own image dataset for training an object detection model. In my case, I am using a dataset of car dashboard photos, as depicted in Figure 1. As a crucial preprocessing step, ensure that all your images have the same shape (this is very important).

Figure1. Images of Car dashboards
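
A quick way to check that all your images share one shape (the folder name labelled_data below is a placeholder for wherever your images live):

# collect the distinct image shapes in the (hypothetical) labelled_data folder
from pathlib import Path
from PIL import Image

shapes = {Image.open(p).size for p in Path('labelled_data').glob('*.jpg')}
print(shapes)   # ideally a single (width, height) entry, e.g. {(1280, 720)}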

In this case, my focus is on detecting the odometer, which shows the total distance traveled by the car. This problem is highly relevant for insurance companies that want to speed up car inspections. Automating the odometer reading through image analysis, as opposed to manual entry, can substantially reduce the time this takes. I suggest you, too, think of a business problem for your object detection dataset.

To tell the computer, i.e. the YOLO model, to find the odometers for us, we have to label the images. In the context of an object detection dataset, labeling means drawing rectangular bounding boxes around the objects we want our model to detect, as shown below:

Figure2. Drawing Bounding Box around the object we want to detect

This process of drawing a bounding box is known as labeling or data annotation. You can use the tool LabelImg to label your dataset: from the releases link (reference 1), download the tool binaries, extract the contents of the zip file, find labelImg.exe, and launch it to start annotating. After completing the labeling, you will have both your annotated images and corresponding .txt files (annotations). Each .txt file has the same name as its image file (LabelImg handles this automatically). For 1 bounding box, there will be 5 values in the .txt file, like this:

0 0.533399 0.638112 0.060904 0.045455
Figure3. A Visual Guide to Bounding Box Parameters (Sample Image)

These values represent the following. (See Figure 3 for reference.)

The first number, 0, represents the class: odometer in my case. (Suppose I had another class, say speedometer; its boxes would appear as additional rows starting with 1.)

The second number, 0.533399, is the center x-coordinate of the bounding box as a fraction of the image width (approximately 682/1280).

The third number, 0.638112, is the center y-coordinate of the bounding box as a fraction of the image height (approximately 459/720).

The fourth number, 0.060904, is the width of the bounding box as a fraction of the image width (approximately 82/1280).

The fifth number, 0.045455, is the height of the bounding box as a fraction of the image height (approximately 32/720).
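
To make the mapping concrete, here is a minimal sketch of how such a line is computed (the image size and pixel coordinates are assumptions based on Figure 3):

# pixel-space box -> YOLO's normalized annotation line
img_w, img_h = 1280, 720   # assumed image width and height in pixels
cx, cy = 682, 459          # assumed box center in pixels
bw, bh = 82, 32            # assumed box width and height in pixels

# class id first, then everything as a fraction of the image dimensions
line = f"0 {cx / img_w:.6f} {cy / img_h:.6f} {bw / img_w:.6f} {bh / img_h:.6f}"
print(line)   # close to the annotation line above (the pixel values are rounded)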

Looking directly at these numbers may not make much sense to you, but they hold significance for the computer; after all, we communicate with computers via numbers. So, when we give this image and its accompanying annotation file to YOLO, the model’s attention is drawn to the odometer. After you have labeled the entire dataset, your directory should look like Figure 4, where every image has a corresponding annotation file.

Figure4. Image files and corresponding annotation files

Now we have a) YOLO on our PC and b) a labeled dataset to train it. Do you think we can now start training?

Answer: No. If we want to make an object detection model, we want to make sure it performs well. And how do we ensure that? By testing it, first internally and then externally. So, for our internal testing, we will split our dataset into 2 parts: the 1st part to train on and the 2nd part to test on (the latter is called the validation set, which helps track performance).

We usually keep 20–30% of the data for the validation set. IMPORTANT: while splitting the dataset into train and validation sets, maintain the directory structure depicted in Figure 5: first create 2 folders, namely images and labels, in the main directory, then create 2 subfolders, namely train and val, inside each of them. Put your training and validation images in the correspondingly named folders under images, and do the same with the annotation files under labels. Why did we do this? Let’s just say that YOLO likes data in this format :) (Actually, it is easier to adapt your data to the algorithm than the other way around.) You can write a custom script to do all of this (a sketch follows after Figure 5) or use split-dataset. (Again!! Make sure all images are of the same shape.)

Figure5. Folder structure for the data
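
Here is a minimal sketch of such a split script. The folder names labelled_data and car_dataset are assumptions; adjust the paths, image extension, and validation fraction to your setup:

# a minimal train/val split script; folder names are placeholders
import random
import shutil
from pathlib import Path

src = Path('labelled_data')   # flat folder holding images + matching .txt files
dst = Path('car_dataset')     # will receive the Figure 5 structure
val_fraction = 0.2            # keep ~20% of the data for validation

images = sorted(src.glob('*.jpg'))
random.seed(42)               # fixed seed so the split is reproducible
random.shuffle(images)
n_val = int(len(images) * val_fraction)

for i, img in enumerate(images):
    split = 'val' if i < n_val else 'train'
    (dst / 'images' / split).mkdir(parents=True, exist_ok=True)
    (dst / 'labels' / split).mkdir(parents=True, exist_ok=True)
    shutil.copy(img, dst / 'images' / split / img.name)
    label = img.with_suffix('.txt')   # assumes every image has a label file
    shutil.copy(label, dst / 'labels' / split / label.name)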

Now we have a) YOLO on our PC, b) a labeled dataset, and c) a validation set to monitor performance. Do you think we can now start training?

Answer: No. We are almost done; just one last thing remains: creating a .yaml file for the dataset. Why? YOLO itself does not know where our train images and our validation images are. Therefore, we convey this vital information to YOLO by means of the .yaml file. The content of this file gives the locations of our train and val folders along with class information, and it looks something like this:

train: C:/Users/car_dataset/images/train
val: C:/Users/car_dataset/images/val
nc: 1
names: ['odometer']

You can see here that we have only given paths to the train and val images. The YOLO code is designed to locate the labels that correspond to these images. Hence, it is important that when you create the train and val folders (as shown in Figure 5), their names are identical within both the images and labels folders. You can either download this .yaml file from my GitHub and open it in an editor like Notepad++ to modify it for your dataset, or you can craft your own using this code snippet.

import yaml  # PyYAML, used to write the .yaml file

data = {
    'train': 'C:/Users/car_dataset/images/train',
    'val': 'C:/Users/car_dataset/images/val',
    'nc': 1,
    'names': ['odometer'],
}

# Write the odometer_dataset.yaml file
with open('odometer_dataset.yaml', 'w') as file:
    yaml.dump(data, file)

nc: the number of classes. If you have 2, make it 2.

names: the class names. If you label odometer as 0 (i.e. the 1st class) and speedometer as 1 (i.e. the 2nd class), then write: ['odometer', 'speedometer']
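
For instance, a hypothetical two-class version of the file would look like this:

train: C:/Users/car_dataset/images/train
val: C:/Users/car_dataset/images/val
nc: 2
names: ['odometer', 'speedometer']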

Now let’s open Jupyter Notebook to train our model!!

# 1. Import necessary libraries
from ultralytics import YOLO  # the YOLO model class
import yaml                   # for reading/writing .yaml files
from PIL import Image         # used later to open images for prediction
# torch, os, cv2, and time are not needed for the steps below, but are
# common companions in computer vision scripts; feel free to drop them
import torch
import os
import cv2
import time

# 2. Choose our yaml file
yaml_filename = 'odometer_dataset.yaml'

# 3. Create the YOLO model. These are three alternatives; you only need ONE:
model = YOLO('yolov8n.yaml')                     # build a new model from the config file (train from scratch)
model = YOLO('yolov8n.pt')                       # load a pretrained model
model = YOLO('yolov8n.yaml').load('yolov8n.pt')  # build from YAML, then transfer the pretrained weights

# 4. Train the model
model.train(data=yaml_filename, epochs=30, patience=5, batch=16, imgsz=640)

That’s it. Your model will start training. Don’t worry about ‘yolov8n.yaml’ and ‘yolov8n.pt’ as these files will get downloaded automatically once you run the above code.

Another important argument is imgsz, which denotes the training image size: the larger of the image height and width. YOLO can work on rectangular images. If your images are 460x460, you can write 460. But if your images are 1920x1088, you don’t have to resize them to 460x460; instead, you can write 1920 as imgsz. Keep in mind that larger images mean smaller batch sizes and longer training times, but potentially higher accuracy.
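
As an illustration (the numbers here are hypothetical), a run on 1920x1088 images might trade batch size for resolution like this, continuing with the model and yaml_filename from above:

# hypothetical settings for 1920x1088 images: larger imgsz, smaller batch
model.train(data=yaml_filename, epochs=30, patience=5, batch=4, imgsz=1920)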

If you’re curious about the ‘n’ in ‘yolov8n’, it means ‘nano’, the smallest variant of YOLOv8 with the fewest parameters. The YOLO architecture is available in various sizes: s (small), m (medium), l (large), and x (extra large). Some of them are shown below in Figure 6. Depending on factors such as computational load, speed, and accuracy, you can select the model that best fits your requirements.

Figure6. Variants of Yolo
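
Switching variants is just a matter of pointing to a different weights file, for example:

# the 'small' variant; yolov8m.pt, yolov8l.pt, and yolov8x.pt work the same way
model = YOLO('yolov8s.pt')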

Once the model is trained, you can load the newly trained weights, which by default are stored in the ‘runs’ folder. This folder is created by YOLO automatically; for detection runs, the weights typically end up under a path like ‘runs/detect/train/weights’ (the exact path is printed at the end of training). The best weights are saved as ‘best.pt’. To use this model on new images, load these weights and pass the new images through the model to get predictions.

# 5. Load the trained weights (adjust the path to what training printed)
model = YOLO('runs/detect/train/weights/best.pt')

# 6. Run prediction on 1 image at a time
im = Image.open('new_car_images/car99.jpg')
results = model.predict(source=im, save=True)

# 7. Run prediction on all images in a folder
im_dir = 'new_car_images/'
results = model.predict(source=im_dir, save=True)

Choose #6 if you want to run on a single image, or #7 if you want to feed all images in a folder in one go. save=True saves the annotated prediction images in the runs folder.
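
If you want the raw numbers rather than saved images, here is a minimal sketch of pulling boxes out of the results from step #7 (see the Ultralytics predict docs, reference 3):

# 8. Extract class ids, confidences, and box corners from the results
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls[0])               # class index; 0 = 'odometer' here
        conf = float(box.conf[0])              # confidence score
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixels
        print(cls_id, conf, (x1, y1, x2, y2))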

I hope this blog is comprehensive enough to enable you to train your own model. You can customize things further by changing hyperparameters, or even play around with different variants of YOLO. For that, I suggest you visit the Ultralytics documentation.

Note: This blog is not about introducing the YOLO architecture but about quickly adapting it to our dataset. If you have read it, I am sure you will be able to train your model. In case anyone needs them, I have also made all the essential files available on my GitHub here.

Just like most business use cases of computer vision, object detection solves part of a much bigger problem. I started with the business goal of automated reading of odometers from images. So far, we have detected the odometer; to read the digits on the odometer, we will be fine-tuning an Encoder-Decoder Transformer. We will do this in my next blog. Meanwhile, can you guess why I have chosen to use a Transformer? Why not use standard OCR libraries like Tesseract?

Stay tuned for my next blog. I will answer these questions and achieve my business goal for this exercise. Have fun!

References:

  1. LabelImg: https://github.com/HumanSignal/labelImg/releases
  2. https://docs.ultralytics.com/usage/python/
  3. https://docs.ultralytics.com/modes/predict/
  4. https://docs.ultralytics.com/modes/train/
