YOLOv10: A Step-by-Step Guide to Object Detection on a Custom Dataset
Overview
Computer vision is a fascinating field that involves teaching machines to understand and interpret the visual world. This includes tasks like detection, classification, and segmentation of objects in images and videos. Object detection, in particular, is a crucial task where the goal is to identify and locate objects within an image or video.
There are several frameworks used for object detection, including YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and Faster R-CNN. Among these, YOLO stands out for its balance of speed and accuracy. The evolution of YOLO from its first version (YOLOv1) to the latest (YOLOv10) has brought significant improvements; YOLOv10 in particular removes the need for non-maximum suppression (NMS) at inference time, which further improves latency while maintaining accuracy.
In this guide, we will walk you through the process of training a YOLOv10 model on a custom dataset. If you encounter any difficulties or feel lost at any step, don’t worry. I’ve created a streamlined pipeline to simplify the entire process. You can easily follow along and proceed directly to the model training step with confidence.
For more detailed information, you can refer to the original research papers and the main YOLOv10 repository:
- YOLOv10 paper: [arXiv](https://arxiv.org/abs/2405.14458)
- Main YOLOv10 repository: [GitHub](https://github.com/THU-MIG/yolov10)
Step-by-Step Guide
Step 1: Data Collection
First, you need to gather a dataset that includes the images and the corresponding annotations for the objects you want to detect. This is one of the most important steps in training your YOLOv10 model. Although it might seem tedious and time-consuming, having a high-quality dataset is essential for achieving good performance.
Here are some commonly used tools for creating annotations:
- LabelImg: An easy-to-use graphical image annotation tool that generates XML files in PASCAL VOC format.
- Labelme: Another popular annotation tool that outputs JSON files compatible with various frameworks.
If creating your own dataset sounds too daunting, there are several open-source datasets available that come with pre-made annotations. Here are a few sources you can explore:
- Kaggle: Kaggle hosts a variety of datasets across different domains, including image datasets with annotations. You can find datasets for specific tasks like object detection, image classification, and more.
- Roboflow: Roboflow provides a wide range of annotated datasets specifically for computer vision tasks. They also offer tools to help preprocess and augment your data.
- COCO (Common Objects in Context): The COCO dataset is a large-scale object detection, segmentation, and captioning dataset. It’s widely used in computer vision research and comes with detailed annotations.
- Pascal VOC: The Pascal VOC dataset is another well-known dataset for object detection and segmentation. It includes annotations in the PASCAL VOC format, which is compatible with many annotation tools.
By using these tools and resources, you can efficiently gather and annotate a robust dataset that will help your YOLOv10 model perform well in detecting objects.
Step 2: Preprocessing
Before training, it’s essential to preprocess your data. This process involves resizing images, normalizing pixel values, and converting annotations into the format required by YOLOv10. Proper preprocessing is crucial as it ensures the model can learn effectively from the data. Additionally, researching more about data augmentation techniques and potential issues with image preprocessing can be beneficial.
While the YOLO framework includes many preprocessing steps during training, these can be handled automatically, especially if you are new to this process. However, understanding and manually performing these steps can give you better control over the quality and variability of your dataset, ultimately leading to a more robust model.
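As an illustration of the kind of manual preprocessing described above, here is a minimal sketch using OpenCV (already listed in the requirements further below). It simply resizes an image to the training resolution and scales pixel values to [0, 1]; the path and target size are placeholders, and note that YOLO training handles this internally, so this is mainly useful for inspecting your data or building a custom pipeline:
import cv2
import numpy as np

def preprocess_image(image_path, target_size=(640, 640)):
    # Load the image (OpenCV reads in BGR channel order)
    image = cv2.imread(image_path)
    # Resize to the resolution you plan to train at
    resized = cv2.resize(image, target_size)
    # Normalize pixel values from [0, 255] to [0, 1]
    normalized = resized.astype(np.float32) / 255.0
    return normalized

# Example usage with a placeholder path
processed = preprocess_image("path/to/image.jpg")
print(processed.shape, processed.min(), processed.max())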
Step 3: Converting Annotations to YOLO Format
Converting your annotations to the YOLO format is a crucial step before training YOLOv10 on your custom dataset. YOLO requires each object to be represented by a single line in a text file that shares its name with the corresponding image. You also need a .yaml file that defines the dataset paths and maps each class index to its class name.
The YOLO annotation format is:
<object-class> <x_center> <y_center> <width> <height>
- object-class: Integer representing the object class (starting from 0).
- x_center: Normalized x coordinate of the center of the bounding box (value between 0 and 1).
- y_center: Normalized y coordinate of the center of the bounding box (value between 0 and 1).
- width: Normalized width of the bounding box (value between 0 and 1).
- height: Normalized height of the bounding box (value between 0 and 1).
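For example, consider a 640x480 image with a class-0 object whose bounding box runs from (xmin, ymin) = (100, 50) to (xmax, ymax) = (300, 250). Then x_center = (100 + 300) / 2 / 640 = 0.3125, y_center = (50 + 250) / 2 / 480 = 0.3125, width = (300 - 100) / 640 = 0.3125, and height = (250 - 50) / 480 ≈ 0.4167, so the annotation line for that object would be:
0 0.3125 0.3125 0.3125 0.4167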
Here is a step-by-step guide to convert annotations to YOLO format:
- Read Annotations: Parse your existing annotation files.
- Normalize Coordinates: Convert the bounding box coordinates to the YOLO format.
- Save Annotations: Write the converted annotations to new .txt files, one per image.
Example Code for Conversion:
import json
import os

def convert_to_yolo(size, box):
    # size: (image_width, image_height); box: [xmin, ymin, xmax, ymax] in pixels
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    xmin, ymin, xmax, ymax = box
    x = (xmin + xmax) / 2.0 * dw   # normalized x_center
    y = (ymin + ymax) / 2.0 * dh   # normalized y_center
    w = (xmax - xmin) * dw         # normalized width
    h = (ymax - ymin) * dh         # normalized height
    return (x, y, w, h)

def convert_annotations(json_file, output_dir):
    # Parse a Labelme-style JSON file and write one YOLO .txt file with one line per object
    with open(json_file) as f:
        data = json.load(f)
    image_width = data['imageWidth']
    image_height = data['imageHeight']
    yolo_annotations = []
    for obj in data['shapes']:
        class_id = obj['label']  # Assuming the label is already an integer class ID
        (x1, y1), (x2, y2) = obj['points'][:2]
        box = [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]  # [xmin, ymin, xmax, ymax]
        yolo_box = convert_to_yolo((image_width, image_height), box)
        yolo_annotations.append(f"{class_id} {' '.join(map(str, yolo_box))}\n")
    output_file = os.path.join(output_dir, os.path.basename(json_file).replace('.json', '.txt'))
    with open(output_file, 'w') as out_f:
        out_f.writelines(yolo_annotations)

# Example usage
json_annotations_dir = 'path/to/json/annotations'
output_dir = 'path/to/yolo/annotations'
os.makedirs(output_dir, exist_ok=True)
for json_file in os.listdir(json_annotations_dir):
    if json_file.endswith('.json'):
        convert_annotations(os.path.join(json_annotations_dir, json_file), output_dir)
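Note that the snippet above assumes the Labelme label field already stores an integer class ID. In practice, Labelme labels are usually class-name strings, so you may need a small mapping whose order matches the names list in your .yaml file. A hypothetical sketch (the class names here are placeholders):
# Hypothetical mapping from Labelme label strings to YOLO class indices.
# The order must match the 'names' list in your dataset .yaml file.
class_map = {'class1': 0, 'class2': 1, 'class3': 2}

# Inside the loop over objects, the class_id line would then become:
# class_id = class_map[obj['label']]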
Step 4: Model Training
This is the core part of the process. It may look involved at first, but the setup below is designed to keep it straightforward. If you have completed all the previous steps, you are ready to start training. Just follow the steps below:
1. Ensure that you have cloned the YOLOv10 repository to your local machine, or install it using the ultralytics package as shown below:
pip3 install ultralytics
It is recommended to create a virtual environment in Python and install YOLOv10 within it. This helps manage dependencies and avoid conflicts with other projects.
python -m venv yolov10-env
source yolov10-env/bin/activate # On Windows, use yolov10-env\Scripts\activate
pip install ultralytics
# To install requirements
pip install -r requirements.txt
## requirements.txt should include the following packages
torch==2.0.1
torchvision==0.15.2
onnx==1.14.0
onnxruntime==1.15.1
pycocotools==2.0.7
PyYAML==6.0.1
scipy==1.13.0
onnxsim==0.4.36
onnxruntime-gpu==1.18.0
gradio==4.31.5
opencv-python==4.9.0.80
psutil==5.9.8
py-cpuinfo==9.0.0
huggingface-hub==0.23.2
safetensors==0.4.3
2. Keep your converted annotation files, corresponding images, and the .yaml file in a designated folder. Ensure that the .yaml file correctly defines the dataset paths and classes. This configuration file is crucial for guiding the model during the training process.
Example structure of the .yaml file:
train: /path/to/train/images
val: /path/to/val/images
nc: number_of_classes
names: ['class1', 'class2', 'class3', ...]
3. After completing all the setup steps, you can start the training process as follows:
To run training from the cloned repository folder, use the command:
python train.py --data /path/to/your/data.yaml --cfg /path/to/yolov10.yaml --weights '' --name yolov10_custom
To run training from the ultralytics Python API:
from ultralytics import YOLOv10

# Train from scratch
model = YOLOv10()

# If you want to fine-tune the model with pretrained weights, you could load
# the pretrained weights like below:
# model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
# or
# wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
# model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

# Choose which model you want to use. Remember: the larger the model,
# the higher the computational cost.
model = YOLOv10('yolov10n.pt')

# Point data to your own .yaml file (e.g. 'coco.yaml' trains on COCO)
model.train(data='path/to/your/data.yaml', epochs=500, batch=4, imgsz=640)
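Once training finishes, the best weights are saved by the trainer (typically under a runs/.../weights directory). As a rough sketch, assuming the standard ultralytics API and placeholder paths, you can evaluate the trained model and optionally export it, for example to ONNX:
from ultralytics import YOLOv10

# Path is an assumption: ultralytics usually saves the best checkpoint
# under runs/detect/<run-name>/weights/
model = YOLOv10('runs/detect/train/weights/best.pt')

# Evaluate on the validation split defined in your data.yaml
metrics = model.val(data='path/to/your/data.yaml')
print(metrics)

# Optionally export to ONNX for deployment
model.export(format='onnx')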
Simplified Training Setup for Custom Dataset
I’ve streamlined the process to make training on a custom dataset as easy as possible. Just follow these steps:
1. Create a Folder:
- Create a folder named YoloV10-train.
2. Install Required Libraries:
- Ensure you have installed the necessary libraries and the ultralytics API.
pip install ultralytics
3. Organize Your Data:
- Inside YoloV10-train, create a folder named raw-data.
- Within raw-data, create subfolders named train, test, and valid, each containing further subfolders images and xmls.
4. Download Scripts:
- Download main.py, train.py, and the src folder from this repository.
5. Download Pre-trained Model:
- Download the required pre-trained model from this link and place it in the YoloV10-train folder.
Your folder structure should look like this:
YoloV10-train/
├── raw-data/
│ ├── train/
│ │ ├── images/
│ │ └── xmls/
│ ├── test/
│ │ ├── images/
│ │ └── xmls/
│ └── valid/
│ ├── images/
│ └── xmls/
├── main.py
├── train.py
├── src/
└── pre-trained-model.pt
6. Now just run the following in your terminal from the YoloV10-train folder:
YoloV10-train:~$ python3 main.py
Step 5: Model Inference
To run predictions on new images, use the following script:
from ultralytics import YOLO

# Load a trained YOLOv10n model (replace with the path to your trained weights)
model = YOLO("yolov10n.pt")

# Perform object detection on the image you want to predict on
results = model("image.jpg")

# Display the results
results[0].show()
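If you need the detections programmatically rather than as a displayed image, the results object exposes the predicted boxes. A minimal sketch, assuming the standard ultralytics results API and a placeholder output filename:
# Iterate over the detections in the first result
for box in results[0].boxes:
    class_id = int(box.cls[0])             # predicted class index
    confidence = float(box.conf[0])        # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box in pixel coordinates
    print(class_id, confidence, (x1, y1, x2, y2))

# Save the annotated image to disk
results[0].save(filename="prediction.jpg")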
References: