YOLOv10 Custom Object Detection

May 26, 2024

Overview of YOLOv10 and Training a Model with Custom Data

Overview

YOLOv10, built on the Ultralytics Python package by researchers at Tsinghua University, takes a new approach to real-time object detection: it refines the model architecture and eliminates non-maximum suppression (NMS). These optimizations lead to state-of-the-art performance with lower computational demands. Extensive experiments show that YOLOv10 provides superior accuracy-latency trade-offs across various model scales.

Readers of my previous articles know that I have shared several projects built on YOLO models, because among pre-trained models they stand out for both performance and efficiency. However, real-time object detection has been held back by its reliance on non-maximum suppression (NMS) as a post-processing step and by architectural inefficiencies. YOLOv10 addresses both issues by eliminating NMS and adopting a design strategy that targets efficiency and accuracy together.

Architecture

Backbone: Responsible for feature extraction, the backbone in YOLOv10 uses an enhanced version of CSPNet (Cross Stage Partial Network) to improve gradient flow and reduce computational redundancy.
Neck: The neck aggregates features from different scales and passes them to the head. It includes PAN (Path Aggregation Network) layers for effective multiscale feature fusion.
One-to-Many Head: Generates multiple predictions per object during training to provide rich supervisory signals and improve learning accuracy.
One-to-One Head: Generates a single best prediction per object during inference to eliminate the need for NMS, thereby reducing latency and improving efficiency.
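
To make concrete what the one-to-one head removes, the sketch below implements classic greedy NMS in plain Python. It illustrates the post-processing step that conventional detectors rely on and is not code from YOLOv10; the function names `iou` and `nms`, the box format, and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thres: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep


# Two near-duplicate boxes around one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # indices of the boxes that survive
```

Every detection has to pass through this loop at inference time in an NMS-based detector, and the result depends on the chosen IoU threshold. A head that emits a single prediction per object makes the step unnecessary.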

Model Variants and Performance

YOLOv10 comes in six models:

YOLOv10-N: Nano version for extremely resource-constrained environments.
YOLOv10-S: Small version balancing speed and accuracy.
YOLOv10-M: Medium version for general-purpose use.
YOLOv10-B: Balanced version with increased width for higher accuracy.
YOLOv10-L: Large version for higher accuracy at the cost of increased computational resources.
YOLOv10-X: Extra-large version for maximum accuracy and performance.

Comparisons

Let’s look at the comparisons between different models regarding latency and accuracy, tested on standard benchmarks like COCO.

On these benchmarks, YOLOv10 offers higher accuracy and speed with fewer parameters, which makes it a strong choice for real-time object detection applications.

Training YOLOv10 for Custom Object Detection

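First, install the YOLOv10 package from the official GitHub repository, then download the pre-trained yolov10n weights. The weights are saved to /content/-q, the directory that the training command reads them from.

!pip install -q git+https://github.com/THU-MIG/yolov10.git

!wget -q -P /content/-q https://github.com/jameslahm/yolov10/releases/download/v1.0/yolov10n.pt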

You can experiment with any custom project on Roboflow Universe, create your own datasets, and even use RF100 datasets sponsored by Intel. For this article, I will use a pre-prepared dataset designed to detect dangerous items in X-ray images.

Download your dataset in YOLOv8 format using the Roboflow API.

!pip install -q roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="your-api-key")
project = rf.workspace("vladutc").project("x-ray-baggage")
version = project.version(3)
dataset = version.download("yolov8")

Specify the parameters and file paths, then start the model training.

!yolo task=detect mode=train epochs=25 batch=32 plots=True \
model='/content/-q/yolov10n.pt' \
data='/content/X-Ray-Baggage-3/data.yaml'
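
The same run can also be launched from Python instead of the CLI. The sketch below assumes that the `YOLOv10` class from the installed package accepts the same arguments in `train` as the command above; the `TRAIN_ARGS` dictionary and the `train_custom` helper are names introduced here for illustration only.

```python
# Training arguments mirroring the CLI call above.
TRAIN_ARGS = {
    "data": "/content/X-Ray-Baggage-3/data.yaml",
    "epochs": 25,
    "batch": 32,
    "plots": True,
}


def train_custom(weights: str = "/content/-q/yolov10n.pt", **overrides):
    """Fine-tune the pre-trained weights; keyword overrides replace TRAIN_ARGS entries."""
    # Imported lazily so the settings can be inspected without the package installed.
    from ultralytics import YOLOv10

    model = YOLOv10(weights)
    return model.train(**{**TRAIN_ARGS, **overrides})


# Example (run inside the Colab session): train_custom(epochs=50)
```

Keeping the arguments in one dictionary makes it easy to repeat the run with a different epoch count or batch size when the GPU runs out of memory.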

Example data.yaml file

names:
- Gun
- Knife
- Pliers
- Scissors
- Wrench

nc: 5

roboflow:
  license: CC BY 4.0
  project: x-ray-baggage
  url: https://universe.roboflow.com/vladutc/x-ray-baggage/dataset/3
  version: 3
  workspace: vladutc

test: /content/X-Ray-Baggage-3/test/images
train: /content/X-Ray-Baggage-3/train/images
val: /content/X-Ray-Baggage-3/valid/images
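
A mismatch between `nc` and the class list, or a split path that does not exist, is a common reason for training to fail right at startup. The following is a minimal sketch of a sanity check for a parsed data.yaml; `check_dataset_config` and `load_yaml` are hypothetical helpers written for this article, and `load_yaml` assumes PyYAML is available in the environment.

```python
from pathlib import Path


def check_dataset_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a parsed data.yaml."""
    problems = []
    names = cfg.get("names", [])
    if cfg.get("nc") != len(names):
        problems.append(f"nc={cfg.get('nc')} but {len(names)} class names are listed")
    for split in ("train", "val", "test"):
        path = cfg.get(split)
        if path is None:
            problems.append(f"missing '{split}' entry")
        elif not Path(path).is_dir():
            problems.append(f"{split} directory not found: {path}")
    return problems


def load_yaml(path: str) -> dict:
    """Parse a YAML file; PyYAML is imported lazily because it is third-party."""
    import yaml

    with open(path) as f:
        return yaml.safe_load(f)


# Example: check_dataset_config(load_yaml('/content/X-Ray-Baggage-3/data.yaml'))
```

An empty list means the file passed these checks. It does not verify that the label files themselves are valid.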

Let’s look at the results.

from IPython.display import Image

Image(filename='/content/runs/detect/train/results.png', width=1000)
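
The curves in results.png are drawn from a results.csv file that training writes to the same run directory. As a rough sketch, the helper below picks the epoch with the highest mAP50-95. The column name `metrics/mAP50-95(B)` follows the Ultralytics naming convention and is an assumption about this particular run, `best_epoch` is a hypothetical helper, and the sample values are invented purely to show the parsing.

```python
import csv
import io


def best_epoch(csv_text: str, metric: str = "metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Header names and values may be padded with spaces, so strip both.
    rows = [
        {k.strip(): v.strip() for k, v in row.items() if k and v is not None}
        for row in reader
    ]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])


# Invented sample in the same layout, for illustration only.
sample = """epoch, metrics/mAP50(B), metrics/mAP50-95(B)
1, 0.41, 0.22
2, 0.63, 0.39
3, 0.61, 0.37
"""
print(best_epoch(sample))
```

On a real run, pass the file contents instead of the sample, for example `best_epoch(open('/content/runs/detect/train/results.csv').read())`.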

Let’s run predictions on the test data and display the results in a 2x5 grid (two rows of five images).

from ultralytics import YOLOv10

model_path = '/content/runs/detect/train/weights/best.pt'
model = YOLOv10(model_path)
results = model(source='/content/X-Ray-Baggage-3/test/images', conf=0.25, save=True)

import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Collect the annotated images written by the prediction step (sorted for a stable order)
images = sorted(glob.glob('/content/runs/detect/predict/*.jpg'))

images_to_display = images[:10]

fig, axes = plt.subplots(2, 5, figsize=(20, 10))

for i, ax in enumerate(axes.flat):
    ax.axis('off')  # hide the axes on every cell, including empty ones
    if i < len(images_to_display):
        ax.imshow(mpimg.imread(images_to_display[i]))

plt.tight_layout()
plt.show()
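
Besides looking at the images, it can help to summarize what the model found. The sketch below tallies predicted classes across the prediction results, assuming each result object exposes `boxes.cls` (class indices) and `names` (index-to-name mapping) as Ultralytics result objects do; `count_detections` is a hypothetical helper name.

```python
from collections import Counter


def count_detections(results) -> Counter:
    """Tally predicted class names across a list of result objects."""
    counts = Counter()
    for r in results:
        for cls_id in r.boxes.cls:
            counts[r.names[int(cls_id)]] += 1
    return counts


# Example (after running the prediction step): print(count_detections(results))
```

A class that never appears in the tally, or one that dominates it, is a quick hint to revisit the confidence threshold or the class balance of the training set.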

Conclusion and Recommendations

While creating this article, I trained the YOLOv10n model on multiple datasets, which exhausted the free-tier limit for Colab's 15 GB T4 GPU. Once you exceed that limit while training, Colab restricts further access to the T4 GPU. One possible workaround is to sign in with a different Google account.
Because technology is advancing rapidly, I believe it is better to learn the main concepts than to get stuck on a single technology, in both computer vision and large language models. A good way to keep up is to learn from the developers of these technologies. The content from Ultralytics and Roboflow is very valuable in this area, and it is worthwhile to follow them.

References

Official Repo: https://github.com/THU-MIG/yolov10
Ultralytics
Roboflow

@article{THU-MIGyolov10,
  title={YOLOv10: Real-Time End-to-End Object Detection},
  author={Ao Wang and Hui Chen and Lihao Liu and others},
  journal={arXiv preprint arXiv:2405.14458},
  year={2024},
  institution={Tsinghua University},
  license = {AGPL-3.0}
}

@misc{
x-ray-baggage_dataset,
title = { X-Ray Baggage Dataset },
type = { Open Source Dataset },
author = { vladutc },
howpublished = { \url{ https://universe.roboflow.com/vladutc/x-ray-baggage } },
url = { https://universe.roboflow.com/vladutc/x-ray-baggage },
journal = { Roboflow Universe },
publisher = { Roboflow },
year = { 2024 },
month = { may },
note = { visited on 2024-05-26 },
}

I would like to thank the researchers from Tsinghua University, the teams at Ultralytics and Roboflow, as well as all the contributors to the open-source community.