YOLOv9 for Object Detection

Rabin Sikder
Data Reply IT | DataTech

What is YOLO?

YOLO, an acronym for “You Only Look Once,” is a deep learning model used for real-time object detection. Introduced by Joseph Redmon in 2016, YOLO revolutionized the approach to object detection with its innovative architecture that allows for high performance in terms of both speed and accuracy.

Unlike traditional approaches that divide the object detection process into several phases (such as region proposal, classification, and localization), YOLO adopts a unified method. The model considers the entire image in a single pass through the neural network, dividing the image into a grid and simultaneously predicting bounding boxes and classes for each grid cell.
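To make the unified formulation concrete: in the original YOLOv1, the network emits a single output tensor of shape

S \times S \times (B \cdot 5 + C)

where the image is divided into an S × S grid and each cell predicts B bounding boxes (four coordinates plus an objectness score each) along with C class probabilities. With S = 7, B = 2, and C = 20 classes (Pascal VOC), this gives the well-known 7 × 7 × 30 output tensor.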

Over the years, YOLO has continued to evolve and improve, with each new version introducing significant advancements in performance and accuracy. From the first version, YOLOv1, the family has progressed through YOLOv8 and YOLOv9 up to the recent YOLOv10.

What’s New in YOLOv9?

YOLOv9 represents a significant advancement in real-time object detection, introducing innovative techniques such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). The model demonstrates remarkable improvements in efficiency, accuracy, and adaptability, establishing new benchmarks on the MS COCO dataset. These techniques not only enhance the model’s learning capability but also ensure that crucial information is preserved throughout the detection process, allowing the model to achieve high accuracy and performance. Although developed by a separate open-source team, YOLOv9 builds upon the robust codebase provided by Ultralytics YOLOv5.

Information Bottleneck Principle

The principle of the information bottleneck reveals a fundamental challenge in deep learning: as data passes through successive layers of a network, there is an increased risk of information loss.
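In the notation of the YOLOv9 paper, with I denoting mutual information and f, g successive network transformations (with parameters θ and φ), the bottleneck can be written as

I(X, X) \ge I(X, f_\theta(X)) \ge I(X, g_\phi(f_\theta(X)))

i.e., each additional transformation can only preserve or shrink the information about the original input X that survives in the features.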

YOLOv9 addresses this challenge by implementing Programmable Gradient Information (PGI), which helps preserve essential data throughout the network depth, ensuring more reliable gradient generation and consequently improving model convergence and performance.

Addressing information loss is particularly important for lightweight models, which are often underparameterized and prone to losing significant information during the feedforward process. YOLOv9’s architecture, leveraging PGI and reversible functions, ensures that even with a simplified model, essential information required for accurate object detection is preserved and effectively utilized.
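To make the idea concrete, below is a minimal, illustrative PyTorch sketch of PGI-style auxiliary supervision. It is not the official YOLOv9 implementation, and all module names are invented for the example. The key point is that an auxiliary branch taps shallower features and produces its own supervised predictions during training, giving early layers a direct gradient path; at inference time the branch is dropped entirely, so it adds no cost.

import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Toy backbone that exposes a shallow and a deep feature map."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.SiLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, 2, 1), nn.SiLU())

    def forward(self, x):
        shallow = self.stage1(x)
        deep = self.stage2(shallow)
        return shallow, deep

class PGIStyleModel(nn.Module):
    def __init__(self, num_outputs=6):
        super().__init__()
        self.backbone = TinyBackbone()
        self.main_head = nn.Conv2d(32, num_outputs, 1)  # used at train and test time
        self.aux_head = nn.Conv2d(16, num_outputs, 1)   # used at train time only

    def forward(self, x):
        shallow, deep = self.backbone(x)
        main_out = self.main_head(deep)
        if self.training:
            # The auxiliary prediction is supervised with the same targets,
            # so shallow layers receive gradients that have not degraded
            # through the full network depth.
            return main_out, self.aux_head(shallow)
        return main_out

model = PGIStyleModel()
model.train()
main_out, aux_out = model(torch.randn(1, 3, 64, 64))
print(main_out.shape, aux_out.shape)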

Reversible Functions

The concept of reversible functions is another cornerstone of YOLOv9’s design. A function is considered reversible if it can be inverted without any loss of information.

This property is crucial for deep learning architectures as it allows the network to maintain a complete flow of information, thereby enabling more accurate updates to model parameters. YOLOv9 incorporates reversible functions within its architecture to mitigate the risk of information degradation, especially in deeper layers, ensuring the preservation of critical data for object detection tasks.
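Formally, following the YOLOv9 paper’s notation, a function r with parameters ψ is reversible when an inverse transformation v with parameters ζ exists such that

X = v_\zeta(r_\psi(X))

which implies that no mutual information with the input is lost:

I(X, X) = I(X, r_\psi(X)) = I(X, v_\zeta(r_\psi(X)))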

Generalized Efficient Layer Aggregation Network (GELAN)

GELAN represents a strategic architectural advancement, enabling YOLOv9 to achieve superior parameter utilization and computational efficiency. Its design allows flexible integration of various computational blocks, making YOLOv9 adaptable to a wide range of applications without sacrificing speed or accuracy.
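The sketch below is a simplified, illustrative PyTorch rendering of a GELAN-style block, not the official implementation: a CSP-style split feeds a chain of pluggable computational blocks, and every intermediate output is aggregated at the end.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv -> BatchNorm -> SiLU, the typical YOLO building block."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GELANBlock(nn.Module):
    """Split the input, pass one branch through a chain of arbitrary
    computational blocks, and aggregate every intermediate output."""
    def __init__(self, c_in, c_hidden, c_out, n_blocks=2, block=ConvBlock):
        super().__init__()
        self.stem = ConvBlock(c_in, 2 * c_hidden, k=1)  # produces the two branches
        self.blocks = nn.ModuleList(
            [block(c_hidden, c_hidden, 3) for _ in range(n_blocks)]
        )
        # Aggregate the two stem halves plus every intermediate block output
        self.merge = ConvBlock((2 + n_blocks) * c_hidden, c_out, k=1)

    def forward(self, x):
        a, b = self.stem(x).chunk(2, dim=1)
        outputs = [a, b]
        for blk in self.blocks:
            b = blk(b)
            outputs.append(b)  # keep every intermediate feature map
        return self.merge(torch.cat(outputs, dim=1))

# Quick shape check
x = torch.randn(1, 64, 80, 80)
y = GELANBlock(64, 32, 128)(x)
print(y.shape)  # torch.Size([1, 128, 80, 80])

Because the inner block is a plain constructor argument, any computational unit with the same interface can be dropped in, which is the “generalized” part of GELAN’s design.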


Feature maps (visualization results) produced by random initial weights of PlainNet, ResNet, CSPNet, and GELAN at different depths. After 100 layers, ResNet’s feedforward output is already degraded enough to obscure object information, whereas GELAN still retains sufficiently complete information at the 150th layer and remains sufficiently discriminative up to the 200th layer.

Performance on MS COCO Dataset

The performance of YOLOv9 on the COCO dataset exemplifies its significant advancements in real-time object detection, setting new benchmarks across its range of model sizes. The table below compares top real-time object detectors, showcasing YOLOv9’s superior efficiency and accuracy.

YOLOv9 detection models performance

Iterations of YOLOv9, ranging from the smallest variant, t, to the most extensive model, e, demonstrate improvements not only in accuracy (mAP metrics) but also in efficiency, with reduced parameter counts and computational requirements (FLOPs). This table underscores YOLOv9’s ability to deliver high precision while maintaining or reducing computational load compared to previous versions and competing models.

In comparison, YOLOv9 shows significant gains:

  • Lightweight Models: YOLOv9s outperforms YOLO MS-S in parameter efficiency and computational load, achieving a 0.4–0.6% improvement in AP.
  • Medium to Large Models: YOLOv9m and YOLOv9e exhibit substantial advancements in balancing the trade-off between model complexity and detection performance, offering significant reductions in parameters and computations alongside improved accuracy.

Notably, YOLOv9c highlights the effectiveness of architectural optimizations. It operates with 42% fewer parameters and 21% less computational demand than YOLOv7 AF, while achieving comparable accuracy, demonstrating significant efficiency improvements of YOLOv9. Furthermore, YOLOv9e sets a new standard for large models, with 15% fewer parameters and 25% less computational demand than YOLOv8x, along with a 1.7% increase in AP.

Comparison of Performance with Predecessor YOLOv8

The following image compares the results achieved by YOLOv8, GELAN, and YOLOv9 models on the MS COCO dataset. This comparison highlights the significant improvements made by YOLOv9 and GELAN in terms of efficiency, accuracy, and reduction in computational load. Through the analysis of key metrics, we can observe how the optimizations introduced in YOLOv9 establish new benchmarks, demonstrating technological advancement over the previous version.

Comparison between YOLOv8, YOLOv9 and GELAN

In addition to the YOLOv8 versus YOLOv9 comparison, the next image provides an overview of YOLOv9’s performance relative to other state-of-the-art object detection models. The graphs illustrate the relationship between the number of parameters and computational load (FLOPs) versus accuracy (AP) on the MS COCO dataset. From this analysis, it is evident how YOLOv9 and GELAN outperform many existing solutions, solidifying YOLOv9’s position as one of the most efficient and accurate models currently available.

Comparison between YOLOv9 and other models

Example of Use for Object Detection

Below is an example of using YOLOv9 to retrain the model on a custom dataset and then perform inference for your specific use case. The code shown was run on Google Colab with a GPU runtime.

Step 1: Clone the YOLOv9 Repository

This command clones the YOLOv9 repository from GitHub. git clone is a Git command that creates a local copy of a remote repository, allowing us to explore and modify the YOLOv9 model source code.

!git clone https://github.com/SkalskiP/yolov9.git

Step 2: Configure Project Directories

We set the paths for the project directories. dataDir points to the directory containing the Construction Site Safety (CSS) data, while workingDir is the main directory where we will work, in this case Google Colab. An example dataset is available at https://www.kaggle.com/datasets/snehilsanyal/construction-site-safety-image-dataset-roboflow?resource=download.

# Path to uncompressed CSS data directory
dataDir = '/content/css-data/'
# Working directory in Google Colab
workingDir = '/content/'

Step 3: Define Dataset Classes

Define the number of classes in our dataset (num_classes) and the class names (classes). This information is crucial for configuring the model for specific object detection tasks in our dataset.

num_classes = 6
classes = ['Hardhat', 'Mask', 'NO-Hardhat', 'NO-Mask', 'NO-Safety Vest', 'Safety Vest']

Step 4: Create Dataset Configuration File

Create a data.yaml configuration file that describes the paths to training, validation, and test data, number of classes, and class names. Use the yaml library to write this information into a YAML file, which will be used by YOLOv9 during training.

import yaml
import os

# Describe the dataset layout and classes for YOLOv9
file_dict = {
    'train': os.path.join(dataDir, 'train'),
    'val': os.path.join(dataDir, 'valid'),
    'test': os.path.join(dataDir, 'test'),
    'nc': num_classes,
    'names': classes
}

# Write the configuration next to the training scripts
with open(os.path.join(workingDir, 'yolov9', 'data.yaml'), 'w+') as f:
    yaml.dump(file_dict, f)

Step 5: Download YOLOv9 Weights

Use wget to download pre-trained YOLOv9 weights from the release version on GitHub. These weights serve as a starting point for our model, allowing fine-tuning on our custom data without training from scratch.

!wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-e.pt

Step 6: Install Dependencies

Change directory to the yolov9 folder and use pip to install dependencies listed in the requirements.txt file. The -q option performs installation silently, reducing output unless there are errors.

%cd yolov9
!pip install -r requirements.txt -q

Step 7: Train the Model

  • --workers 8: Number of workers for data loading. A higher number can speed up data preprocessing.
  • --batch 4: Batch size for training. Determines the number of samples used in each training iteration.
  • --img 640: Input image size. Images will be resized to 640x640 pixels.
  • --epochs 50: Number of training epochs. One epoch is a complete pass through the entire training dataset.
  • --data /content/yolov9/data.yaml: Path to the dataset YAML file created earlier.
  • --weights /content/yolov9-e.pt: Path to the downloaded pre-trained weights.
  • --device 0: Use the first available GPU for training.
  • --cfg /content/yolov9/models/detect/yolov9.yaml: Path to the model configuration file.
  • --hyp /content/yolov9/data/hyps/hyp.scratch-high.yaml: Path to the hyperparameters file.

Before running this command, make sure to adjust the nc parameter in the yolov9.yaml file so that it reflects the number of classes in your dataset.
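If you prefer to make that change programmatically rather than editing the file by hand, a minimal sketch (assuming the standard YOLO-style config layout, where nc is a top-level key) could look like this:

import yaml

# Illustrative: rewrite `nc` in the model config before training.
cfg_path = '/content/yolov9/models/detect/yolov9.yaml'
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)
cfg['nc'] = num_classes  # 6 classes for the construction-safety dataset
with open(cfg_path, 'w') as f:
    yaml.dump(cfg, f)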

!python train_dual.py --workers 8 --batch 4 --img 640 --epochs 50 --data /content/yolov9/data.yaml --weights /content/yolov9-e.pt --device 0 --cfg /content/yolov9/models/detect/yolov9.yaml --hyp /content/yolov9/data/hyps/hyp.scratch-high.yaml

Step 8: Perform Inference

  • --img 640: Input image size for inference.
  • --conf 0.1: Confidence threshold for detections.
  • --device 0: Use the first available GPU for inference.
  • --weights /content/yolov9/runs/train/exp2/weights/best.pt: Path to the trained model weights. The run directory name (exp, exp2, …) depends on how many training runs have been launched, so adjust it to match your run.
  • --source /content/css-data/test/images/img1.jpg: Path to the image on which to perform inference. Replace img1.jpg with the image you want to run inference on.

!python detect.py --img 640 --conf 0.1 --device 0 --weights /content/yolov9/runs/train/exp2/weights/best.pt --source /content/css-data/test/images/img1.jpg

Step 9: Display output

Display the output image to verify object detection results:

from IPython.display import Image
# The detect run directory (exp, exp2, ...) depends on how many inference runs
# exist; adjust the path to match the directory reported by detect.py.
Image(filename="/content/yolov9/runs/detect/exp2/img1.jpg")
