P12 BLOG 3 — YOLO Image Classification or Object Detection

Published in AIN311 Fall 2023 Projects

2 min read · Dec 26, 2023

In this third blog entry, we explain why our research led us to adopt the YOLOv8 classification model. YOLO is best known as an object detection model, but its eighth iteration supports other tasks as well, including image classification.

The pretrained YOLOv8 classification model assigns an image to one of 1000 predefined classes in real time. These 1000 classes come from ImageNet, the dataset the classification weights are pretrained on, whereas the detection weights are pretrained on COCO. The distinctive feature of image classification is that it assigns a label to the entire image instead of drawing bounding boxes around the objects in it. This is useful when the goal is only to determine which class an image belongs to, with no need to localize objects or outline their shapes.
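To make the distinction concrete, here is a minimal sketch of what a classifier's output looks like: one score per class for the whole image, converted to probabilities with softmax and then ranked. The three class names and the scores are made up for illustration; a real YOLOv8 classification model would produce 1000 scores, and this is not the library's own code.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, names, k=2):
    """Return the k most likely (name, probability) pairs, best first."""
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for a single image (illustrative values only).
class_names = ["cat", "dog", "bird"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)
predictions = top_k(probs, class_names, k=2)
print(predictions)
```

Note that the result contains no coordinates at all: the whole image gets a ranked list of labels, which is exactly what separates classification from detection.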

YOLOv8 comes in several model sizes, a topic we will cover in our next blog entry. That post will also compare the classic YOLO architecture with the architecture tailored for classification, highlighting the differences between the two and their respective strengths.

The architecture presented here is YOLOv8 with the CSPDarknet backbone. The nano classification model, YOLOv8N-CLS.pt, has the same structure as the YOLOv8 detection architecture up to layer 9 (counting from 0). In the detection architecture that layer is an SPPF block; in YOLOv8N-CLS.pt it is replaced by a classification layer.
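The point where the two architectures part ways can be sketched by comparing their layer types index by index. The two lists below are written from memory of the Ultralytics model definitions, so treat them as an assumption to check against the actual configuration files; the helper `first_difference` is a hypothetical name used only for this illustration.

```python
# Layer types by index (0-based). These lists are an assumption based on
# the published YOLOv8 model definitions, not output captured from a model.
DETECT_BACKBONE = ["Conv", "Conv", "C2f", "Conv", "C2f",
                   "Conv", "C2f", "Conv", "C2f", "SPPF"]
CLASSIFY_MODEL = ["Conv", "Conv", "C2f", "Conv", "C2f",
                  "Conv", "C2f", "Conv", "C2f", "Classify"]

def first_difference(a, b):
    """Return the first index where the two layer lists differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

idx = first_difference(DETECT_BACKBONE, CLASSIFY_MODEL)
print(idx, DETECT_BACKBONE[idx], CLASSIFY_MODEL[idx])
```

Under these assumptions, layers 0 through 8 are shared and only the layer at index 9 differs.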

The modified layer is configured as follows:

(9): Classify(
  (conv): Conv(
    (conv): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act): SiLU(inplace=True)
  )
  (pool): AdaptiveAvgPool2d(output_size=1)
  (drop): Dropout(p=0.0, inplace=True)
  (linear): Linear(in_features=1280, out_features=1000, bias=True)
)

This excerpt shows the final layer of the nano model, the smallest and most streamlined of the YOLOv8 variants. This layer exists purely for classification. A 1×1 convolution expands the 256 input channels to 1280, AdaptiveAvgPool2d reduces each of the 1280 feature maps to a single value, and one fully connected layer maps the resulting 1280-dimensional vector to 1000 class scores.
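The numbers in the printout can be followed step by step with a small sketch in plain Python. It traces the tensor shape through the head, counts the learnable parameters implied by the printed layer sizes, and shows what global average pooling does to a single feature map. The 7×7 spatial size is an assumption (it corresponds to a 224×224 input downsampled by a factor of 32), and the function names are hypothetical helpers, not part of any library.

```python
def classify_head_shapes(c_in=256, c_mid=1280, n_classes=1000, h=7, w=7):
    """Trace the (channels, height, width) shape through the head."""
    return [
        ("input", (c_in, h, w)),
        ("conv 1x1", (c_mid, h, w)),  # a 1x1 conv changes channels only
        ("avg pool", (c_mid, 1, 1)),  # each feature map becomes one value
        ("flatten", (c_mid,)),
        ("linear", (n_classes,)),     # one score per class
    ]

def classify_head_params(c_in=256, c_mid=1280, n_classes=1000):
    """Count learnable parameters from the layer sizes in the printout."""
    conv = c_in * c_mid * 1 * 1             # Conv2d with bias=False
    bn = 2 * c_mid                          # BatchNorm2d weight and bias
    linear = c_mid * n_classes + n_classes  # Linear with bias=True
    return {"conv": conv, "bn": bn, "linear": linear,
            "total": conv + bn + linear}

def global_avg_pool(feature_map):
    """Average one 2D feature map (a list of rows) down to a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

shapes = classify_head_shapes()
params = classify_head_params()
pooled = global_avg_pool([[1.0, 2.0], [3.0, 4.0]])

for name, shape in shapes:
    print(f"{name:>9}: {shape}")
print(params)
print(pooled)
```

By this count the head holds roughly 1.6 million learnable parameters, with the single fully connected layer accounting for most of them.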


Written by Metehan Sarikaya