YOLO V8: A Breakdown of Its Improved Features and Functionality

Muhammad Ahtisham
4 min read · Oct 10, 2023


Object detection is one of the most common tasks in computer vision. It involves locating regions of interest within an image and classifying each region, much as a standard image classifier would. A single image can contain multiple regions of interest pointing to different objects, which makes object detection a more complex problem than image classification.

YOLO (You Only Look Once) is a popular object detection model known for its speed and accuracy. It has gone through multiple iterations since Joseph Redmon et al. first introduced it in 2016, and YOLO v8 is the most recent version.

In this article, we’ll talk about what distinguishes YOLO v8 from other object detection algorithms and how it stacks up against them.

R-CNN Model: Bridging the Gap Between Region Proposal and CNNs for Object Detection

The R-CNN (Regions with CNN features) model, created in 2014 by Ross Girshick and his colleagues at UC Berkeley, was one of the first successful attempts to tackle the object detection problem using deep learning. To find and localise objects in images, this model combined region proposal methods with convolutional neural networks (CNNs).

Based on how many times the same input image is passed through the network, object detection methods can be roughly divided into two types: two-stage detectors such as R-CNN, which first propose candidate regions and then classify each one, and single-stage detectors such as YOLO, which predict all boxes in a single pass.

R-CNN vs YOLO

The Architecture and Algorithm of YOLO

You Only Look Once (YOLO) proposes using an end-to-end neural network that predicts bounding boxes and class probabilities simultaneously. This differs from earlier object detection algorithms, which repurposed classifiers as detectors. With this fundamentally different approach, YOLO outperformed existing real-time object detection algorithms and achieved state-of-the-art results.

While Faster R-CNN and similar algorithms use a Region Proposal Network to identify candidate regions of interest and then run recognition on each region independently, YOLO makes all of its predictions in a single forward pass. Methods built on Region Proposal Networks effectively process the same image several times, while YOLO gets away with a single pass, as the sketch below illustrates.
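To make the single-pass idea concrete, here is a minimal, illustrative sketch (not Ultralytics code) of how the original YOLO decodes one forward pass. The grid size, box count, and score threshold are assumptions following the YOLO v1 paper, and the random tensor stands in for real network output:

```python
import numpy as np

# Assumed YOLO v1-style settings: S x S grid, B boxes per cell, C classes.
S, B, C = 7, 2, 20
output = np.random.rand(S, S, B * 5 + C)  # stand-in for one network forward pass

boxes = []
for row in range(S):
    for col in range(S):
        cell = output[row, col]
        class_probs = cell[B * 5:]           # class scores shared by the cell
        for b in range(B):
            x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
            cx = (col + x) / S               # x, y are offsets within the cell
            cy = (row + y) / S
            score = conf * class_probs.max() # class-specific confidence
            if score > 0.25:                 # assumed threshold
                boxes.append((cx, cy, w, h, score, class_probs.argmax()))

# A real detector would now apply non-maximum suppression to prune
# overlapping boxes; the point here is that every prediction came from
# one forward pass over the whole image.
```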

YOLO Evolution

Every iteration of the YOLO (You Only Look Once) object detector has brought improvements in accuracy and speed.

YOLO v2 (YOLO9000) added anchor boxes and the Darknet-19 architecture, improving detection across a variety of object classes and sizes, and adopted batch normalisation and multi-scale training for better performance and stability.

YOLO v3 moved to the Darknet-53 backbone and used anchor boxes at three scales, with feature pyramid network (FPN)-style predictions for multi-scale object detection.

YOLO v4 introduced a CSPNet-based backbone (CSPDarknet53), refined its anchor boxes with k-means clustering, and adopted improved bounding-box regression losses such as CIoU for more accurate localisation.

YOLO v5, released by Ultralytics, reimplemented the model in PyTorch, learned its anchor boxes automatically from the training data, and used spatial pyramid pooling (SPP) to further improve small-object detection.

YOLO v6 redesigned the backbone and neck for better computational efficiency.

YOLO v7 reworked the architecture again with E-ELAN and broadened the range of object sizes and shapes its anchors cover.

Together, these versions represent substantial progress in object detection using deep learning methods.

Limitations of YOLO v7

YOLO v7 is a strong object detection method, but it has the following drawbacks:

1. Small Object Detection: YOLO v7 can struggle to detect small objects, particularly in crowded scenes or when objects are far from the camera.

2. Scale Sensitivity: It may not perform well when object sizes vary widely within a scene, making it difficult to detect objects that are very large or very small relative to the others.

3. Sensitivity to Environmental Conditions: YOLO v7 can be sensitive to changes in illumination and other environmental factors, which makes it less reliable across varied conditions.

4. Computational Intensity: YOLO v7’s computational cost can limit real-time performance on resource-constrained devices such as smartphones and edge hardware.

YOLO v8

YOLO v8 is the latest version of YOLO by Ultralytics. As a cutting-edge, state-of-the-art (SOTA) model, YOLOv8 builds on the success of previous versions, introducing new features and improvements for enhanced performance, flexibility, and efficiency. YOLOv8 supports a full range of vision AI tasks, including detection, segmentation, pose estimation, tracking, and classification. This versatility allows users to leverage YOLOv8’s capabilities across diverse applications and domains.
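As a quick illustration of that versatility, the Ultralytics Python package (installable with pip install ultralytics) exposes each task through its own pretrained checkpoint. A minimal sketch:

```python
from ultralytics import YOLO

# Each vision AI task has its own pretrained checkpoint;
# the weights download automatically on first use.
detector   = YOLO("yolov8n.pt")       # object detection
segmenter  = YOLO("yolov8n-seg.pt")   # instance segmentation
pose_model = YOLO("yolov8n-pose.pt")  # pose estimation
classifier = YOLO("yolov8n-cls.pt")   # image classification
```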

Object Detection

Object detection is a task that involves identifying the location and class of objects in an image or video stream.

The output of an object detector is a set of bounding boxes that enclose the objects in the image, along with class labels and confidence scores for each box. Object detection is a good choice when you need to identify objects of interest in a scene, but don’t need to know exactly where the object is or its exact shape.
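Here is a short sketch of how those boxes, class labels, and confidence scores come out of the Ultralytics API; the image path bus.jpg is a placeholder for your own file:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # pretrained detection model
results = model("bus.jpg")   # placeholder image path; URLs and arrays also work

# Each detection carries a bounding box, a class label, and a confidence score.
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates in pixels
    label = model.names[int(box.cls)]      # numeric class id -> class name
    print(f"{label} {float(box.conf):.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```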

Detection using YOLO v8

YOLO v8 ships with a family of pretrained models. Detect, Segment, and Pose models are pretrained on the COCO dataset, while Classify models are pretrained on the ImageNet dataset. All models download automatically from the latest Ultralytics release on first use.

YOLO v8 Models and their Characteristics

mAP (val) values are for single-model, single-scale evaluation on the COCO val2017 dataset.
Reproduce with: yolo val detect data=coco.yaml device=0
Speed is averaged over COCO val images using an Amazon EC2 P4d instance.
Reproduce with: yolo val detect data=coco128.yaml batch=1 device=0|cpu
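The same validation can also be run from Python. A minimal sketch, assuming the ultralytics package is installed and using the small coco128.yaml subset for a quick check:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
metrics = model.val(data="coco128.yaml")  # dataset downloads on first use
print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
```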

YOLOv8n by Ultralytics: deployment results.
