YOLOX: Object detection model exceeding YOLOv5

David Cochard
axinc-ai
Nov 1, 2021

This is an introduction to YOLOX, a machine learning model that can be used with the ailia SDK. You can easily use this model to create AI applications with the ailia SDK, alongside many other ready-to-use ailia MODELS.

Overview

YOLOX is a state-of-the-art object detection model released in August 2021 that combines performance beyond YOLOv5 with a permissive Apache 2.0 license.

Source: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/assets/logo.png
Source: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/assets/demo.png

Architecture

YOLOX is an anchor-free version of the conventional YOLO that introduces a decoupled head and SimOTA label assignment. The model won first place in the Streaming Perception Challenge at the CVPR 2021 Workshop on Autonomous Driving.
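Being anchor-free means each grid cell of a feature map directly predicts a box offset and size instead of refining predefined anchor boxes. The decoding step can be sketched as follows; this is a minimal NumPy illustration of the (grid + offset) × stride scheme used by anchor-free YOLO-style detectors, not the actual YOLOX implementation.

```python
import numpy as np

def decode_anchor_free(preds, stride):
    """Decode raw anchor-free predictions for one feature level.

    preds: (H, W, 4) raw outputs per grid cell: (dx, dy, log_w, log_h)
    Returns boxes as (H, W, 4) in (cx, cy, w, h) image coordinates.
    """
    h, w, _ = preds.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cx = (xs + preds[..., 0]) * stride   # center = (grid + offset) * stride
    cy = (ys + preds[..., 1]) * stride
    bw = np.exp(preds[..., 2]) * stride  # size = exp(prediction) * stride
    bh = np.exp(preds[..., 3]) * stride
    return np.stack([cx, cy, bw, bh], axis=-1)

# A cell at grid position (row 2, col 3) on a stride-8 level, zero offsets:
preds = np.zeros((20, 20, 4))
boxes = decode_anchor_free(preds, stride=8)
print(boxes[2, 3])  # [24. 16.  8.  8.]
```

Because there are no anchors, there is no per-dataset anchor clustering step and fewer predictions per cell, which simplifies both training and post-processing.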

Since the existing YOLOv4 and YOLOv5 pipelines are over-optimized for anchor-based detection, YOLOX instead takes YOLOv3-SPP as its baseline. This baseline was then updated with elements of the YOLOv5 architecture, namely an advanced CSPNet backbone and an additional PAN head.

In object detection models, the tasks of classification and regression (computing bounding box positions) are performed simultaneously, which is known to cause conflicts and reduce accuracy. To solve this problem, a decoupled head was introduced. The backbones and feature pyramids of the conventional YOLO series still use a classic coupled head, but YOLOX was updated to use a decoupled head and achieves higher accuracy.
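The idea of decoupling can be sketched with plain NumPy: after a shared stem, classification and regression flow through separate branches rather than a single shared one. This is a toy illustration with random weights and 1x1 convolutions only, not the real YOLOX head (which uses 3x3 convolutions, batch norm, and activations).

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # A 1x1 convolution is a matrix multiply over the channel axis.
    # x: (C_in, H, W), w: (C_out, C_in)
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def decoupled_head(feat, num_classes=80):
    # feat: one FPN feature map of shape (C, H, W)
    c = feat.shape[0]
    stem = conv1x1(feat, rng.standard_normal((c, c)))         # shared stem
    cls_branch = conv1x1(stem, rng.standard_normal((c, c)))   # classification branch
    reg_branch = conv1x1(stem, rng.standard_normal((c, c)))   # regression branch
    cls_out = conv1x1(cls_branch, rng.standard_normal((num_classes, c)))  # class scores
    reg_out = conv1x1(reg_branch, rng.standard_normal((4, c)))            # box offsets
    obj_out = conv1x1(reg_branch, rng.standard_normal((1, c)))            # objectness
    return cls_out, reg_out, obj_out

feat = rng.standard_normal((64, 20, 20))
cls_out, reg_out, obj_out = decoupled_head(feat)
print(cls_out.shape, reg_out.shape, obj_out.shape)  # (80, 20, 20) (4, 20, 20) (1, 20, 20)
```

Because the two tasks no longer share the final layers, each branch can specialize, which is what resolves the classification/regression conflict described above.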

Source: https://arxiv.org/pdf/2107.08430.pdf

YOLOX was trained on a dataset that was strongly augmented using the Mosaic and Mixup strategies. The authors also use the advanced label assignment SimOTA, a simplified version of OTA (Optimal Transport Assignment), to optimize the loss.
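Of the two augmentation strategies, Mixup is the simpler to illustrate: two images (and, during training, their labels) are blended with a ratio sampled from a Beta distribution. The sketch below shows the image-blending step only; parameter choices are illustrative, not YOLOX's actual training configuration.

```python
import numpy as np

rng = np.random.default_rng(42)

def mixup(img_a, img_b, alpha=1.0):
    # Blend two images by a Beta-sampled ratio; in training, the
    # labels of both images are kept with weights lam and (1 - lam).
    lam = rng.beta(alpha, alpha)
    return lam * img_a + (1.0 - lam) * img_b, lam

img_a = np.full((4, 4, 3), 255.0)  # white image
img_b = np.zeros((4, 4, 3))        # black image
mixed, lam = mixup(img_a, img_b)
# every pixel of the blend equals lam * 255
print(lam, mixed[0, 0, 0])
```

Mosaic, the other strategy, stitches four randomly scaled images into one training sample, exposing the model to objects at many scales and positions in a single pass.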

The contribution of each newly introduced tool is as follows.

Source: https://arxiv.org/pdf/2107.08430.pdf
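The SimOTA assignment mentioned above can also be sketched in simplified form: for each ground-truth box, a "dynamic k" is estimated from the summed top IoUs, and the k predictions with the lowest matching cost are assigned as positives. This is a heavily simplified NumPy illustration of the dynamic-k idea only; the real SimOTA also restricts candidates to center regions and resolves conflicts when one prediction matches several ground truths.

```python
import numpy as np

def simota_assign(cost, ious, max_k=10):
    """Simplified dynamic-k label assignment in the spirit of SimOTA.

    cost: (num_gt, num_preds) matching cost (e.g. cls loss + reg loss)
    ious: (num_gt, num_preds) IoU between ground truths and predictions
    Returns a boolean (num_gt, num_preds) assignment matrix.
    """
    num_gt, num_preds = cost.shape
    assign = np.zeros((num_gt, num_preds), dtype=bool)
    for g in range(num_gt):
        # dynamic k: the summed top IoUs estimate how many predictions fit this gt
        topk_ious = np.sort(ious[g])[::-1][:max_k]
        k = max(1, int(topk_ious.sum()))
        # assign the k predictions with the lowest cost as positives
        idx = np.argsort(cost[g])[:k]
        assign[g, idx] = True
    return assign

cost = np.array([[0.5, 0.2, 0.9]])  # one gt, three candidate predictions
ious = np.array([[0.9, 0.8, 0.1]])  # summed top IoUs -> k = 1
assign = simota_assign(cost, ious)  # only the lowest-cost prediction is assigned
```

The appeal of SimOTA over the full OTA formulation is that this greedy top-k selection avoids solving an optimal transport problem at every training step.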

The benchmark results of YOLOX are shown below.

Source: https://arxiv.org/pdf/2107.08430.pdf

YOLOX model variants

There are variations of YOLOX split into two categories: Standard Models for high precision and Light Models for edge devices.

Source: https://github.com/Megvii-BaseDetection/YOLOX

YOLOX performance

Inference time and mAP50 were measured on the validation set of COCO2017. YOLOX-s achieves the same accuracy as YOLOv4 in half the processing time.

mAP50 of YOLOX
Inference time of YOLOX

The following repository and ailia SDK 1.2.8 were used to measure mAP and inference time.

CVPR 2021 Autonomous Driving Workshop Streaming Perception Challenge

The link below is the leaderboard of the Streaming Perception Challenge at the CVPR 2021 Workshop on Autonomous Driving, in which YOLOX won first place under the name BaseDet.

For this challenge, the Argoverse-HD dataset was used, which extends the Argoverse 1.1 autonomous driving dataset with COCO-style 2D bounding box annotations. It contains about 1,250,000 bounding boxes annotated on car front-facing camera videos.

Source: https://www.cs.cmu.edu/~mengtial/proj/streaming/

Usage

YOLOX can be used with the ailia SDK with the following command to detect objects in the webcam video stream.

$ python3 yolox.py -v 0

By default, YOLOX-s is used. Other models, including the tiny variants, can be selected with the -m option.

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
