Development of YOLOX model as RESTful API using Flask and pyACL API

Alper Balmumcu
Published in Huawei Developers
8 min read · Dec 28, 2023

Introduction

Hi all! In this article, we are going to talk about building a REST API for the YOLOX model using the pyACL API developed by Huawei.

Let's start by finding out the secrets of YOLOX.

YOLOX: Introduction and Architecture

To understand the significance of YOLOX, it is crucial to appreciate the evolution of object detection algorithms. Traditional algorithms, such as the sliding window approach and region proposal-based methods, were effective but computationally expensive.

The You Only Look Once (YOLO) algorithm introduced a groundbreaking approach to object detection by framing it as a regression problem. YOLO divided the input image into a grid and predicted bounding boxes and class probabilities directly. This approach achieved impressive speed but struggled to detect small objects accurately due to its single-scale feature extraction process. The subsequent versions of YOLO attempted to address this limitation but often compromised on speed or accuracy.

Released in July 2021, YOLOX is an anchor-free object detection algorithm that introduces advanced detection techniques such as a decoupled head and the SimOTA label assignment strategy. Strong data augmentations, namely Mosaic and MixUp, are also incorporated for robust training. The authors started from the YOLOv3-SPP model as a baseline and added these modifications one after another.

Compared to other versions of YOLO, YOLOX has several key differences:

  • Simpler structure: YOLOX has a simpler structure than YOLOv3, making it more efficient and faster.
  • Better accuracy: YOLOX achieves a better trade-off between speed and accuracy than other counterparts across all model sizes.
  • CSPDarknet backbone: The backbone of YOLOX is CSPDarknet, which consists of the residual block, CSP block, and SiLU. The residual network effectively alleviates the gradient disappearance problem in deep neural networks. CSPBlock dramatically improves the computing and learning ability of CNN and reduces the amount of calculation. The activation function, SiLU, is an enhanced version of Sigmoid and ReLU.
  • Decoupled head: YOLOX splits classification and regression into two parallel branches instead of sharing one coupled head as in YOLOv3, which speeds up convergence and improves accuracy.
  • Anchor-free approach: YOLOX predicts each box directly at a grid location rather than as an offset from predefined anchor boxes, unlike YOLOv2 through YOLOv5.

Illustration of the difference between the YOLOv3 coupled head and the YOLOX decoupled head. For each level of the FPN feature, we first adopt a 1 × 1 conv layer to reduce the feature channel to 256 and then add two parallel branches with two 3 × 3 conv layers each for classification and regression tasks respectively. (source)

YOLOX strikes a good balance between speed and accuracy. It combines an efficient backbone network, multi-scale feature fusion, and a decoupled prediction head, which together let it capture rich semantic features, fuse features at different scales, and predict bounding boxes, objectness scores, and class probabilities. Its main advantages are real-time performance, the ability to detect objects across a range of sizes, and lightweight model variants. On the other hand, accuracy can still drop on very small objects, training is compute-intensive, and heavily occluded objects remain challenging.
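To make the anchor-free idea concrete, here is a minimal sketch of how raw head outputs can be turned into boxes. It assumes a 640 × 640 input and strides of 8, 16, and 32, and follows the decoding used in the YOLOX reference post-processing as far as we understand it; the function names are our own and are not part of any library.

```python
import math

def make_grid(input_size=640, strides=(8, 16, 32)):
    """List (grid_x, grid_y, stride) for every prediction location."""
    cells = []
    for stride in strides:
        side = input_size // stride  # 80, 40 and 20 cells per side
        for gy in range(side):
            for gx in range(side):
                cells.append((gx, gy, stride))
    return cells

def decode(raw, cell):
    """Convert one raw (tx, ty, tw, th) output into centre and size in input pixels."""
    tx, ty, tw, th = raw
    gx, gy, stride = cell
    cx = (tx + gx) * stride      # offset inside the cell, scaled by the stride
    cy = (ty + gy) * stride
    w = math.exp(tw) * stride    # width and height are predicted in log space
    h = math.exp(th) * stride
    return cx, cy, w, h

cells = make_grid()
print(len(cells))                                # 80*80 + 40*40 + 20*20 = 8400
print(decode((0.5, 0.5, 0.0, 0.0), cells[0]))    # (4.0, 4.0, 8.0, 8.0)
```

No anchor box sizes appear anywhere: each of the 8400 locations produces exactly one box, which is what keeps the head simple.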

RESTful API

RESTful APIs have become a common choice when building backends for web and enterprise applications, particularly in microservice architectures. Companies like Netflix, Uber, Airbnb, eBay, Amazon, Twitter, Nike, and many others utilize RESTful APIs for their backends. Additionally, there are numerous publicly available RESTful APIs on the internet, offering a wide range of functionalities. However, testing the correctness of these APIs can be challenging, requiring the creation of network messages, data setup in databases, and possible mocking of interactions with external services.

REST API Architecture (source)

REST is an architecture used to design services that can be consumed across different platforms and environments, promoting interoperability and support for the World Wide Web (www). It has become a standardized approach for publishing services on the internet. RESTful APIs play a crucial role in the microservice design and are accessed through endpoints, each representing a specific functionality of a business process. These APIs are typically accessible over HTTP, using standard verbs such as GET, POST, PUT, and DELETE. JSON has emerged as a widely adopted format for messaging in RESTful APIs, providing a machine-readable and network-friendly structure. However, defining a standardized way to describe REST services remains a challenge, although the OpenAPI specification has emerged as a potential solution. Testing RESTful APIs can be complex due to loose coupling, adaptability, and the involvement of third-party libraries, making it crucial to detect and resolve bugs to ensure service stability, especially for mission-critical activities. Various research efforts have aimed to address these testing challenges through frameworks and unit test generation approaches.
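As a small illustration of "endpoint plus HTTP verb plus JSON body", the sketch below builds three requests with Python's standard library without sending them. The base URL and the /detections resource are hypothetical and exist only for this example.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical service address

def make_request(method, path, payload=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)

# One resource, several verbs: create, read, delete.
create = make_request("POST", "/detections", {"image_id": "person.jpg"})
read = make_request("GET", "/detections/1")
delete = make_request("DELETE", "/detections/1")

for req in (create, read, delete):
    print(req.get_method(), req.full_url)
```

The same URL space is reused for every operation; only the verb and the body change, which is the core of the REST style.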

pyACL API

Now, let's look at the pyACL API and what lies behind it.

ACL (Ascend Computing Language) offers C++ interfaces to manage devices, contexts, streams, and memory. It supports loading and executing models or operators and processing media data. Users can develop deep neural network applications for various tasks using ACL.

ACL Software Stack

pyACL (Python Ascend Computing Language) is a Python API library built on top of ACL. It lets Python users manage the execution and resources of Ascend AI processors.
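The resource management mentioned above usually follows an acquire-then-release pattern. The sketch below wraps it in a context manager. It assumes the pyACL calls acl.init, acl.rt.set_device, acl.rt.create_context, acl.rt.create_stream and their release counterparts behave as we understand them from the pyACL documentation (return code 0 on success, create calls returning a handle and a code), so please verify against your CANN version. The acl module is passed in as a parameter so the helper can also be exercised without Ascend hardware; acl_session and check are our own names.

```python
from contextlib import ExitStack, contextmanager

ACL_SUCCESS = 0  # assumption: pyACL reports success with return code 0

def check(ret, what):
    """Raise if a pyACL-style call returned a non-zero code."""
    if ret != ACL_SUCCESS:
        raise RuntimeError(f"{what} failed with error code {ret}")

@contextmanager
def acl_session(acl, device_id=0):
    """Acquire device, context and stream; release them in reverse order."""
    with ExitStack() as cleanup:
        check(acl.init(), "acl.init")
        cleanup.callback(acl.finalize)

        check(acl.rt.set_device(device_id), "acl.rt.set_device")
        cleanup.callback(acl.rt.reset_device, device_id)

        context, ret = acl.rt.create_context(device_id)
        check(ret, "acl.rt.create_context")
        cleanup.callback(acl.rt.destroy_context, context)

        stream, ret = acl.rt.create_stream()
        check(ret, "acl.rt.create_stream")
        cleanup.callback(acl.rt.destroy_stream, stream)

        yield context, stream

# On a machine with the CANN toolkit installed, usage would look like:
#   import acl
#   with acl_session(acl, device_id=0) as (context, stream):
#       ...  # load the OM model and run inference here
```

Registering each release step right after its matching acquire step means a failure halfway through still frees everything that was already allocated.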

For further information:

Setting up the project

First, create a new directory for the project. Inside the project directory:

  • The main.py file will store our Flask application.
  • The model.py file will store the pyACL YOLOX model inference.
  • The model directory will store the YOLOX OM model converted with ATC.
  • The src directory will store the YOLOX preprocessing and postprocessing parts.
  • The static directory will store the input image. Once postprocessing succeeds, the output image with the detections is also saved here.

Here is the directory tree:

|-- main.py
|-- model.py
|-- model
| `-- yolox_s.om
|-- src
| |-- main.py
| |-- postprocessing.py
| `-- utils.py
`-- static
  |-- person.jpg
  `-- detection.jpg

Converting PyTorch model to Offline Model (OM)

Let’s start by converting the model to the desired OM format.

Step 1: PyTorch (.pth) to ONNX

The first step is converting the original model to ONNX using the export script that ships with the YOLOX repository. After cloning the repository, we switch to the specific commit used in this article. The conversion itself is performed by the export_onnx.py script located in the tools directory.

# Clone the repository
git clone https://github.com/Megvii-BaseDetection/YOLOX.git
cd YOLOX

# Reset to the commit used in this article
git reset c9d128384cf0758723804c23ab7e042dbf3c967f --hard

# Export to ONNX
python tools/export_onnx.py --output-name yolox_s.onnx -f exps/default/yolox_s.py -c yolox_s.pth --opset 10

Step 2: ONNX to Offline Model (OM)

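To run on Ascend NPUs, the ONNX model must be converted to the OM format. The conversion is done with ATC (Ascend Tensor Compiler) using the command below. In it, --framework=5 selects ONNX as the source framework, --input_shape fixes the input named images to a single 3-channel 640 × 640 image in NCHW layout, --precision_mode=force_fp16 forces FP16 computation, and --soc_version must match the target chip.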

# Converting ONNX to OM
atc --model=yolox_s.onnx \
--framework=5 \
--output=yolox_s \
--input_format=NCHW \
--precision_mode=force_fp16 \
--input_shape='images:1,3,640,640' \
--log=info \
--soc_version=Ascend310
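If the conversion needs to be scripted, for example to rebuild the OM file for a different chip, the same command can be assembled in Python. This is only a sketch: it uses exactly the flags from the command above, and build_atc_command and convert are our own helper names, not part of ATC.

```python
import shutil
import subprocess

def build_atc_command(onnx_path, output_name, input_shape, soc_version="Ascend310"):
    """Assemble the atc argument list using the flags shown in the article."""
    return [
        "atc",
        f"--model={onnx_path}",
        "--framework=5",                 # 5 selects ONNX
        f"--output={output_name}",
        "--input_format=NCHW",
        "--precision_mode=force_fp16",
        f"--input_shape={input_shape}",
        "--log=info",
        f"--soc_version={soc_version}",
    ]

def convert(command):
    """Run atc if it is available on PATH; raise a clear error otherwise."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("atc not found on PATH; set up the CANN environment first")
    subprocess.run(command, check=True)

command = build_atc_command("yolox_s.onnx", "yolox_s", "images:1,3,640,640")
print(" ".join(command))
```

Calling convert(command) then performs the conversion. Because the arguments are passed as a list rather than through a shell, the input shape does not need the single quotes used on the command line.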

For more detailed information about ATC, you can take a look at this documentation.

Creating pyACL YOLOX model inference application

Let's continue with creating the model inference example.

In the model script, the inference function initializes the ACLLite library, loads the model, reads an image, performs pre-processing on the image, executes the model, and performs post-processing on the output. In the end, the function returns the processed image in the required channel format.
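The pre-processing and post-processing steps are where most of the geometry lives, so a small sketch may help. It assumes YOLOX-style letterboxing (the image is scaled by a single ratio to fit the 640 × 640 input and the remainder is padded) and a plain greedy non-maximum suppression. The 0.45 IoU threshold and all function names are illustrative choices of ours; this is not the code from the original model script.

```python
def letterbox_ratio(img_w, img_h, input_size=640):
    """Scale factor and resized size that fit the image inside the model input."""
    ratio = min(input_size / img_w, input_size / img_h)
    return ratio, int(img_w * ratio), int(img_h * ratio)

def to_original(box, ratio):
    """Map an (x1, y1, x2, y2) box from model-input pixels back to the source image."""
    return tuple(v / ratio for v in box)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

ratio, new_w, new_h = letterbox_ratio(1280, 720)
print(ratio, new_w, new_h)  # 0.5 640 360

boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)                                         # [0, 2]
print([to_original(boxes[i], ratio) for i in kept])
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, and the surviving boxes are divided by the letterbox ratio to land on the original 1280 × 720 image.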

Building the Flask application

With the model inference script in place, we can create the Flask application in main.py.

This script initializes a Flask application and sets the path to the output folder where images with detections are saved. It defines a function named get_image that is called when a POST request is sent to the /image endpoint. The function reads the image file from the request, saves it to the current working directory, processes the image using the inference function, saves the processed image to the output folder, and returns the processed image as a response. This code is useful for building web applications that work with images, as it provides a simple way to send and receive images using Flask. By following the code structure, you can easily customize the function to suit your specific needs and build more complex applications that involve image processing and analysis.
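The behaviour described above can be sketched as follows. This is a hedged reconstruction, not the original script: the form field name image, the fixed output name detection.jpg, and the choice to store the upload inside the static folder are assumptions, and inference is a placeholder that merely copies the file so the example runs without Ascend hardware.

```python
import os
import shutil

from flask import Flask, jsonify, request, send_file
from werkzeug.utils import secure_filename

app = Flask(__name__)
OUTPUT_FOLDER = os.path.abspath("static")  # folder for input and output images

def inference(image_path, output_path):
    """Placeholder for the pyACL YOLOX inference in model.py (assumption)."""
    if image_path != output_path:
        shutil.copyfile(image_path, output_path)
    return output_path

@app.route("/image", methods=["POST"])
def get_image():
    # "image" is an assumed form field name; use whatever the client sends.
    file = request.files.get("image")
    if file is None or file.filename == "":
        return jsonify(error="no image file in request"), 400

    os.makedirs(OUTPUT_FOLDER, exist_ok=True)
    filename = secure_filename(file.filename) or "upload.jpg"
    input_path = os.path.join(OUTPUT_FOLDER, filename)
    file.save(input_path)

    output_path = os.path.join(OUTPUT_FOLDER, "detection.jpg")
    inference(input_path, output_path)
    return send_file(output_path, mimetype="image/jpeg")

# To serve the app with python3 main.py, add at the bottom of the file:
#   app.run(host="0.0.0.0", port=5000)
```

Returning a JSON error with status 400 for a missing file keeps the endpoint predictable for clients, and secure_filename prevents an uploaded name from escaping the static folder.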

Run & Test — Postman

Let's take a brief look at Postman.

Postman is a widely-used tool for API testing. It allows developers, QA engineers, and DevOps teams to send HTTP requests and view responses. With its user-friendly interface and comprehensive features, Postman has become a must-have tool in the software development industry. In 2022, Postman introduced new features that enhance its functionality. These features include automated testing, GraphQL API testing, and integration with popular CI/CD tools like Jenkins, GitLab, and Travis CI. These additions make Postman even more valuable for developers and testers, providing a comprehensive solution for API testing and management. Postman simplifies the process of testing APIs by providing a simple and efficient way to send requests and view responses. It helps improve the quality and reliability of APIs by allowing developers to easily test and validate their functionality. The tool’s user-friendly interface makes it easy to get started with API testing.

Postman output

After running the application using the python3 main.py command, the Postman tool is used to post a request and get the image response.
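For readers who prefer a script to a GUI, the request Postman sends can be approximated with the standard library. The sketch below builds a multipart/form-data body by hand and posts it. It assumes the server expects the file under the form field image and listens on localhost port 5000, neither of which is stated in the article; build_multipart and post_image are our own helper names.

```python
import os
import uuid
from urllib.request import Request, urlopen

def build_multipart(field, filename, data, mimetype="image/jpeg"):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mimetype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def post_image(url, image_path, out_path="detection.jpg"):
    """POST an image file and save the image returned by the server."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart("image", os.path.basename(image_path), f.read())
    req = Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urlopen(req) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path

# With the Flask application running (assumed address):
#   post_image("http://localhost:5000/image", "static/person.jpg")
```

This mirrors choosing "form-data" with a file field in Postman's Body tab; the response body is the processed image.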

Conclusion

In this article, we have learned about the YOLOX architecture, how REST APIs work, pyACL, and model conversion with ATC for Ascend hardware. We also briefly introduced Postman and showed how to use it to test the API.
