Object Detection: Train a Custom YOLOX Model on Azure Machine Learning

Peng Hao
Slalom Technology
Mar 13, 2023

Object detection is a computer vision technique that identifies and classifies specific objects and locates them in an image or video clip. Many algorithms have been developed for real-world applications. Since most of them are implemented with deep learning frameworks that require large datasets and substantial compute for training, moving the training process to the cloud allows models to scale efficiently and at a lower cost.

Slalom recently partnered with a well-known Japanese manufacturing corporation to develop an intelligent product that addresses rail maintenance challenges. As a proof of concept, we designed an automated training pipeline to train a custom object detection model using the YOLOX algorithm on the Azure cloud. YOLOX is an anchor-free evolution of the YOLO (You Only Look Once) family that achieves higher accuracy than the original YOLO algorithm while maintaining real-time performance. The trained model is then deployed to an edge device to perform specific detection tasks in a production environment. The MVP model shows promising results and helped us quickly gain the client's trust and a role in future product improvements.

Most of the learnings shared here come from that client work. In this blog post, you will learn engineering best practices for:

  • How to prepare a custom image dataset for object detection in Azure Machine Learning (AzureML)
  • How to train a YOLOX model on a custom dataset in AzureML
Demo photos shared in the YOLOX repo

Data preparation and annotation

Once the training images have been captured and uploaded to an Azure storage account (we used Azure Data Lake Storage Gen2 as the storage solution for all the raw data), the next step is to split the dataset and annotate the images to generate labels for training.

There are many annotation tools available for this task. To maintain data security and ecosystem consistency, we chose the Azure data labeling tool provided in the AzureML workspace for data annotation. Instructions for setting up an image labeling project and creating annotations for an object detection task can be found here, so we will not cover the details.

When the image annotations are ready to be exported, there are two download options: a COCO file or an AzureML Dataset. A COCO file stores annotations in JSON format, identifying the objects within the images and providing information about their attributes. Since YOLOX requires the MS COCO format for training, we chose to export the annotations as a COCO file. Below is an example of the exported annotation data.

"annotations": [{"image_id": 12,"bbox": [0.598048, 0.3515, 0.0418216, 0.033],"area": 0.1345,"category_id": 1,"id": 1}]

Each bounding-box (bbox) annotation is split up into 4 values:

1. x — The x-axis location of the upper left corner of the bounding box

2. y — The y-axis location of the upper left corner of the bounding box

3. w — The width of the bounding box

4. h — The height of the bounding box

These values are normalized to the image size, so they lie within the range of 0 to 1. We cannot use this annotation data directly to train a YOLOX model: it does not follow the standard COCO format, which expects absolute pixel coordinates, and the scaled-down bounding boxes make it difficult for the model to learn small objects. Therefore, we created a post-processing script to convert the exported annotation data into a compliant format, as sketched below.
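As an illustration, here is a minimal sketch of such a conversion script. It assumes the exported COCO JSON also contains an images section with each image's id, width, and height; the function name and file paths are placeholders.

import json

def denormalize_coco(input_path: str, output_path: str) -> None:
    """Convert normalized bbox values from the AzureML export into absolute pixel values."""
    with open(input_path) as f:
        coco = json.load(f)

    # Look up each image's width and height by image id
    dims = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}

    for ann in coco["annotations"]:
        img_w, img_h = dims[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        # Scale the normalized [x, y, w, h] values back to pixel coordinates
        ann["bbox"] = [x * img_w, y * img_h, w * img_w, h * img_h]
        # COCO expects area in square pixels (box width * box height)
        ann["area"] = ann["bbox"][2] * ann["bbox"][3]

    with open(output_path, "w") as f:
        json.dump(coco, f)

# Example usage (file names are illustrative)
denormalize_coco("exported_train.json", "instances_train.json")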

The next step is to upload the new annotation files to the same storage location as the raw images. The data should be structured as follows before being used for training:

demo_data/
|
|-----annotations/ # annotation json files
| |
| |----instances_train.json
| |----instances_val.json
|
|-----train/ # training images
|-----val/ # validation images

Azure Machine Learning training workflow

The following diagram provides an overview of the workflow for training a YOLOX model in AzureML.

Azure Machine Learning workflow for training step

After structuring our training data, we will show how to perform the following steps:

1. Build a custom environment in a containerized way and register the environment in AzureML workspace

2. Create and register a data asset in AzureML workspace based on data stored in Azure Data Lake Storage Generation 2 (ADLS gen2)

3. Submit a training job via either the Python SDK or the Azure CLI ml extension v2 (an extension for the Azure Command-Line Interface that provides a set of commands for working with Azure Machine Learning resources).

Step 1: Build a custom environment in AzureML

There are many ways (e.g., using pip, conda, or the Python SDK) to create a custom environment in AzureML. In our case, we built a custom Docker image, which gives us more control over customization.

To install all the dependencies required to train the YOLOX model in a container, you will need to create an environment build script and a custom environment file.

Environment YAML file
Environment build script
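For reference, here is a minimal sketch of such a build script, assuming the AzureML Python SDK v2 (azure.ai.ml) and a local docker_context/ folder containing the Dockerfile and dependency files; the workspace details and the environment name are placeholders.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment, BuildContext
from azure.identity import DefaultAzureCredential

# Connect to the AzureML workspace (replace the placeholders with your own values)
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Build the Docker image from a local build context and register it as an environment
yolox_env = Environment(
    name="yolox-training-env",
    description="Custom Docker environment with YOLOX training dependencies",
    build=BuildContext(path="docker_context"),  # folder containing the Dockerfile
)
ml_client.environments.create_or_update(yolox_env)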

The 'environment build' sample code is run in an AzureML notebook to create and register a custom environment. Once the script finishes running successfully, a Docker image that encapsulates all the custom Python libraries is pushed to the workspace Azure Container Registry (ACR), so it can be pulled down anytime to quickly reproduce the training environment.

The registered environment is easily maintained through the AzureML workspace. You can create a new instance of an environment by editing an existing one or view changes to your environments over time as new versions are created.

Step 2: Create a data asset in AzureML

In this step, the AzureML datastore service is used to link an existing storage account (ADLS Gen2 in our case) for use in AzureML. To access input data in a training job, you create a data asset that points to the data source location without incurring extra storage cost, and specify that the data be mounted read-only on the remote compute target/cluster.

The following script is used to register the previously uploaded images and annotation files as a data asset for the training job.
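A minimal sketch of such a registration script, assuming the AzureML Python SDK v2; the workspace details, datastore name, and data asset name are placeholders.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Register the folder holding the images and COCO annotation files as a data asset.
# The path references the ADLS Gen2 location through the linked workspace datastore.
demo_data = Data(
    name="yolox-demo-data",
    description="Training/validation images and COCO annotations for YOLOX",
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/<adls_datastore_name>/paths/demo_data/",
)
ml_client.data.create_or_update(demo_data)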

Step 3: Train YOLOX in AzureML

Below is the structure of the training code.

project_demo/
|
|-----pretrained_weights/
| |
| |---yolox_s.pth
|
|-----exp_configs/
| |
| |---yolox_s_custom.py
|
|-----notebooks/
| |
| |---command_job.ipynb
|
|-----pipelines/ # Pipeline YAML files
| |
| |---pipeline_job.yml
|
|
|-----YOLOX/ # YOLOX code

With the code organized in this way, you can submit a command job from an AzureML notebook to kick off the training process.

Notebook to submit a command job to train YOLOX model in AzureML
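As an illustration, here is a minimal sketch of such a notebook cell, assuming the AzureML Python SDK v2. The compute cluster name, the environment and data asset names, and the exact train.py arguments (which depend on your YOLOX version and custom exp file) are placeholders.

from azure.ai.ml import MLClient, Input, command
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

job = command(
    code="..",  # upload the project_demo folder (YOLOX/, exp_configs/, pretrained_weights/)
    command=(
        "python YOLOX/tools/train.py "
        "-f exp_configs/yolox_s_custom.py "
        "-c pretrained_weights/yolox_s.pth "
        "-d 1 -b 8 --fp16 "
        "data_dir ${{inputs.training_data}}"  # assumes the custom exp reads data_dir from opts
    ),
    inputs={
        "training_data": Input(
            type=AssetTypes.URI_FOLDER,
            path="azureml:yolox-demo-data@latest",
            mode=InputOutputModes.RO_MOUNT,  # read-only mount on the compute cluster
        )
    },
    environment="yolox-training-env@latest",
    compute="gpu-cluster",
    experiment_name="yolox-custom-training",
    display_name="train-yolox-s-custom",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)  # open this link to monitor the run in AzureML Studio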

You can also use the Azure CLI ml extension v2 to submit a pipeline job from your local machine to run on AzureML.

The pipeline job is submitted by running the following command locally (tip: the Azure CLI and the ml extension must be installed in your local environment):

az ml job create -f pipelines/pipeline_job.yml

Once the job is compiled and submitted successfully, a JSON dictionary with all the information about the pipeline job is returned. You can open the URL under services.Studio.endpoint to see a graph visualization of your pipeline (example below).

You can click on the training block to monitor the job's running status and check its outputs and logs.

Conclusion

In this blog post, we outlined the steps needed to take a YOLOX model from custom data annotation to model training with AzureML so it can be used in a production environment. There are many topics we did not cover here. For example, depending on your use case, you may want to experiment with different versions of YOLOX, either a lightweight or a larger model. You will likely need to fine-tune the model on your custom dataset. Once the model is ready, you might also need to deploy it to your edge device for scalable inference. We will cover these topics and more in this blog series.

Slalom is a global consulting firm that helps people and organizations dream bigger, move faster, and build better tomorrows for all. Learn more and reach out today.

Thanks to Lam Truong, Shiva Lyer and Ayan Guha.
