Azure AutoML for Images: Automode

Riti Sharma
Microsoft Azure
Published Feb 6, 2023

Train high-quality models with ease using automatic hyperparameter tuning

Introduction

When training complex deep learning models, there is an extensive list of hyperparameters that must be set and tuned to achieve optimal performance. These hyperparameters include learning rate, image size, batch size, number of epochs, etc. The sheer number of hyperparameters makes it difficult for users to decide which ones to focus on. In addition, some of these parameters are model-specific and do not contribute equally to the model’s performance, making the choice even harder. Consequently, tuning these hyperparameters to obtain an optimal model requires a lot of time, effort, and resources.
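To get a feel for why exhaustive tuning is impractical, consider how quickly even a modest grid of choices multiplies. The values below are purely illustrative, not AutoML defaults:

```python
from math import prod

# Hypothetical grid of hyperparameter choices (illustrative values only)
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [8, 16, 32],
    "image_size": [480, 640, 800],
    "epochs": [15, 30],
    "optimizer": ["sgd", "adam", "adamw"],
}

# Every combination is one full training run
n_configs = prod(len(values) for values in grid.values())
print(n_configs)  # 3 * 3 * 3 * 2 * 3 = 162 training runs
```

For deep object detection models, each of those 162 runs can take hours on a GPU cluster, which is why a sampled sweep under a trial budget is the practical alternative.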

In this blog post, we will cover how AutoML for Images can automatically tune hyperparameters and produce an optimal model given a budget, specified as a maximum number of trials. For more information about AutoML for Images, check out the General Availability announcement, the official documentation, and the GitHub notebooks.

Hyperparameter Tuning with AutoML for Images

Hyperparameter tuning consists of finding an optimal set of values from the space of possible hyperparameters. On AzureML, it works by running multiple trials as part of a training job, one for each sampled hyperparameter configuration. Here is an example of how to manually sweep over some of the hyperparameters using AutoML for Images for two different models.

# Imports from the AzureML Python SDK v2 (azure-ai-ml); `client` (an MLClient),
# `training_data`, and `validation_data` are assumed to be defined earlier.
from azure.ai.ml import automl
from azure.ai.ml.automl import SearchSpace
from azure.ai.ml.sweep import Choice, Uniform

image_object_detection_job = automl.image_object_detection(
    compute="gpu-cluster",
    experiment_name="image-object-detection-sweep",
    training_data=training_data,
    validation_data=validation_data,
)

# Configure search space for hyperparameter tuning
image_object_detection_job.extend_search_space(
    [
        SearchSpace(
            model_name=Choice(["yolov5"]),
            learning_rate=Uniform(0.0001, 0.01),
            model_size=Choice(["small", "medium"]),  # model-specific
        ),
        SearchSpace(
            model_name=Choice(["fasterrcnn_resnet50_fpn"]),
            learning_rate=Uniform(0.0001, 0.001),
            optimizer=Choice(["sgd", "adam", "adamw"]),
            min_size=Choice([600, 800]),  # model-specific
        ),
    ]
)

# Configure number of trials
image_object_detection_job.set_limits(
    max_trials=10,
    max_concurrent_trials=5,
)

# Submit job to AzureML Workspace
client.jobs.create_or_update(image_object_detection_job)

Users need to provide a search space, a sampling algorithm, and the limits of the job: the number of trials (sampled configurations) to run.
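For instance, the sampling algorithm and an early-termination policy can be set on the job before submission. This is a sketch based on the SDK v2 sweep API (`set_sweep` and `BanditPolicy`); the specific policy values are illustrative:

```python
from azure.ai.ml.sweep import BanditPolicy

# Randomly sample configurations from the search space, and stop clearly
# underperforming trials early to save budget (values are illustrative)
image_object_detection_job.set_sweep(
    sampling_algorithm="Random",
    early_termination=BanditPolicy(
        evaluation_interval=2,  # check trial metrics every 2 evaluations
        slack_factor=0.2,       # allowed slack relative to the best trial
        delay_evaluation=6,     # let trials warm up before judging them
    ),
)
```

Early termination is worth configuring in a manual sweep: without it, every sampled configuration runs to completion even when it is clearly worse than the current best.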

In this approach, the user has more control and flexibility, but it is also the user’s responsibility to define the search space and choose which hyperparameters to tune. This requires a deep understanding of the hyperparameters, expertise in the area, and insight into which parameters matter most. Often there is no strong baseline to compare against, and searching for an optimal set of hyperparameters ends up being the most time-consuming and costly part of training an optimal model.

Introducing Automode

Automode automatically selects model hyperparameters (batch size, learning rate, number of epochs, etc.) given a budget in the form of a maximum number of trials. This feature allows less experienced users to produce good-quality models without any manual hyperparameter tuning, and more experienced data scientists to save time and effort on building strong baselines for experimentation.

Here is an example of how you can run Automode using AutoML for Images:

image_object_detection_job = automl.image_object_detection(
    compute="gpu-cluster",
    experiment_name="image-object-detection-automode",
    training_data=training_data,
    validation_data=validation_data,
)

# Configure number of trials
image_object_detection_job.set_limits(
    max_trials=10,
    max_concurrent_trials=5,
)

# Submit job to AzureML workspace
client.jobs.create_or_update(image_object_detection_job)
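After submission, the handle returned by `create_or_update` can be used to follow the run. This is a sketch using the SDK v2 `MLClient` jobs API; the `returned_job` name is ours:

```python
# create_or_update returns a handle to the submitted job
returned_job = client.jobs.create_or_update(image_object_detection_job)

# Link to the run in AzureML Studio, where trials and metrics are visible
print(returned_job.studio_url)

# Block and stream logs until the Automode sweep completes
client.jobs.stream(returned_job.name)
```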

Advantages of Automode

  1. Users can build strong baselines with little or no effort.
  2. Users do not need to provide any details about the space to search over. The underlying algorithm automatically determines the region of the hyperparameter space to sweep over.
  3. Automode adapts to the hardware (AzureML compute instance or cluster) chosen by the user. It determines batch size, resolution, etc. based on the memory available on the compute, so that the training job doesn’t run out of memory.
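The idea behind point 3 can be illustrated with a toy heuristic. This is entirely illustrative and not Automode’s actual algorithm: cap the batch size by the memory a single sample needs at the chosen resolution, leaving some headroom.

```python
def pick_batch_size(gpu_memory_gb: float,
                    per_sample_gb: float,
                    max_batch: int = 64) -> int:
    """Toy heuristic, not Automode's real logic: pick the largest
    power-of-two batch that fits in ~80% of GPU memory."""
    usable = gpu_memory_gb * 0.8  # keep ~20% headroom for activations etc.
    batch = max_batch
    while batch > 1 and batch * per_sample_gb > usable:
        batch //= 2  # halve until the batch fits
    return batch

# e.g. a 16 GB GPU where one sample needs ~0.9 GB at high resolution
print(pick_batch_size(16.0, 0.9))  # -> 8
```

A memory-aware choice like this is what lets the same training job run unmodified on GPUs of very different sizes.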

Experiments

We performed extensive experiments; for brevity, we present only the datasets below for classification and object detection. Automode was also compared with two competitor platforms. We chose datasets with varying characteristics in terms of number of classes, object size, image resolution, etc. All reported results are on the test dataset.

Object Detection
For this task we selected the following datasets:

  1. VisDrone 2019: Dataset containing images/videos captured by drones.
  2. KITTI: Street-view image dataset used in autonomous driving.
  3. VOC 2012: Dataset containing a variety of object types in realistic scenes.
  4. Safety Helmet: Consists of images of personal safety equipment.

Figure 1: Mean Average Precision (%) vs Budget ($) for the VisDrone 2019 dataset using Automode.
Figure 2: Mean Average Precision (%) vs Budget ($) for the KITTI dataset using Automode.
Figure 3: Mean Average Precision (%) vs Budget ($) for the VOC 2012 dataset using Automode.
Figure 4: Mean Average Precision (%) vs Budget ($) for the Safety Helmet dataset using Automode.

From Figures 1–4 above, we can see that increasing the budget improves the model’s mAP, with the gains plateauing at higher budgets. Increasing the budget from $7 to $141 improved mAP by 13% on VisDrone 2019, and increasing it from $5 to $52 improved mAP by 4% on VOC 2012. These gains come without any tuning time or effort from data scientists or engineers.

Table 1: Comparing Automode with Competitor 1 given the same budget for Object Detection.
Table 2: Comparing Automode with Competitor 2 given the same budget for Object Detection.

For the comparisons, the budget for each dataset was set to the amount at which the competitor achieved its best metric.

Given the same budget, Automode significantly outperforms Competitor 1. Against Competitor 2, Automode trails by about 1% on Safety Helmet and 0.1% on VOC, but outperforms it on the VisDrone and KITTI datasets.

Image Classification
For this task we selected the following datasets:

  1. MIT Indoors: Dataset for indoor scene recognition.
  2. Deep Fashion: Large-scale clothing dataset for e-commerce.
  3. Deep Weeds: Dataset containing images of weed species.

Figure 5: Accuracy (%) vs Budget ($) for the MIT Indoors dataset using Automode.
Figure 6: Accuracy (%) vs Budget ($) for the Deep Fashion dataset using Automode.
Figure 7: Accuracy (%) vs Budget ($) for the Deep Weeds dataset using Automode.

As with object detection, accuracy improves with budget: it increased by ~2% when the budget grew from $3 to $62 for Deep Fashion and from $1 to $36 for MIT Indoors.

Table 3: Comparing Automode with Competitor 1 given the same budget for Image Classification.
Table 4: Comparing Automode with Competitor 2 given the same budget for Image Classification.

When compared with the competitors, Automode significantly outperforms Competitor 1 given the same budget; against Competitor 2, it trails by 0.2% on Deep Weeds but outperforms it on the MIT Indoors and Deep Fashion datasets.

Conclusion

We found that Automode in AutoML for Images greatly simplifies the lives of data scientists and ML engineers by delivering high-quality models while eliminating the need for manual hyperparameter tuning. Please give it a try and let us know what you think!

Sample Notebooks

Image Object Detection
Image Multiclass Classification
Image Multilabel Classification
Image Instance Segmentation

CLI Examples

Image Object Detection
Image Multiclass Classification
Image Multilabel Classification
Image Instance Segmentation
