Azure AutoML: Quickly build high quality ML models

David Kooistra
Data Science at Microsoft
8 min read · Aug 1, 2023

What is Azure AutoML?

Azure AutoML allows data scientists to execute remote experiments that automatically evaluate many different combinations of Machine Learning algorithms, feature preprocessing techniques, and hyperparameters for a given dataset. After automatically training many different models, you can select the best performing one. AutoML also provides helpful views for exploring the results, such as detailed metrics and model explanations.

Time savings and other advantages of Azure AutoML

Like any new tool, AutoML can take some time to learn when getting started, so hopefully this article will help reduce that time. After successfully running a first AutoML experiment, it is quick to create more in the future. With the time saved by not manually testing different Machine Learning algorithms, you can spend more time crafting the best features and understanding the model predictions.

AutoML also saves time on tracking experimental results, making it straightforward to revisit past experiments to review the data, models, and performance. When iterating in notebooks it can be easy to overwrite or lose track of experimental results. The experiments feature in Azure Machine Learning Studio is generally a good way to track results regardless of whether an automated or manual Machine Learning approach is being used.

Another example of where AutoML can save time is when testing how much new features can improve model performance. Suppose you’d like to see how much a new feature or set of features improves the model. You can run an AutoML experiment with a base set of features and then easily rerun the experiment with the new features added. Instead of training directly in notebooks, you can start remote training jobs, which means that you don’t have to wait a long time for one experiment to complete before starting the next. Once the AutoML experiments have finished, it’s easy to compare the evaluation metrics for the best performing models and determine the effectiveness of the newly added features.

Prerequisites

The official Azure documentation for setting up AutoML training provides details on how to get started and lists the prerequisites. These include an Azure Machine Learning workspace and a compute instance with the Azure Machine Learning Python SDK installed. Documentation on how to create a workspace and compute instance can be found here if you don’t already have one. Most compute instances come with the Azure AutoML SDK preinstalled, but where it isn’t, it can be installed using the following pip command:

pip install azureml-train-automl

After satisfying the prerequisites, you can import the AutoMLConfig class within an AML notebook attached to a compute instance. Import AutoMLConfig using the following command:

from azureml.train.automl import AutoMLConfig

Getting started with an example

The Microsoft Azure team has supplied a helpful repo of AutoML examples. Let’s run through the first example notebook to configure and run our AutoML job. This example uses AutoML for classification to predict whether a credit card transaction should be considered fraudulent. The notebook in AML appears as follows:

Scroll through and run the example notebook to observe the purpose of each section. Most AutoML experiments follow a similar set of steps as seen in the example. The steps typically involve importing the necessary packages, getting a reference to the workspace, selecting a compute resource on which to run the experiment, loading the data, starting the experiment, and then analyzing the results.
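As a rough sketch, the setup portion of such a notebook might look like the following (this assumes the v1 azureml-core SDK; the compute name, storage URL, and experiment name are placeholders to replace with your own):

from azureml.core import Workspace, Dataset, Experiment
from azureml.core.compute import ComputeTarget

# Connect to the workspace described by the local config.json
ws = Workspace.from_config()

# Reference an existing compute cluster to run the remote experiment on
compute_target = ComputeTarget(workspace=ws, name="cpu-cluster")

# Load the training data as a TabularDataset (URL is a placeholder)
training_data = Dataset.Tabular.from_delimited_files(
    path="https://<your-storage-account>/creditcard.csv"
)

# Create (or reuse) the experiment that tracks the AutoML runs
experiment = Experiment(ws, "automl-classification-ccard")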

The main step when setting up an AutoML experiment is defining the AutoMLConfig object. Here we set various parameters to configure different aspects of the experiment, including the problem type (“classification”, “regression”, or “forecasting” for time series), the metric used to evaluate models, and when to stop the experiment. The official documentation on AutoMLConfig lists all the available parameters and descriptions. This sample uses the following configuration:
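A sketch of roughly what that configuration looks like (the parameter values mirror the sample but are illustrative; training_data and compute_target come from the setup step above, and the label column name is an assumption):

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="average_precision_score_weighted",
    training_data=training_data,
    label_column_name="Class",        # assumed label column for the fraud dataset
    n_cross_validations=3,            # three-fold cross validation on the training data
    enable_early_stopping=True,       # stop early if the metric stops improving
    max_concurrent_iterations=2,      # illustrative; raise for more parallel child runs
    experiment_timeout_hours=0.25,    # 15-minute cap, as in the sample
    compute_target=compute_target,
)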

With this configuration, we select the best model based on the “average_precision_score_weighted” metric from cross validation (three splits) on the training data. Optionally, you can pass in a separate validation dataset instead of using cross validation on the training data. Setting the “enable_early_stopping” parameter to True ends the experiment early if the metric score is not showing improvement. To get better performing models on real projects outside of this sample, we could increase the values for “max_concurrent_iterations” and “experiment_timeout_hours” to allow for longer training time on more compute.

Once we have started the experiment, we are able to access the results using the link to the Details Page.
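Submitting the configuration and grabbing that link might look like this (a sketch; remote_run is the handle used through the rest of this article):

# Submit the AutoML experiment as a remote run
remote_run = experiment.submit(automl_config, show_output=False)

# Print the link to the run's details page in Azure Machine Learning Studio
print(remote_run.get_portal_url())

# Optionally block until the run completes
remote_run.wait_for_completion(show_output=False)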

The link takes us to the following view, where we can learn more about the experiment results.

According to the Duration field, we see that the experiment took a total of about 35 minutes, and from the Status field that it hit the 15-minute timeout (set in the AutoMLConfig using the “experiment_timeout_hours” parameter). From the Best Model Summary field we see that a VotingEnsemble model achieved the highest weighted average precision score, around 0.986. Next, we navigate to the Data Guardrails tab to see whether any checks were flagged for the dataset.

These checks let us know important details about our data. In this case the Class Balancing Detection guardrail alerts us that our inputs are imbalanced. After clicking the View Additional Details button, we are informed that there are a total of 227,861 samples in our dataset and that one class has only 396 samples. This is presumably the positive class (fraudulent transactions), which makes sense for this dataset. This is not necessarily an issue, but it is good to be aware of.
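If you want to double-check the imbalance yourself, a quick pandas count works (assuming the label column is named “Class”, as in the fraud dataset):

# Count samples per class in the training data
df = training_data.to_pandas_dataframe()
print(df["Class"].value_counts())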

Next, we navigate to the Models tab to get a list of the different models tested ranked by performance.

At the top of the list, we see VotingEnsemble listed as the winner. The ensemble algorithms typically rank at the top of the list because they are a combination of multiple algorithms. This also means that they are more complex and can take longer to run. As an FYI, there are options to enable or disable these ensemble algorithms using “enable_voting_ensemble” and “enable_stack_ensemble” when creating the AutoMLConfig object.
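For example, a configuration that skips the ensemble iterations might look like this (a sketch; the other parameters are kept the same as in the earlier configuration):

from azureml.train.automl import AutoMLConfig

automl_config_no_ensembles = AutoMLConfig(
    task="classification",
    primary_metric="average_precision_score_weighted",
    training_data=training_data,
    label_column_name="Class",
    n_cross_validations=3,
    compute_target=compute_target,
    enable_voting_ensemble=False,   # skip the VotingEnsemble iteration
    enable_stack_ensemble=False,    # skip the StackEnsemble iteration
)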

To get more information about the best performing model, click on VotingEnsemble from the Algorithm column. That takes us to the following page:

Here we see a model summary with performance on the primary metric (average precision score weighted). Clicking on View Ensemble Details gives us information on the different algorithms used in the ensemble, ensemble weights, and hyperparameters. The View All Other Metrics option shows a long list of different metrics calculated for this model. These metrics are also accessible from the Metrics tab, which we go to next.

Here we have many different metrics automatically calculated for the model. You may notice macro, micro, and weighted metrics, which are typically used for multiclass classification, even though this is a binary classification problem. This is because AutoML calculates the multiclass metrics by default. To get binary metrics such as AUC_binary, we would need to specify a value for the “positive_label” parameter when creating the AutoMLConfig object. In addition to the numerical metrics displayed, we also get access to various plots such as the precision-recall curve and ROC curve. We can click on the different items in the legend of each plot to hide or show different lines in the plots. Additional plots calculated that are not shown in the screenshot include a calibration curve, lift curve, cumulative gains curve, and confusion matrix.
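The same metrics can also be pulled programmatically from the best child run, for example (a sketch; passing positive_label=1 assumes the fraudulent class is labeled 1):

# Retrieve the best child run from the completed AutoML run and list its metrics
best_run, fitted_model = remote_run.get_output()
metrics = best_run.get_metrics()
print(metrics["average_precision_score_weighted"])

# To also get binary metrics such as AUC_binary, set positive_label when
# building the configuration, e.g. AutoMLConfig(..., positive_label=1)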

Although we can see a lot of information about the model from these different views, sometimes it can be helpful to look directly at the code. To get to the code we navigate to the “Outputs + logs” view.

The main file containing details for training and evaluation can be found at the path outputs > generated_code > script.py. In the same generated_code folder is a script_run_notebook.ipynb file that will run the script.py file from a notebook, and the conda_environment.yaml file listing the package dependencies of script.py.
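A sketch of downloading those files for local inspection (this assumes the generated code is attached to the best child run’s outputs):

# Get the best child run and download the auto-generated training artifacts
best_run, fitted_model = remote_run.get_output()
best_run.download_file(
    "outputs/generated_code/script.py", output_file_path="script.py"
)
best_run.download_file(
    "outputs/generated_code/conda_environment.yaml",
    output_file_path="conda_environment.yaml",
)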

Those are just some of the views within the AutoML experiment. If you are interested in learning more about what’s available in the AutoML experiment views, check out the official Azure documentation. A good place to start is the page on evaluating automated Machine Learning experiment results.

Viewing the results from the experiment page in Azure ML can be helpful, although sometimes we may want to view additional metrics and plots that are not available in that interface. The best way to calculate these custom metrics and plots is to query the model from the experiment within a notebook. At the end of the credit card fraud example notebook, we can see an example of this.

Here the “fitted_model” variable is the best performing model from the experiment. We can call the “predict” and “predict_proba” functions on this “fitted_model” variable to generate predictions. We can then write our own code for any custom evaluation on those predictions.
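A minimal sketch of that kind of custom evaluation (X_test and y_test are assumed to be a held-out feature dataframe and label series you have prepared):

from sklearn.metrics import average_precision_score, roc_auc_score

# Retrieve the best child run and its fitted model from the AutoML run
best_run, fitted_model = remote_run.get_output()

# Generate class predictions and scores for the positive class
y_pred = fitted_model.predict(X_test)
y_scores = fitted_model.predict_proba(X_test)[:, 1]

# Any custom metric or plot can now be computed on these predictions
print("Average precision:", average_precision_score(y_test, y_scores))
print("ROC AUC:", roc_auc_score(y_test, y_scores))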

Sometimes the “remote_run” variable will no longer exist because the notebook has restarted. Instead of rerunning the AutoML experiment, we can get a reference to a previous experiment using the following code:
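A sketch of that lookup (the experiment name and run ID below are placeholders to replace with your own):

from azureml.core import Workspace, Experiment
from azureml.train.automl.run import AutoMLRun

ws = Workspace.from_config()

# Use the experiment name and run ID shown on the experiment page
experiment = Experiment(ws, "automl-classification-ccard")
remote_run = AutoMLRun(experiment=experiment, run_id="AutoML_<run-id>")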

To reference the correct experiment, we simply need to make sure that the experiment name and run ID match. The run ID can be found on the experiment page under the Overview tab as the Name field.

Conclusion

I hope this article is helpful for understanding some of the capabilities of Azure AutoML and inspires you to apply it to your own projects. Please feel free to leave a comment below if you have any questions or suggestions.

Dave Kooistra is a passionate programmer with experience in Machine Learning and software engineering. Find him on LinkedIn.
