Azure Machine Learning

Azure AutoML for Images: Generating Explanations

Model Explainability for multi-class and multi-label image classification.

Ram v
Microsoft Azure


Generate explanations for the predictions of vision models

Co-authored by Mercy Ranjit

Introduction

Explaining a model's predictions is desirable because it brings an element of trustworthiness to the predictive system and helps align human reasoning with the model's. It also helps uncover potential flaws in the predictive system, a classic example being a model that classifies a husky as a wolf based on the snow in the background.

Reference: "Why Should I Trust You?": Explaining the Predictions of Any Classifier (arXiv:1602.04938)

In this blog post, we will uncover the explainability feature of the Azure AutoML for Images capability in Azure Machine Learning. AutoML for Images supports various state-of-the-art algorithms across task types, and model explainability is currently available for the multi-class classification and multi-label classification task types. Users can generate explanations on training and test datasets to debug models, and can also deploy models with explainability enabled to improve transparency at inference time.

Feature attribution, also known as pixel attribution or saliency maps, is a very popular explainability method for images because it intuitively points out which pixels of the image contributed the most to a prediction. How the saliency maps are derived varies with the algorithm. AutoML for Images supports several explainability methods, allowing users to compare the explanations they produce.

A brief summary of saliency approaches:

1. Gradient-based backpropagation methods:

These methods compute the gradient of the prediction with respect to the input features: the output's sensitivity is backpropagated layer by layer until the input is reached, yielding an importance score for every input pixel. The larger the absolute value of a pixel's gradient, the stronger that pixel's influence on the prediction (see the sketch after this list).

2. Occlusion-based methods:

These methods perturb or mask parts of an image and measure how the prediction changes in order to compute attributions. They are model agnostic: they only need the model's predictions, not access to its internals such as gradients or architecture.

3. Path attribution methods:

Path-attribution methods compare the current image to a reference image, called the baseline (for example, a black image). The difference between the classification scores of the actual image and the baseline is attributed to the pixels along a path between the two. The choice of baseline strongly affects the explanations.
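AutoML for Images builds on methods from these families (for example, guided backpropagation is gradient based, while Integrated Gradients and XRAI are path-attribution methods). As a rough, standalone illustration of the first and third families outside of Azure, here is a minimal sketch using the open-source Captum library with a pretrained torchvision model; the model choice, image path, and target class are placeholders, not the AutoML for Images implementation.

```python
# Minimal sketch (not the AutoML for Images implementation): gradient-based and
# path-attribution saliency with Captum on a pretrained torchvision model.
import torch
from torchvision import models, transforms
from PIL import Image
from captum.attr import GuidedBackprop, IntegratedGradients

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
image = preprocess(Image.open("sample.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
target_class = 0  # placeholder label index

# Gradient-based backpropagation: importance from gradients of the output w.r.t. the input.
gbp_attributions = GuidedBackprop(model).attribute(image, target=target_class)

# Path attribution: accumulate gradients along a path from a black baseline to the input.
ig = IntegratedGradients(model)
ig_attributions = ig.attribute(image, baselines=torch.zeros_like(image),
                               target=target_class, n_steps=50)

print(gbp_attributions.shape, ig_attributions.shape)  # both (1, 3, 224, 224)
```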

Let's also review what global vs. local explanations mean in the context of gradient-based feature attribution methods:

Local and Global Explanations

Global feature attribution methods directly provide the change in the function (model) value given changes in the features, whereas local feature attribution methods capture the sensitivity of the function to changes in the features; local attributions therefore need to be multiplied by the input to estimate the change in the function value. Thus, for gradient-based explanations, the raw gradient itself is a local explanation, while the gradient multiplied element-wise by the raw input is a global explanation.
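As a small, self-contained sketch of this distinction (again not tied to AutoML for Images), the raw gradient serves as the local explanation and gradient × input as the global one; the tiny model below is only a placeholder for any differentiable image classifier.

```python
import torch

# Placeholder model: any differentiable image classifier would do.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 5))

x = torch.rand(1, 3, 8, 8, requires_grad=True)  # toy "image"
target_class = 2

score = model(x)[0, target_class]
score.backward()

local_attribution = x.grad                  # sensitivity of the score to each pixel
global_attribution = x.grad * x.detach()    # gradient x input: estimated contribution to the score

print(local_attribution.shape, global_attribution.shape)
```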

The following table summarizes the explainability methods supported by Azure AutoML for Images, along with their pros and cons and whether they are model agnostic, architecture agnostic, and local or global interpretation techniques.

Overview of available XAI methods

Let’s now see the different ways to use the above methods in Azure AutoML for Images to generate explanations.

Generating Explanations using Azure AutoML for Images

There are three ways to generate explanations using AutoML for Images.

  1. Generating explanations using the online endpoint
  2. Generating explanations using the batch endpoint
  3. Generating explanations using the Responsible AI dashboard

We will walk through each of these approaches. Users don't need a separate deployment to generate explanations: existing models trained with AutoML for Images can be deployed to endpoints and used to generate explanations out of the box.

The following methods are supported for generating explanations: guided backpropagation, guided GradCAM, Integrated Gradients, and XRAI.

In this blog, we will demonstrate these methods with models trained on a flowers dataset and a brain tumor dataset and compare their performance.

This blog focuses on the steps that we need to follow to generate explanations with AutoML for Images. Please go through Set up AutoML for computer vision — Azure Machine Learning | Microsoft Learn for all the steps from setting up the environment to generating explanations.

Prerequisites

Set up the Azure Machine Learning workspace, and install and set up the CLI and/or Python SDK v2 (model training can be done through either option, but inferencing is supported only through the SDK v2 APIs). Download the sample notebooks from the azureml-examples repository.

For multi-class classification and multi-label classification, use the linked notebooks. With them, you can connect to the Azure Machine Learning workspace, prepare the data in MLTable format, configure and run the AutoML for Images classification training job (through AutoMode or manual hyperparameter sweeping), retrieve the best model, and deploy it to a managed online endpoint.
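For orientation, here is a hedged sketch of what configuring and submitting such a training job looks like with the Python SDK v2; the workspace details, compute name, and data paths are placeholders, and the linked notebooks remain the authoritative reference.

```python
from azure.ai.ml import MLClient, Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# Placeholders: fill in your own subscription, resource group, and workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Configure a multi-class image classification AutoML job on MLTable data.
image_job = automl.image_classification(
    compute="<GPU_COMPUTE_CLUSTER>",  # placeholder compute target
    experiment_name="automl-image-classification-xai",
    training_data=Input(type=AssetTypes.MLTABLE, path="<PATH_TO_TRAINING_MLTABLE>"),
    validation_data=Input(type=AssetTypes.MLTABLE, path="<PATH_TO_VALIDATION_MLTABLE>"),
    target_column_name="label",
    primary_metric="accuracy",
)
image_job.set_limits(timeout_minutes=60)  # AutoMode: let AutoML sweep models and hyperparameters

# Submit the job; the best model can later be registered and deployed to an online endpoint.
returned_job = ml_client.jobs.create_or_update(image_job)
print(returned_job.name)
```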

Generating explanations using the Online Endpoint

Once the model endpoint is available, we can get the endpoint details and perform online inference by passing an image in base64 format along with explainability-related parameters.

  1. Get the endpoint details
Get endpoint details
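As a sketch, retrieving the endpoint with the SDK v2 might look like the following, reusing the ml_client constructed earlier; the endpoint name is a placeholder.

```python
# Retrieve the deployed managed online endpoint (name is a placeholder).
endpoint = ml_client.online_endpoints.get(name="<ONLINE_ENDPOINT_NAME>")
print(endpoint.scoring_uri)
```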

2. Prepare the input data for inferencing. Create a JSON file in the following format:
- the input image in base64 format, under the key image_base64.
- model_explainability set to True (this enables explainability; when set to False, all explainability parameters are ignored and only the predictions are returned, without explanations).
- all explainability-related parameters under the xai_parameters key, including the explainability algorithm name, any other hyperparameters for that XAI algorithm, and a flag indicating whether the user wants visualizations, attribution scores, or both.

For more details on the parameters for the different methods, refer to the generating explanations and input schema sections in the docs.

Define input schema for explainability
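A hedged sketch of such a request payload is shown below; the exact key names (for example xai_algorithm, visualizations, attributions) should be verified against the input schema section of the docs linked above, and the algorithm name shown is just one of the supported options.

```python
import base64
import json

# Encode the input image (path is a placeholder).
with open("sample_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Request payload sketch; verify key names against the documented input schema.
request_payload = {
    "input_data": {
        "columns": ["image"],
        "data": [
            json.dumps({
                "image_base64": image_b64,
                "model_explainability": True,
                "xai_parameters": {
                    "xai_algorithm": "xrai",  # e.g. guided_backprop, guided_gradcam, integrated_gradients, xrai
                    "visualizations": True,   # return base64-encoded visualization images
                    "attributions": False,    # also return raw attribution scores
                },
            })
        ],
    }
}

with open("request.json", "w") as f:
    json.dump(request_payload, f)
```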

3. We can request explanations from the endpoint using the input JSON created in the above step.

Invoke endpoint for inferencing explanations
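Invoking the endpoint with the SDK v2 might look like the following; the endpoint and deployment names are placeholders, and ml_client and request.json come from the earlier sketches.

```python
# Invoke the online endpoint with the request file prepared above.
response = ml_client.online_endpoints.invoke(
    endpoint_name="<ONLINE_ENDPOINT_NAME>",
    deployment_name="<DEPLOYMENT_NAME>",
    request_file="request.json",
)
```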

4. The output will have probs, labels, visualizations, and attributions keys. We can use a snippet like the following to visualize the explanations.
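A hedged visualization sketch, assuming the response is a JSON string containing one record per input image with a base64-encoded visualization (check the exact output format against the docs):

```python
import base64
import io
import json

import matplotlib.pyplot as plt
from PIL import Image

predictions = json.loads(response)  # response returned by the invoke call above

# Assumes one record per input image with probs, labels, visualizations, attributions keys.
first = predictions[0]
print("labels:", first["labels"])
print("probs:", first["probs"])

# The visualization is assumed to be a base64-encoded image of the saliency overlay.
viz_bytes = base64.b64decode(first["visualizations"])
plt.imshow(Image.open(io.BytesIO(viz_bytes)))
plt.axis("off")
plt.show()
```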

For more details on interpreting the explanations refer to this section of the article.

5. For visualizing attribution scores, refer to the interpreting-the-attributions section of the article. Attribution scores are generated only if both the model_explainability and attributions keys are set to True.

Comparison of results across datasets and Explainability algorithms

As described previously, we used two datasets (flowers and brain tumor) to train Seresnext and ViT models using AutoML for Images, deployed the best-performing model for each dataset, and generated explanations with each explainability algorithm.

Below are the explainability results for ViT and Seresnext on the brain tumor and flowers datasets, respectively. Note that the guided GradCAM method works only for CNN-based architectures, so results for this method are available only for Seresnext.

Sample Explanations of Seresnext predictions with Flowers Dataset

Explanations of Guided Backprop, Guided Gradcam
Explanations of Integrated Gradients, XRAI

Sample Explanations of ViT with Brain Tumor Dataset

Explanations of Guided Backprop, Integrated Gradients, XRAI

Inference time comparison for ViT and Seresnext model architectures on Flowers Dataset

The following average inference times were recorded for predictions from a managed online endpoint deployed on the Standard_NC6s_v3 instance type (instance count 1) with 25 input images. These numbers may vary depending on network bandwidth and system configuration.

Inference time in seconds

*Guided Gradcam is not supported for ViT.

Based on the above visualizations, we can conclude that XRAI returns more interpretable explanations but takes more time, whereas guided backpropagation does reasonably well on both explanation quality and inference speed.

Generating explanations using the batch endpoint

If you want to generate explanations for all input samples, you can use the batch scoring notebook. Update the explainability-related parameters in section 6.4.3 of that notebook.

Generating explanations using the Responsible AI dashboard

To generate explanations through the RAI dashboard, submit the RAI vision insights pipeline through the SDK, CLI, or UI (via Azure ML studio). For more details on pipeline submission and its parameters, refer to this article. After the pipeline completes, click the RAI dashboard link available in the RAI vision insights component, then connect to a compute instance for generating the explanations. Once the compute instance is connected, you can click on an input image, and the explanations for the selected algorithm appear in the panel on the right.

RAI Dashboard for vision models
