Seeing is explaining. How Example-based Explanations help improve a car damage classifier

Ivan Nardini
Google Cloud - Community
10 min read · Aug 27, 2023
Fig 1 — Image from author

Some months ago, I was in an interesting conversation about Explainable AI.

People were discussing how feature attribution explanations such as Integrated Gradients or XRAI were helping them understand the behavior and improve the performance of an image classification model.

With feature attribution explanations, they were able to find the features (pixels) of an image that contributed most to the prediction of a particular class and explain why the model was misclassifying some images when those features were visible.

But how about approaching the challenge of explaining models from a different perspective?

Just as we look at precision and recall to understand model performance, wouldn’t it be useful to have a complementary approach that allows us to understand how the training data affects the model during learning?

This article introduces example-based explanations and provides some intuition about how they can help explain model results and improve models. It also shows you how to use Example-based Explanations on Vertex AI.

How Example-based Explanations can help

Imagine you have an image classification model which recognizes damaged car parts as described in this article. To understand why the model classifies an image as damaged bumper rather than a damaged lateral, you can use feature attribution-based explanations.

With feature attribution methods such as Integrated Gradients or XRAI, you can see which features (pixels) in the image contributed most to the model’s prediction. On Vertex AI, you can use Explainable AI to generate feature attributions for your model’s predictions, as you can see below.

Fig. 2— Image from author

Notice how the model is influenced not only by the presence of the damage but also by the presence of the damaged object in the image. Consequently, model performance might suffer if the damaged object is not completely captured in the image. This is a good insight for understanding model results.

But now, let’s say you collect new images of damaged bumpers taken from different perspectives. As you can see below, when you pass those images to your model, it fails to recognize the damaged car parts, even though each image includes both the damage and the damaged object.

Fig. 3— Image from author

In scenarios like the one described above, how would you explain the model’s predictions? A possible approach is to look at similar damaged-bumper images in the training data and check whether it contains images taken from that perspective.

Fig. 4— Image from author

If there is only one training image of a bumper photographed from a perspective similar to the misclassified one, then you can say that your model fails to recognize the car damage because it lacks training images of bumpers taken from that perspective. So there is room to improve model performance by gathering more images of car damage similar to the ones you are evaluating. In other words, you now know how to explain model results, improve the quality of your training data, and consequently build a better model, all in one fell swoop!

What is described above is the core idea behind Example-based Explanations. Example-based Explanations are a type of ML explanation that uses examples from the training data to explain the behavior of a model. Starting from user-provided examples, the method answers the following question: “Which data points in the training set are similar to this one, and how do they affect the model?” Answering this question makes example-based explanations a complementary, data-driven approach to improving the quality of your data and the performance of your models.

Now that you know how example-based explanations can help to explain model results and build better models, you may be wondering

How can I get example-based explanations?

To generate Example-based Explanations, you need a way to retrieve similar examples given your training data, your model, and the prediction you want to explain. In other words, you need a similarity matching system in which:

  1. you extract the encoder from the model you want to explain and use it to generate embeddings for your training dataset; these embeddings are indexed and stored.
  2. you apply the same encoder to get the embedding of the image associated with your prediction request and search the index for its nearest neighbors.
  3. you get back the most similar training images and use them to explain the model prediction, as discussed above and sketched right after this list.
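
To make this concrete, here is a minimal sketch of such a system, assuming you already have an encoder (the embedding model extracted from your classifier) and a train_images array, both illustrative names, and using an exact nearest-neighbor search with NumPy instead of the ANN index a production system would use:

import numpy as np

# Illustrative inputs: `encoder` is the embedding (Keras) model extracted from
# the classifier, `train_images` and `query_image` are float32 image arrays.

# 1. Generate and store the embeddings of the training dataset.
train_embeddings = encoder.predict(train_images)            # shape: (n_train, d)

# 2. Embed the image behind the prediction you want to explain.
query_embedding = encoder.predict(query_image[None, ...])   # shape: (1, d)

# 3. Retrieve the most similar training images (squared L2 distance, exact search).
distances = np.sum((train_embeddings - query_embedding) ** 2, axis=1)
neighbor_ids = np.argsort(distances)[:10]
similar_images = train_images[neighbor_ids]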

Although it may sound a bit complicated, you don’t have to build this entire system yourself. Vertex AI provides Example-based Explanations, a managed service that generates example-based explanations for your models through the Vertex AI Python SDK. Let’s see how you can use it.

Example-based Explanations on Vertex AI

To get example-based explanations on Vertex AI, you need to cover the steps shown in the following picture.

Fig. 5 — Image from author

Notice that step 1 is required because Vertex AI Example-based Explanations uses an approximate nearest neighbor (ANN) service (Vertex AI Matching Engine) to return examples similar to new prediction instances.

Step 1: Create the Index

To index the entire dataset, you first need to extract the encoder from your model. The encoder is used to compress input images into lower-dimensional embedding vectors.

Below you can see how to extract the encoder from a TensorFlow model.

import tensorflow as tf

# Extract the encoder layers from the trained classifier (loaded_model)
# to obtain the embedding model used for similarity matching.
encoder_model = tf.keras.models.Sequential()
for layer in loaded_model.layers:
    if "enc" in layer.name:
        encoder_model.add(layer)

After you get the encoder model, you need to define a serving function that converts image data to the format your model expects. This allows you to register the model in the Vertex AI Model Registry as a Model resource for creating the similarity index, and subsequently deploy it to a Vertex AI Endpoint for generating predictions and the associated explanations. Below you can see how the export_model function returns the serving function.

import tensorflow as tf

img_height, img_width, channels = (28, 28, 3)

def export_model(model: tf.keras.Model, signature="predict"):

    # concrete_input holds the name of the model's input tensor, and
    # preprocess_fn decodes and resizes the raw image bytes (see below).
    m_call = tf.function(model.call).get_concrete_function(
        tf.TensorSpec(
            shape=(None, img_height, img_width, channels),
            dtype=tf.float32,
            name=concrete_input,
        )
    )

    if signature == "predict":

        @tf.function(input_signature=[tf.TensorSpec([None], tf.string), tf.TensorSpec([None], tf.string)])
        def serving_fn(id, bytes_inputs):
            """Serve predictions.
            Args:
                id: The id of the input.
                bytes_inputs: The input images in bytes.
            Returns:
                The id and the model prediction.
            """
            images = preprocess_fn(bytes_inputs)
            prediction = m_call(**images)
            return {"id": id, "prediction": prediction}

        return serving_fn

    else:

        @tf.function(input_signature=[tf.TensorSpec([None], tf.string), tf.TensorSpec([None], tf.string)])
        def embedding_fn(id, bytes_inputs):
            """Serve embeddings.
            Args:
                id: The id of the input.
                bytes_inputs: The input images in bytes.
            Returns:
                The id and the embedding vector.
            """
            images = preprocess_fn(bytes_inputs)
            embedding = m_call(**images)
            return {"id": id, "embedding": embedding}

        return embedding_fn

preprocess_fn implements the preprocessing transformations the images may need, including decoding and resizing. Also notice that export_model defines the signature depending on whether you want the serving function to return the predicted class or the embedding with the associated id. The embedding signature is required for creating and deploying the similarity index on Matching Engine to get example-based explanations.
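
preprocess_fn itself is not shown in this article. Below is a minimal sketch of what it could look like, assuming the serving function receives raw encoded image bytes and that concrete_input holds the name of the model's input tensor (both assumptions that depend on how the classifier was trained).

import tensorflow as tf

img_height, img_width, channels = (28, 28, 3)
concrete_input = "input_1"  # assumed name of the model's input tensor

def preprocess_fn(bytes_inputs):
    """Decode, resize and scale a batch of encoded image bytes (sketch)."""

    def _decode(img_bytes):
        img = tf.io.decode_image(img_bytes, channels=channels, expand_animations=False)
        img = tf.image.resize(img, (img_height, img_width))
        return tf.cast(img, tf.float32) / 255.0

    images = tf.map_fn(_decode, bytes_inputs, fn_output_signature=tf.float32)
    # m_call(**images) expects a dict keyed by the input tensor name.
    return {concrete_input: images}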

After you define the serving functions, the following TensorFlow code saves the model, with both signatures, to the Cloud Storage bucket.

import tensorflow as tf

# Save the classifier with two signatures: the default one for predictions
# and one that returns embeddings from the encoder, storing the artifacts
# in the Cloud Storage location embeddings_uri.
tf.saved_model.save(
    model,
    embeddings_uri,
    signatures={
        "serving_default": export_model(model),
        "example_based_explanations": export_model(encoder_model, signature="explain"),
    },
)

Now that you have the encoder model, you need to prepare the training dataset so that the associated embedding space can be generated and indexed. According to the Vertex AI Example-based Explanations documentation, the training data file (in JSONL format) looks like the following.

{"id": "0", "bytes_inputs": {"b64": "bytes string"}}, 
{"id": "1", "bytes_inputs": {"b64": "bytes string"}},
...
{"id": "n", "bytes_inputs": {"b64": "bytes string"}}

where each line is a valid JSON object containing an image converted into a bytes string, together with a unique identifier. Assuming you have a TensorFlow dataset, you can generate the required data file and upload it to the Cloud Storage bucket as shown below.

import json

import tensorflow as tf

# Write one JSON record per training image to the JSONL file in Cloud Storage.
with tf.io.gfile.GFile(train_json_file_uri, "w") as f:
    for i, im in enumerate(train_images):
        instance = get_instance(i, im)
        json.dump(instance, f)
        f.write("\n")

The get_instance function converts each image into a bytes string and creates the associated Python dictionary with id and bytes_inputs keys.
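
The article does not show get_instance, but a minimal sketch could look like the following, assuming each training image is an HxWxC uint8 array (an assumption that depends on your dataset):

import base64

import tensorflow as tf

def get_instance(i, image):
    """Build the JSONL record for one training image (sketch)."""
    # Encode the image as JPEG bytes, then base64-encode them as required
    # by the {"b64": ...} convention.
    jpeg_bytes = tf.io.encode_jpeg(tf.cast(image, tf.uint8)).numpy()
    return {
        "id": str(i),
        "bytes_inputs": {"b64": base64.b64encode(jpeg_bytes).decode("utf-8")},
    }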

So far, you have both the encoder model and the training data file stored in a bucket. You are now ready to register the model in the Vertex AI Model Registry. When you register the model, you need to define the model configuration, which includes the deployment and explanation specifications. With Example-based Explanations, the explanation specification is required to trigger the process that creates the index over the entire training dataset.

The Example-based Explanations specification is composed of parameters and metadata. In the parameters, you provide an examples configuration that defines the number of neighbors to return (neighbor_count) from the provided dataset (gcs_source). Below is an example of the parameters for the car damage classification case.

from google.cloud import aiplatform as vertex_ai
from google.cloud import aiplatform_v1 as vertex_ai_v1

# Dimensionality of the embeddings produced by the encoder.
dimensions = encoder_model.output.shape[1]

# Index configuration (see the NearestNeighborSearchConfig specification).
nearest_neighbor_search_config = {
    "contentsDeltaUri": "",
    "config": {
        "dimensions": dimensions,
        "approximateNeighborsCount": 10,
        "distanceMeasureType": "SQUARED_L2_DISTANCE",
        "featureNormType": "NONE",
        "algorithmConfig": {
            "treeAhConfig": {
                "leafNodeEmbeddingCount": 1000,
                "leafNodesToSearchPercent": 100,
            }
        },
    },
}

example_gcs_source = vertex_ai_v1.types.Examples.ExampleGcsSource(
    gcs_source=vertex_ai.compat.types.io_v1.GcsSource(uris=[train_images_file_uri])
)

examples = vertex_ai.explain.Examples(
    nearest_neighbor_search_config=nearest_neighbor_search_config,
    example_gcs_source=example_gcs_source,
    neighbor_count=10,
)

explanation_parameters = vertex_ai.explain.ExplanationParameters(examples=examples)

The nearest_neighbor_search_config contains the configuration of the index and gives you full control over how it is generated. That configuration must match the NearestNeighborSearchConfig specification. It includes the dimensionality of the embeddings (dimensions), the distance used to measure the nearness of examples (distanceMeasureType), and more. See the documentation if you would like a simplified configuration using Presets; a sketch of that route follows below.
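
As a hedged sketch of that simplified route, the configuration below uses the Presets message from the low-level client instead of a hand-tuned nearest_neighbor_search_config. The field and enum names are taken from the Examples and Presets API messages, so double-check them against the current SDK before relying on this.

from google.cloud import aiplatform_v1 as vertex_ai_v1

# Let Vertex AI pick the index settings for image data, favoring precision.
presets = vertex_ai_v1.types.Presets(
    query=vertex_ai_v1.types.Presets.Query.PRECISE,
    modality=vertex_ai_v1.types.Presets.Modality.IMAGE,
)

examples_with_presets = vertex_ai_v1.types.Examples(
    example_gcs_source=vertex_ai_v1.types.Examples.ExampleGcsSource(
        gcs_source=vertex_ai_v1.types.GcsSource(uris=[train_images_file_uri])
    ),
    presets=presets,
    neighbor_count=10,
)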

In the metadata, you need to indicate the inputs and outputs, which map the names of the id, the encoded image, and the resulting embedding served by the model to the explanation metadata. Below you can see what the metadata looks like in the car damage classification case.

from google.cloud import aiplatform as vertex_ai

image_input_tensor_name = "bytes_inputs"
id_input_tensor_name = "id"
output_tensor_name = "embedding"

# Map the serving signature's inputs (image bytes and id) to the explanation
# metadata. Encoding(1) corresponds to the IDENTITY encoding.
explanation_inputs = {
    "my_input": vertex_ai.explain.ExplanationMetadata.InputMetadata(
        {
            "input_tensor_name": image_input_tensor_name,
            "encoding": vertex_ai.explain.ExplanationMetadata.InputMetadata.Encoding(1),
            "modality": "image",
        }
    ),
    "id": vertex_ai.explain.ExplanationMetadata.InputMetadata(
        {
            "input_tensor_name": id_input_tensor_name,
            "encoding": vertex_ai.explain.ExplanationMetadata.InputMetadata.Encoding(1),
        }
    ),
}

# Map the embedding returned by the serving signature to the explanation output.
explanation_outputs = {
    "embedding": vertex_ai.explain.ExplanationMetadata.OutputMetadata(
        {"output_tensor_name": output_tensor_name}
    )
}

explanation_metadata = vertex_ai.explain.ExplanationMetadata(
    inputs=explanation_inputs,
    outputs=explanation_outputs,
    latent_space_source="example_based_explanations",
)

where latent_space_source indicates the source used to generate embeddings for example-based explanations, in this case the name of the serving signature that returns them.

After you define the Example-based Explanations parameters and metadata, you pass them as the explanation specification in the model configuration, together with the deployment specification, and register the model in the Vertex AI Model Registry.

from google.cloud import aiplatform as vertex_ai

model_name = "cnn-damage-classification-similarity"
serving_container_image_uri = "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest"
timeout = 1800  # upload request timeout in seconds (illustrative value)

vertex_model_eba = vertex_ai.Model.upload(
    display_name=model_name,
    artifact_uri=embeddings_uri,
    serving_container_image_uri=serving_container_image_uri,
    explanation_parameters=explanation_parameters,
    explanation_metadata=explanation_metadata,
    sync=True,
    upload_request_timeout=timeout,
)

Depending on the dataset size and the model dimension, registering the model may take a long time (more than one hour). This is because the system must generate and index the embeddings before example-based explanations can be used.

Step 2: Deploy the model

Once you have registered the model in the Vertex AI Model Registry as described above, deploying the model and its example-based explanation service is straightforward. You only need to create a Vertex AI Endpoint to host the model and specify the serving configuration (computational resources, traffic policy, and so on). Below you can see an example of how to create an Endpoint and deploy the Model using the Vertex AI Python SDK.

deployed_model_name = "car_damage_detection_eba_endpoint"
machine_type = "n1-standard-4"

# Deploy the model to a new endpoint on a CPU-only machine.
endpoint = vertex_model_eba.deploy(
    deployed_model_display_name=deployed_model_name,
    machine_type=machine_type,
    accelerator_type=None,
    accelerator_count=0,
)

As with uploading the model, the deployment itself takes some time depending on the model size and the provisioned resources.

Step 3: Request Example-based explanations

At this point, your endpoint is ready to provide example-based explanations. To request predictions and example-based explanations with the Vertex AI Python SDK, you first build the instances, a list of records with the same id and bytes_inputs structure used by the serving signature, and then call the endpoint’s explain method, as you can see below.
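
Here is a minimal sketch of how you could build the instances from local image files; the helper name and the file name are placeholders, and the b64 wrapping mirrors the JSONL format used for the index.

import base64

import tensorflow as tf

def build_instances(image_paths):
    """Convert local image files into explain() instances (sketch)."""
    instances = []
    for i, path in enumerate(image_paths):
        with tf.io.gfile.GFile(path, "rb") as f:
            image_bytes = f.read()
        instances.append({
            "id": str(i),
            "bytes_inputs": {"b64": base64.b64encode(image_bytes).decode("utf-8")},
        })
    return instances

instances = build_instances(["new_bumper_damage.jpg"])  # placeholder file name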

response = endpoint.explain(instances=instances)

Notice that each response contains both the prediction and the explanation. Here is an example of the response.

{
  'explanations': [
    {'neighbors': [
      {'neighborId': '21', 'neighborDistance': 0.43712},
      ...
      {'neighborId': '5', 'neighborDistance': 0.95008}
    ]}
  ],
  'predictions': [
    {'prediction': [0.93895, ..., 0.53034], 'id': '0'}
  ]
}

Then you need to parse the response and extract the example-based explanations. After you get the explanations, you can combine them with the predicted labels and explain your results as discussed above.
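
As a minimal sketch, assuming the response has been turned into a plain dictionary shaped like the example above, and that train_images and class_names are available to map neighbor ids and class indices back to something readable (all assumptions):

import numpy as np

def parse_example_based_response(response, train_images, class_names):
    """Pair each prediction with its most similar training examples (sketch)."""
    results = []
    for pred, expl in zip(response["predictions"], response["explanations"]):
        predicted_class = class_names[int(np.argmax(pred["prediction"]))]
        neighbors = [
            {
                "neighbor_id": n["neighborId"],
                "distance": n["neighborDistance"],
                "image": train_images[int(n["neighborId"])],
            }
            for n in expl["neighbors"]
        ]
        results.append({
            "id": pred["id"],
            "predicted_class": predicted_class,
            "neighbors": neighbors,
        })
    return results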

Fig. 6 — Image from author

Conclusion

Example-based explanations are a type of ML explanation that uses examples from the training data to explain the behavior of a model. They allow you to overcome some of the challenges of feature attributions related to data quality and prior technical knowledge.

Vertex AI provides Example-based Explanations, a managed service that generates example-based explanations for your models through the Vertex AI Python SDK. It is a very promising service for scaling example-based explanations. However, there are some areas where it could be improved. For example, the index creation process could be made more efficient, and the logs could be more informative in the event of a failure. Additionally, it would be helpful to explore how to integrate example-based explanations with the model prediction endpoint, as is done with feature attribution explanations.

With that being said, I would personally recommend adding example-based explanations to your explainable AI toolkit. Depending on the model application and your users, example-based explanations can help you build better models with a solid, data-driven approach that is easier to understand for non-technical audiences.

Are you going to use Example-based Explanations on Vertex AI? If so, let me know more on LinkedIn or Twitter. In the meantime, I hope you found the article interesting. If so, clap or leave your comments.

Big kudos to Irina Sigler, Sheeraz Ahmad, Siping Hu, Mark Lu, Eric Dong and all the XAI team for making this happen and for all their feedback.
