
Deep Dive into Neural Network Explanations with Integrated Gradients

A Practitioner’s Guide

Original Image by StockSnap on Pixabay and edited images by author | Left: Original image | Middle: Integrated Gradients explanation of beagle | Right: Integrated Gradients explanation of mountain bike

What is the Integrated Gradients method?

Integrated Gradients (IG) was proposed by M. Sundararajan, A. Taly, and Q. Yan in Axiomatic Attribution for Deep Networks [1]. The equation to compute the Integrated Gradients attribution for an input record x and a baseline record x’ is as follows:
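$$\mathrm{IG}_i(x) \;=\; (x_i - x'_i) \int_{\alpha=0}^{1} \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$

Here F is the function computed by the network, i indexes the features of x, and the integral is taken along the straight-line path from the baseline x’ to the input x.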

Axioms

IG satisfies a handful of desirable axioms that are outlined in the paper [1]. We will highlight two of them that are particularly important for an explanation method:

  1. The completeness axiom
  2. The sensitivity axiom
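Completeness states that the attributions across all features sum exactly to the difference between the model’s output at the input and its output at the baseline:

$$\sum_i \mathrm{IG}_i(x) \;=\; F(x) - F(x')$$

Sensitivity states that if the input and baseline differ in only one feature yet receive different predictions, that feature must be given a non-zero attribution.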

Method comparisons

It is worth considering the fundamental differences between explanation methods and the pros and cons of using IG versus other common explainability techniques.

Integrated Gradients and Saliency Maps

Let’s first compare IG to another common gradient explainer: saliency maps. In the simplest sense, a saliency map is the gradient of a final output of the neural network with respect to the input features. This explanation highlights the features that are most reactive and most likely to quickly change the output, but it only makes sense for small deviations away from the original input. The completeness axiom gives IG a stronger (and more complete) representation of what went into the output, because the single gradient used for a saliency map is often taken in a saturated region of the model, where it is close to zero and understates features that have already contributed to the prediction.
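Concretely, a saliency map attributes to feature i only the local gradient at the input,

$$S_i(x) \;=\; \frac{\partial F(x)}{\partial x_i},$$

whereas IG accumulates these gradients along the entire path from the baseline x’ to the input x.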

Image by Author, inspired by Ankur Taly’s Explaining Machine Learning Models — Stanford CS Theory

Integrated Gradients & The Shapley Value

IG and Shapley-based methods like QII [2] are worth comparing because both give explanations that can be used locally and globally, and they satisfy analogous axioms. The completeness and sensitivity axioms of IG are analogous to the efficiency and dummy axioms referenced in our blog: The Shapley Value for ML Models.

Integrated Gradients in Practice

The rest of this blog highlights the many choices a practitioner can make when using IG in a practical setting, where flexibility is essential. The following sections use an open-source gradient explanation library called TruLens to showcase how IG can be used in practice. The code examples reference the Distribution of Interest (DoI), the Quantity of Interest (QoI), and the InternalInfluence method, which are the building blocks of TruLens outlined in: A Hands-on Introduction to Explaining Neural Networks with TruLens.

Translating Integrated Gradients to Code

In the previous sections IG was defined as an integral in continuous space, but in practice it is estimated by discretely partitioning the straight-line interpolation between the baseline and the input and summing the gradients at those points.
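The estimation proposed in [1] is a Riemann sum over m interpolation steps:

$$\mathrm{IG}_i^{\text{approx}}(x) \;=\; (x_i - x'_i) \sum_{k=1}^{m} \frac{\partial F\big(x' + \tfrac{k}{m}(x - x')\big)}{\partial x_i} \cdot \frac{1}{m}$$

In the TruLens code below, the resolution argument plays the role of m, the number of interpolation steps.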

from trulens.nn.attribution import IntegratedGradients
from trulens.visualizations import MaskVisualizer

# Create the attribution measure.
# resolution sets the number of interpolation steps in the Riemann-sum estimate.
ig_computer = IntegratedGradients(model, resolution=10)

# Calculate the input attributions.
input_attributions = ig_computer.attributions(beagle_bike_input)

# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
Original Image by StockSnap on Pixabay and edited by author | Visualized Beagle Class Explanation: Integrated Gradients
A custom Distribution of Interest can be defined by subclassing DoI and passing an instance to InternalInfluence:

from trulens.nn.distributions import DoI

class MyCustomDoI(DoI):
    def __call__(self, z):
        ...

    def get_activation_multiplier(self, activation):
        ...

from trulens.nn.attribution import InternalInfluence
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import InputCut, OutputCut, Slice

# Define a custom influence measure
infl = InternalInfluence(model,
                         Slice(InputCut(), OutputCut()),
                         MaxClassQoI(),
                         MyCustomDoI())

Customization of Output Function

Another customization area of Integrated Gradients is the output F to be explained. You may choose to explain the logit layer or the probability (post-softmax) layer, depending on the final use case. Explaining probabilities gives the exact contribution of each feature to the model’s score, whereas explaining logits can better separate records whose probabilities sit near 0 or 1, where differences are squashed by the sigmoid or softmax. The InternalInfluence method in TruLens lets you choose any output layer via Slice and Cut objects.

from trulens.nn.attribution import InternalInfluence
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.distributions import LinearDoi
from trulens.nn.slices import InputCut, Cut, Slice

# Get the layer name to explain from the model summary
model.summary()  # you may also be able to see it from print(model)
layer_to_explain = 'logits'

# Define a custom influence measure
infl = InternalInfluence(model,
                         Slice(InputCut(), Cut(layer_to_explain)),
                         MaxClassQoI(),
                         LinearDoi())
Image from Influence-Directed Explanations for Deep Convolutional Networks [3], reposted with permission | Internal attribution features of a car
Image from Influence-Directed Explanations for Deep Convolutional Networks [3], reposted with permission | Internal attribution features of a car vs. convertible
Similarly, a custom Quantity of Interest can be defined by subclassing QoI and passing an instance to InternalInfluence:

from trulens.nn.quantities import QoI

class MyCustomQoI(QoI):
    def __init__(self):  # constructor arguments as needed
        ...

    def __call__(self, y):
        ...

from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import LinearDoi
from trulens.nn.slices import InputCut, OutputCut, Slice

# Define a custom influence measure
infl = InternalInfluence(model,
                         Slice(InputCut(), OutputCut()),
                         MyCustomQoI(),
                         LinearDoi())

Customization of Baseline

The last topic of this blog, but probably the most important consideration in IG, is the choice of the baseline. In the image domain, the most prevalent baseline used in the literature is the empty (all-black) image. Semantically, it is a very intuitive baseline: the final attribution scores capture the difference between the model’s output on the image itself and its output on a presumably information-less baseline.
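As a minimal sketch (not taken from the original code above), the snippet below passes an explicit all-black baseline; the baseline keyword argument is an assumption based on the TruLens documentation, so verify it against the signature in your installed version of the library.

import numpy as np
from trulens.nn.attribution import IntegratedGradients

# All-black (zero) baseline with the same shape as the input image batch.
# NOTE: the `baseline` keyword is assumed from the TruLens docs; check your version.
black_baseline = np.zeros_like(beagle_bike_input)

# Attributions are now measured relative to the black image rather than the default baseline.
ig_computer = IntegratedGradients(model, baseline=black_baseline, resolution=10)
input_attributions = ig_computer.attributions(beagle_bike_input)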

References

[1] M. Sundararajan, A. Taly, Q. Yan, Axiomatic Attribution for Deep Networks (2017), Proceedings of the 34th International Conference on Machine Learning, PMLR volume 70

[2] A. Datta, S. Sen, Y. Zick, Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems (2016), IEEE Symposium on Security and Privacy

[3] K. Leino, S. Sen, A. Datta, M. Fredrikson, L. Li, Influence-Directed Explanations for Deep Convolutional Networks (2018), IEEE International Test Conference
