Published in TruLens

# What is the Integrated Gradients method?

Integrated Gradients (IG) was proposed by M. Sundararajan, A. Taly, and Q. Yan in Axiomatic Attribution for Deep Networks [1]. The Integrated Gradients attribution for feature i of an input record x, relative to a baseline record x′, is:

$$IG_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha$$

where F is the model's output function and α interpolates along the straight line from the baseline to the input.

## Axioms

IG satisfies a handful of desirable axioms, which are outlined in the paper [1]. We will highlight two that are particularly important for an explanation method:

1. The completeness axiom
2. The sensitivity axiom
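
Stated in symbols (using F for the model and x′ for the baseline, as above), completeness says the attributions sum to the difference in the model's output between the input and the baseline:

```latex
\text{Completeness:}\qquad \sum_i IG_i(x) = F(x) - F(x')
```

Sensitivity, informally: if x and x′ differ in a single feature and F(x) ≠ F(x′), then that feature must receive a nonzero attribution.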

# Method comparisons

It is worth considering the fundamental differences between explanation methods, and the pros and cons of using IG versus other common explainability methods.

## Integrated Gradients and Saliency Maps

Let’s first compare IG to another common gradient-based explainer: saliency maps. In the simplest sense, a saliency map is the gradient of a neural network’s final output with respect to its input features. This explanation highlights the features the output is most reactive to, and thus most likely to change the output quickly, but it only makes sense for small deviations away from the original input. The completeness axiom gives IG a stronger (and more complete) representation of what went into the output, because the gradients used for saliency maps are often taken in the model’s saturated region, where they are near zero even for features that drove the prediction.
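
The saturation problem can be seen on a toy one-input model (hypothetical weight, not TruLens): near the decision boundary the gradient is informative, but deep in the saturated region it vanishes even though the input clearly drove the output.

```python
import math

# Toy model F(x) = sigmoid(W * x), with a made-up weight.
W = 3.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def saliency(x):
    # Gradient of F with respect to the input: sigmoid'(W * x) * W.
    s = sigmoid(W * x)
    return s * (1.0 - s) * W

print(saliency(0.1))  # near the decision boundary: sizable gradient
print(saliency(5.0))  # saturated region: the gradient all but vanishes
```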

## Integrated Gradients & The Shapley Value

The main reason to compare IG with Shapley-based methods like QII [2] is that both give explanations that can be used locally and globally, as they satisfy analogous axioms. The completeness and sensitivity axioms of IG are analogous to the efficiency and dummy axioms referenced in our blog: The Shapley Value for ML Models.

The rest of this blog will highlight the choices a practitioner must make when using IG in practice, where flexibility is essential. The following sections use an open-source gradient explanation library called TruLens to showcase how IG can be used in practice. The code examples reference the Distribution of Interest (DoI), Quantity of Interest (QoI), and the InternalInfluence method, which are the building blocks of TruLens outlined in: A Hands-on Introduction to Explaining Neural Networks with TruLens.

## Translating Integrated Gradients to Code

In the previous sections, IG was defined as an integral in continuous space, but we also highlighted that it can be estimated by a discrete partitioning of the straight-line interpolation between the baseline and the input.
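
The discrete estimator is just a Riemann sum. A minimal self-contained sketch on a toy one-layer model (made-up weights, not TruLens) also lets us check the completeness axiom numerically:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-layer model F(x) = sigmoid(w . x), with hypothetical weights.
W = [3.0, -2.0]

def F(x):
    return sigmoid(sum(w * xi for w, xi in zip(W, x)))

def grad_F(x):
    # Analytic gradient of F with respect to each input feature.
    s = F(x)
    return [s * (1.0 - s) * w for w in W]

def integrated_gradients(x, baseline, steps=1000):
    # Riemann-sum estimate of the path integral along the straight
    # line from the baseline to the input.
    attrs = [0.0] * len(x)
    for k in range(1, steps + 1):
        point = [b + (k / steps) * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_F(point)
        for i in range(len(x)):
            attrs[i] += g[i] * (x[i] - baseline[i]) / steps
    return attrs

x, baseline = [1.0, 1.0], [0.0, 0.0]
attrs = integrated_gradients(x, baseline)
# Completeness: the attributions sum (approximately) to F(x) - F(baseline).
print(sum(attrs), F(x) - F(baseline))
```

The `resolution` parameter in TruLens below plays the role of `steps` here: more partitions give a closer approximation of the integral.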

```python
from trulens.nn.attribution import IntegratedGradients
from trulens.visualizations import MaskVisualizer

# Create the attribution measure.
ig_computer = IntegratedGradients(model, resolution=10)

# Calculate the input attributions.
input_attributions = ig_computer.attributions(beagle_bike_input)

# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
```
The Distribution of Interest can also be customized by subclassing DoI and passing it to InternalInfluence:

```python
from trulens.nn.distributions import DoI

class MyCustomDoI(DoI):
    def __call__(self, z):
        ...

    def get_activation_multiplier(self, activation):
        ...

from trulens.nn.attribution import InternalInfluence
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import InputCut, OutputCut, Slice

# Define a custom influence measure
infl = InternalInfluence(model,
    Slice(InputCut(), OutputCut()),
    MaxClassQoI(),
    MyCustomDoI())
```

## Customization of Output Function

Another customization area of Integrated Gradients is the output F to be explained. You may choose to explain the logit layer or the probability layer, and the reason to explain either depends on the final use case. Model probabilities give the exact contribution to the output score, whereas logits can better differentiate records whose outputs lie near 0 or 1, where differences are squashed by the sigmoid function. The InternalInfluence method in TruLens lets you choose any output layer via Slice and Cut objects.

```python
from trulens.nn.attribution import InternalInfluence
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.distributions import LinearDoi
from trulens.nn.slices import InputCut, Cut, Slice

# Get the layer name to explain from the model summary
model.summary()  # you may also be able to see this from print(model)
layer_to_explain = 'logits'

# Define a custom influence measure
infl = InternalInfluence(model,
    Slice(InputCut(), Cut(layer_to_explain)),
    MaxClassQoI(),
    LinearDoi())
```
A custom Quantity of Interest can likewise be defined by subclassing QoI:

```python
from trulens.nn.quantities import QoI

class MyCustomQoI(QoI):
    def __init__(...):
        ...

    def __call__(self, y):
        ...

from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import LinearDoi
from trulens.nn.slices import InputCut, OutputCut, Slice

# Define a custom influence measure
infl = InternalInfluence(model,
    Slice(InputCut(), OutputCut()),
    MyCustomQoI(),
    LinearDoi())
```
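
The squashing effect mentioned above is easy to see numerically (hypothetical logit values, plain Python rather than TruLens): two records separated by a large gap in logit space become nearly indistinguishable in probability space.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two records with very different logits (made-up values)...
logit_a, logit_b = 8.0, 14.0
prob_a, prob_b = sigmoid(logit_a), sigmoid(logit_b)

# ...are nearly indistinguishable after the sigmoid squashes them.
print(logit_b - logit_a)  # large gap in logit space
print(prob_b - prob_a)    # tiny gap in probability space
```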

## Customization of Baseline

The last topic of this blog, and probably the most important consideration in IG, is the choice of the baseline. In the image domain, the most prevalent baseline used in the literature is the empty image. Semantically, it is a very intuitive baseline: the final attribution scores represent the difference between the image itself and a presumably information-less reference point.
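
The baseline genuinely changes the answer, not just its scale. A minimal sketch on a toy one-layer model (made-up weights, plain Python rather than TruLens) shows that the same input gets different attributions under an all-zero "empty" baseline versus another reference point:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-layer model F(x) = sigmoid(w . x), with hypothetical weights.
W = [3.0, -2.0]

def ig(x, baseline, steps=1000):
    # Riemann-sum estimate of IG along the baseline-to-input line.
    attrs = [0.0] * len(x)
    for k in range(1, steps + 1):
        p = [b + (k / steps) * (xi - b) for xi, b in zip(x, baseline)]
        s = sigmoid(sum(w * pi for w, pi in zip(W, p)))
        for i in range(len(x)):
            attrs[i] += s * (1.0 - s) * W[i] * (x[i] - baseline[i]) / steps
    return attrs

x = [1.0, 1.0]
attrs_zero = ig(x, baseline=[0.0, 0.0])  # "empty" all-zero baseline
attrs_mid = ig(x, baseline=[0.5, 0.5])   # a different reference point
print(attrs_zero)
print(attrs_mid)
```

Both runs satisfy completeness with respect to their own baseline, yet the per-feature scores differ, which is why the baseline deserves careful thought in practice.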

# References

[1] M. Sundararajan, A. Taly, Q. Yan, Axiomatic Attribution for Deep Networks (2017), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research

[2] A. Datta, S. Sen, Y. Zick, Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems (2016), IEEE Symposium on Security and Privacy

