A Hands-on Introduction to Explaining Neural Networks with TruLens

Klas Leino · Published in TruLens · Aug 10, 2021 · 10 min read

Transparency and Attribution Methods

In recent years, deep learning has become increasingly powerful and ubiquitous, yet its inner workings remain far from well understood. As the application of ML has increased, so has the need for algorithmic transparency: the ability to understand why algorithms deployed in the real world make the decisions they do. One broad line of recent research seeking to address this concern uses attribution methods to help explain the behavior of ML models.

Attribution methods, in the most general sense, allow us to quantify the contribution of particular variables of a model towards a particular model behavior. For example, attribution methods can measure the effect each input variable has on the output of a deep network.

Enter TruLens

TruLens is a flexible, extensible, easy-to-use library for attribution-based explanations for deep neural networks. TruLens works uniformly across popular model development frameworks — Keras, PyTorch, TensorFlow — and supports widely-used methods, such as saliency maps, integrated gradients, internal influence, and smooth gradients.

In this article, we take a hands-on look at some of the basics of TruLens, demonstrating how it makes exploring the inner workings of deep networks quick and easy. An introductory notebook is available if you’d like to follow along interactively. To learn more about the broad capabilities of the library, see the blog post “Peer Deep into Neural Networks with TruLens.”

The Basics: Model Wrappers

In order to support a wide variety of backends with different interfaces for their respective models, TruLens uses its own ModelWrapper class, which provides a general model interface that simplifies working with the TruLens API. A different wrapper class exists for each backend to convert models from that backend's format to the general TruLens ModelWrapper interface; to obtain a wrapper, use the get_model_wrapper method in trulens.nn.models. Any model defined using Keras, PyTorch, or TensorFlow should be wrapped before being used with the other API functions that require a model: all other TruLens functionality expects models to be instances of trulens.nn.models.ModelWrapper.

For more details on wrapping models, see the get_model_wrapper documentation.

For our introduction, we will consider a pre-trained ImageNet model, which can be wrapped with the code below:

# TensorFlow / Keras
from tensorflow.keras.applications.vgg16 import VGG16
from trulens.nn.models import get_model_wrapper
keras_model = VGG16(weights='imagenet')
# Produce a wrapped model from the Keras model.
model = get_model_wrapper(keras_model)

# PyTorch
from torchvision.models import vgg16
from trulens.nn.models import get_model_wrapper
pytorch_model = vgg16(pretrained=True)
# Produce a wrapped model from the PyTorch model.
model = get_model_wrapper(
    pytorch_model, input_shape=(3, 224, 224), device='cpu')

Once we’ve created our wrapped model, we can use it with the TruLens API to calculate attributions.

Input Attributions: Saliency Maps and Integrated Gradients

Attribution methods extend the AttributionMethod class, and many concrete instances are found in the trulens.nn.attribution module. Once an attribution method has been instantiated, its main function is its attributions method, which takes an input to the model that we’d like to compute attributions for. In the simplest case, this input is an np.array of batched records, and the computed attributions will match the input shape.
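For example, a batched ImageNet input for the Keras-wrapped VGG16 above might be prepared along these lines. This is a minimal sketch: the file name 'beagle_bike.jpg' is a placeholder, and the preprocessing shown is standard Keras utility code rather than part of TruLens.

import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image
# Load the example image and turn it into a batch of shape (1, 224, 224, 3).
img = image.load_img('beagle_bike.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))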

The most straightforward attributions to compute are those that directly measure the importance of input features on the classification outcome of the model. We will begin by looking at two methods that are simple to use with the TruLens API:

  • Saliency maps (Simonyan et al.) take the gradient of the network’s output at the predicted class with respect to its input at a given point.
  • Integrated gradients (Sundararajan et al.) addresses some potential shortcomings of saliency maps by aggregating the gradient over a line in the model’s input space that interpolates from a selected baseline to a given point. In practice, this is done by averaging the gradient taken at uniformly-spaced samples along the line.

Saliency Maps

For an example input image, x (also referred to as beagle_bike_input in this example), calculating and visualizing saliency maps with TruLens is straightforward using the InputAttribution class:

from trulens.nn.attribution import InputAttribution
from trulens.visualizations import MaskVisualizer
beagle_bike_input = x
# Create the attribution measure.
saliency_map_computer = InputAttribution(model)
# Calculate the input attributions.
input_attributions = saliency_map_computer.attributions(
    beagle_bike_input)
# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
Figure: Visualized Beagle Class Explanation (Saliency Map)

First, we instantiate our AttributionMethod object. Saliency maps are implemented by the InputAttribution class, which acts on a wrapped model. After constructing the attribution method, we call its attributions method on our data point to receive an array containing the input attributions.

Finally, to visualize the attributions, we can use a MaskVisualizer from the trulens.visualizations module. A MaskVisualizer allows us to visualize our input attributions by overlaying a partially-opaque mask over our original image that reveals only the pixels whose normalized attributions are over a given threshold. The MaskVisualizer takes a few optional parameters to aid in producing interpretable visualizations. The blur parameter applies a Gaussian blur to the attributions before applying the threshold, in order to highlight generally influential regions rather than specific pixels. The threshold parameter adjusts the threshold used for the mask; a higher threshold leads to a more focused visualization, while a lower threshold will highlight the image more liberally.
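The visualizer returns the rendered visualization; assuming it comes back as a standard image array (an assumption for this sketch), it can be displayed with matplotlib, which is used here only for display and is not part of TruLens:

import matplotlib.pyplot as plt
# Show the masked image produced by the MaskVisualizer; if the result is
# batched, display the first record.
img = visualization[0] if visualization.ndim == 4 else visualization
plt.imshow(img)
plt.axis('off')
plt.show()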

Integrated Gradients

Calculating and visualizing Integrated Gradients is just as simple, using the IntegratedGradients class. The code is nearly identical to that for saliency maps, except that we use the IntegratedGradients class, which takes an optional resolution parameter that specifies the number of gradient samples to integrate.

from trulens.nn.attribution import IntegratedGradients
from trulens.visualizations import MaskVisualizer
# Create the attribution measure.
ig_computer = IntegratedGradients(model, resolution=10)
# Calculate the input attributions.
input_attributions = ig_computer.attributions(beagle_bike_input)
# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
Figure: Visualized Beagle Class Explanation (Integrated Gradients)

Generalizing Beyond Input Attributions: Attribution Flexibility

To conclude this introduction, we’ll examine Internal Influence (Leino et al.), a powerful and general attribution method that admits fine-grained control over explanations. For example, it can calculate attributions not only for a network’s inputs, but for any internal neuron within the network. Internal Influence is implemented by the InternalInfluence class in the trulens.nn.attribution module, which is used in a similar fashion to InputAttribution and IntegratedGradients (above).

The InternalInfluence constructor takes a TruLens ModelWrapper and three special parameters: a slice, a quantity of interest (QoI), and a distribution of interest (DoI). These are instantiated with the Slice (in the trulens.nn.slices module), QoI (in the trulens.nn.quantities module), and DoI (in the trulens.nn.distributions module) classes, respectively.

The slice essentially defines a layer to use for internal attributions. A Slice object specifies two "cuts" (trulens.nn.slices.Cut; documentation) corresponding to two layers: (1) the layer of the variables that we are calculating attribution for (e.g., the input layer or an internal layer), and (2) the layer whose output defines our quantity of interest (e.g., the output layer or a specific neuron in the internal layer; see below for more on quantities of interest). The shape of the attributions will always match the shape of the first cut.

The quantity of interest (QoI) essentially defines the model behavior we would like to explain using attributions. More specifically, the QoI is a function of the model’s output at some layer (specified by the second cut). For example, a QoI may select the output corresponding to a particular class, in which case our attributions would be interpreted as highlighting the features the model uses as evidence for choosing that class. In its most general form, the QoI can be specified by an implementation of the QoI class in the trulens.nn.quantities module; however, several common default implementations are provided in this module as well.

The distribution of interest (DoI) specifies points surrounding each record for which the calculated attribution should be faithful. The distribution can be specified via an implementation of the DoI class in the trulens.nn.distributions module, which is a function taking an input record and producing a list of input points to aggregate attribution over. A few common default distributions implementing the DoI class can be found in the distributions module. This effectively generalizes the most common attribution types: Integrated Gradients uses a linear-interpolation DoI, while Smooth Gradients uses a Gaussian-noise DoI. You can also easily create your own DoI by extending the base class, as sketched below.
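As a hypothetical illustration of that last point, a Gaussian-noise DoI in the spirit of Smooth Gradients could look roughly like the following. The exact base-class constructor and call signature are assumptions here, so treat this as a sketch rather than a drop-in implementation; in practice, check the distributions module for a built-in DoI first.

import numpy as np
from trulens.nn.distributions import DoI

class GaussianNoiseDoI(DoI):
    """Hypothetical DoI: samples noisy copies of a record, in the spirit
    of Smooth Gradients. Signatures here are illustrative assumptions."""

    def __init__(self, var=0.1, resolution=10):
        super().__init__()
        self.var = var
        self.resolution = resolution

    def __call__(self, z):
        # Produce a list of points around the record z; attributions are
        # aggregated over these points.
        return [
            z + np.random.normal(0., self.var, size=np.shape(z))
            for _ in range(self.resolution)
        ]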

The parameters to Internal Influence allow us to formulate a wide range of general queries to probe an otherwise-opaque deep network’s behavior. In fact, under the hood, TruLens implements all attribution methods using Internal Influence. For example, IntegratedGradients can be expressed using InternalInfluence in the following way:

from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import LinearDoi
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import InputCut, OutputCut, Slice
# Create the attribution measure.
ig_computer = InternalInfluence(
    model,
    Slice(InputCut(), OutputCut()),
    MaxClassQoI(),
    LinearDoi(resolution=10))
# Calculate the input attributions.
input_attributions = ig_computer.attributions(beagle_bike_input)

InputCut and OutputCut are two special cuts that represent the first and last layer of a network, respectively. The MaxClassQoI selects the output corresponding to the class that the network predicted on the given input. LinearDoi is a DoI that corresponds to the linear interpolation performed by Integrated Gradients.
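As a quick sanity check, the attributions produced by this InternalInfluence formulation should match those from the IntegratedGradients class used earlier, assuming the same resolution and default baseline (a hedged sketch; small numerical differences are possible):

import numpy as np
from trulens.nn.attribution import IntegratedGradients
# Attributions from the dedicated IntegratedGradients class should agree
# with the InternalInfluence formulation above up to numerical precision.
ig_attributions = IntegratedGradients(model, resolution=10).attributions(
    beagle_bike_input)
print(np.allclose(ig_attributions, input_attributions, atol=1e-5))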

Internal Influence

We will now demonstrate an example that goes beyond the functionality supported by saliency maps or Integrated Gradients. For this example, we will calculate attributions for the feature maps in the layer of our model named 'block4_conv3'. We are interested in explaining the model's predicted class for our record, which we specify with a MaxClassQoI: the attributions will explain the model's output for its highest-confidence class. We will use a PointDoi, which specifies that we are only concerned with the model's behavior at one particular point, i.e., we want a very local explanation.

from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import PointDoi
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import Cut, OutputCut, Slice
# Create the attribution measure.
internal_infl_computer = InternalInfluence(
    model,
    Slice(Cut('block4_conv3'), OutputCut()),
    MaxClassQoI(),
    PointDoi())
# Get the attributions for the internal neurons at the
# 'block4_conv3' layer. Because 'block4_conv3' contains 2D feature
# maps, we take the sum over the width and height of the feature
# maps to obtain a single attribution for each feature map.
internal_attrs = internal_infl_computer.attributions(
    beagle_bike_input).sum(axis=(1, 2))

Note that above we used the Slice, MaxClassQoI, and PointDoi classes to define the slice, QoI, and DoI. The TruLens API also offers several shorthands for defining these parameters more succinctly. For example, the above code could be written as:

from trulens.nn.attribution import InternalInfluence
# Create the attribution measure.
internal_infl_computer = InternalInfluence(
    model, 'block4_conv3', 'max', 'point')
# Get the attributions for the internal neurons at the
# 'block4_conv3' layer. Because 'block4_conv3' contains 2D feature
# maps, we take the sum over the width and height of the feature
# maps to obtain a single attribution for each feature map.
internal_attrs = internal_infl_computer.attributions(
    beagle_bike_input).sum(axis=(1, 2))

Now we can identify the feature map most important to the model’s top prediction by taking the argmax over the internal attributions for this record. This feature map represents some learned feature that mattered most in the network’s decision to label this point as 'beagle'.

top_feature_map = int(internal_attrs[0].argmax())
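If you would rather inspect more than the single top feature map, plain NumPy works directly on the attribution array; for example (a small illustrative snippet, not a TruLens API call):

import numpy as np
# Indices of the five feature maps with the highest attribution for this
# record, most important first.
top_5_feature_maps = np.argsort(internal_attrs[0])[::-1][:5]
print(top_5_feature_maps)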

Visualizing Important Internal Neurons

We would now like to visualize our identified feature map in a meaningful way. Since the feature map represents a learned feature, which is not readily interpretable, we will use a second set of attributions to identify the input features that are most important in defining this particular feature map, and then visualize those input features. The code for this procedure could look like the following:

from trulens.nn.attribution import InternalInfluence
from trulens.visualizations import MaskVisualizer
input_infl_computer = InternalInfluence(
    model, (0, 'block4_conv3'), top_feature_map, 'point')
# The above is shorthand for:
#
# input_infl_computer = InternalInfluence(
#     model,
#     (InputCut(), Cut('block4_conv3')),
#     InternalChannelQoI(top_feature_map),
#     PointDoi())
input_attributions = input_infl_computer.attributions(
    beagle_bike_input)
# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
Figure: Visualized Beagle Class Explanation (Internal Neuron)

The above procedure — using a second set of attributions to identify the input features that are most important in defining a particular feature map, and then using a visualizer on the resulting input attributions — is a common use case when dealing with internal attributions. This procedure can instead be done via a single step, using a ChannelMaskVisualizer also found in the trulens.visualizations module, demonstrated below.

from trulens.visualizations import ChannelMaskVisualizer
visualizer = ChannelMaskVisualizer(
    model,
    'block4_conv3',
    top_feature_map,
    blur=10,
    threshold=0.95)
visualization = visualizer(beagle_bike_input)

Other Quantities of Interest

We can also change the quantity that we want the attributions to explain. For example, our example image contains both a bike and a dog. Recall that while the top class predicted by our model was 'beagle', ImageNet also contains bike-related classes, e.g., 'mountain bike, all-terrain bike, off-roader' (class 249). We will use the ClassQoI to view the attributions towards the class 'mountain bike, all-terrain bike, off-roader'.

from trulens.nn.attribution import InternalInfluence
from trulens.visualizations import ChannelMaskVisualizer
BIKE_CLASS = 249
# Create the attribution measure.
internal_infl_computer = InternalInfluence(
    model, 'block4_conv3', BIKE_CLASS, 'point')
# The above is shorthand for:
#
# internal_infl_computer = InternalInfluence(
#     model,
#     Slice(Cut('block4_conv3'), OutputCut()),
#     ClassQoI(BIKE_CLASS),
#     'point')
# Get the attributions for each feature map.
internal_attrs = internal_infl_computer.attributions(
    beagle_bike_input).sum(axis=(1, 2))
# Find the index of the top feature map.
top_feature_map_bike = int(internal_attrs[0].argmax())
# Visualize the top feature map in the input space.
visualizer = ChannelMaskVisualizer(
    model,
    'block4_conv3',
    top_feature_map_bike,
    blur=10,
    threshold=0.95)
visualization = visualizer(beagle_bike_input)
Figure: Visualized Bike Class Explanation (Internal Neuron)

References

  1. Leino et al. “Influence-directed Explanations for Deep Convolutional Networks.” ITC 2018. arXiv
  2. Simonyan et al. “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps.” 2014. arXiv
  3. Sundararajan et al. “Axiomatic Attribution for Deep Networks.” ICML 2017. arXiv

Klas Leino received his PhD at CMU studying the weaknesses and vulnerabilities of deep learning; he works to improve DNN security, transparency, and privacy.