A Hands-on Introduction to Explaining Neural Networks with TruLens
Transparency and Attribution Methods
In recent years, deep learning has become increasingly powerful and ubiquitous, yet its inner workings remain far from well understood. As the application of ML has grown, so has the need for algorithmic transparency: the ability to understand why algorithms deployed in the real world make the decisions they do. One broad family of techniques from recent research that addresses this concern uses attribution methods to help explain the behavior of ML models.
Attribution methods, in the most general sense, allow us to quantify the contribution of particular variables of a model towards a particular model behavior. For example, attribution methods can measure the effect each input variable has on the output of a deep network.
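As a rough illustration of the idea, the sketch below estimates how much each input of a tiny hand-written model contributes to its output by nudging one input at a time. The model toy_model, its weights, and the helper finite_difference_attributions are made up for this example and are not part of TruLens.

```python
import math


def toy_model(x):
    """A tiny hand-written 'network': a weighted sum followed by a sigmoid."""
    weights = [2.0, -1.0, 0.0]  # illustrative weights, not learned
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def finite_difference_attributions(f, x, eps=1e-5):
    """Approximate d f / d x_i for every input using central differences."""
    attributions = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        attributions.append((f(up) - f(down)) / (2 * eps))
    return attributions


x = [0.5, 1.0, 3.0]
attrs = finite_difference_attributions(toy_model, x)
for i, a in enumerate(attrs):
    print(f"input {i}: attribution {a:+.4f}")
```

In this toy setting the first input pushes the output up, the second pushes it down, and the third has no effect at all because its weight is zero.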
Enter TruLens
TruLens is a flexible, extensible, easy-to-use library for attribution-based explanations for deep neural networks. TruLens works uniformly across popular model development frameworks — Keras, PyTorch, TensorFlow — and supports widely-used methods, such as saliency maps, integrated gradients, internal influence, and smooth gradients.
In this article, we take a hands-on look at some of the basics of TruLens, demonstrating how TruLens makes exploring the inner workings of deep networks quick and easy. An introductory notebook is available if you’d like to follow along interactively. To learn more about the broad capabilities of this library, see this blog post, “Peer Deep into Neural Networks with TruLens.”
The Basics: Model Wrappers
In order to support a wide variety of backends with different interfaces for their respective models, TruLens uses its own ModelWrapper class, which provides a general model interface to simplify the implementation of the TruLens API. To get the model wrapper, use the get_model_wrapper method in trulens.nn.models. A different model wrapper class exists for each backend to convert models in the respective backend's format to the general TruLens ModelWrapper interface. Any model defined using Keras, PyTorch, or TensorFlow should be wrapped before being used with the other API functions that require a model; all other TruLens functionalities expect models to be an instance of trulens.nn.models.ModelWrapper.
For more details on wrapping models, see the get_model_wrapper documentation.
For our introduction, we will consider a pre-trained ImageNet model, which can be wrapped with the code below:
# TensorFlow/Keras
from tensorflow.keras.applications.vgg16 import VGG16
from trulens.nn.models import get_model_wrapper

keras_model = VGG16(weights='imagenet')

# Produce a wrapped model from the keras model.
model = get_model_wrapper(keras_model)

# PyTorch
from torchvision.models import vgg16
from trulens.nn.models import get_model_wrapper

pytorch_model = vgg16(pretrained=True)

# Produce a wrapped model from the pytorch model.
model = get_model_wrapper(
    pytorch_model, input_shape=(3,224,224), device='cpu')
Once we’ve created our wrapped model, we can use it with the TruLens API to calculate attributions.
Input Attributions: Saliency Maps and Integrated Gradients
Attribution methods extend the AttributionMethod class, and many concrete instances are found in the trulens.nn.attribution module. Once an attribution method has been instantiated, its main function is its attributions method, which takes an input to the model that we’d like to compute attributions for. In the simplest case, this input is an np.array of batched records, and the computed attributions will match the input shape.
The most straightforward attributions to compute are those that directly measure the importance of input features on the classification outcome of the model. We will begin by looking at two methods that are simple to use with the TruLens API:
- Saliency maps (Simonyan et al.) take the gradient of the network’s output at the predicted class with respect to its input at a given point.
- Integrated gradients (Sundararajan et al.) address some potential shortcomings of saliency maps by aggregating the gradient over a line in the model’s input space that interpolates from a selected baseline to a given point. In practice, this is done by averaging the gradient taken at uniformly-spaced samples along the line.
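To make the difference between the two methods concrete, here is a minimal sketch on a toy function with a known gradient. The functions f, grad_f, saliency, and integrated_gradients are illustrative names, not TruLens APIs. The final scaling by (x - baseline) follows the formulation in the integrated gradients paper and is an assumption here, since the text above only describes the averaging step.

```python
def f(x):
    """Toy differentiable model: f(x) = x0^2 + 3*x1."""
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * x[0], 3.0]


def saliency(x):
    """Saliency: the gradient evaluated at the point itself."""
    return grad_f(x)


def integrated_gradients(x, baseline, num_steps=50):
    """Average the gradient at uniformly spaced points on the line from
    `baseline` to `x` (segment midpoints), then scale by (x - baseline)."""
    avg_grad = [0.0] * len(x)
    for k in range(num_steps):
        alpha = (k + 0.5) / num_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / num_steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
ig = integrated_gradients(x, baseline)
print("saliency:", saliency(x))
print("integrated gradients:", ig)
print("sum of IG:", sum(ig), "vs f(x) - f(baseline):", f(x) - f(baseline))
```

With this scaling, the integrated gradients attributions add up (to within floating-point error) to the change in the model's output between the baseline and the input, a property that plain saliency does not have.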
Saliency Maps
For an example input image, x (also referred to as beagle_bike_input in this example), calculating and visualizing saliency maps with TruLens is straightforward using the InputAttribution class:
from trulens.nn.attribution import InputAttribution
from trulens.visualizations import MaskVisualizer

beagle_bike_input = x

# Create the attribution measure.
saliency_map_computer = InputAttribution(model)

# Calculate the input attributions.
input_attributions = saliency_map_computer.attributions(
    beagle_bike_input)

# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
First, we instantiate our AttributionMethod object. Saliency maps are implemented by the InputAttribution class, which acts on a wrapped model. After constructing the attribution method, we call its attributions method on our data point to receive an array containing the input attributions.
Finally, to visualize the attributions, we can use a MaskVisualizer from the trulens.visualizations module. A MaskVisualizer allows us to visualize our input attributions by overlaying a partially-opaque mask over our original image that reveals only the pixels whose normalized attributions are over a given threshold. The MaskVisualizer takes a few optional parameters to aid in producing interpretable visualizations. The blur parameter applies a Gaussian blur to the attributions before applying the threshold, in order to highlight generally influential regions rather than specific pixels. The threshold parameter adjusts the threshold used for the mask; a higher threshold leads to a more focused visualization, while a lower threshold will highlight the image more liberally.
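The effect of the threshold can be sketched with a few lines of NumPy. This is not the MaskVisualizer implementation: the helper threshold_mask is hypothetical, the blur step is omitted, and treating the threshold as a cutoff on min-max normalized scores is an assumption made for illustration.

```python
import numpy as np


def threshold_mask(attributions, threshold=0.95):
    """Return a boolean mask of pixels whose min-max normalized
    attribution is at least `threshold` (an assumed interpretation)."""
    a = np.asarray(attributions, dtype=float)
    span = a.max() - a.min()
    normalized = (a - a.min()) / span if span > 0 else np.zeros_like(a)
    return normalized >= threshold


# A 4x4 grid of made-up attribution scores.
attrs = np.array([
    [0.0, 0.1, 0.2, 0.1],
    [0.1, 0.9, 1.0, 0.2],
    [0.1, 0.8, 0.7, 0.1],
    [0.0, 0.1, 0.1, 0.0],
])

strict = threshold_mask(attrs, threshold=0.95)
loose = threshold_mask(attrs, threshold=0.5)
print("pixels kept at 0.95:", int(strict.sum()))
print("pixels kept at 0.50:", int(loose.sum()))
```

The stricter threshold keeps only the single strongest pixel, while the looser one reveals the whole central region.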
Integrated Gradients
Calculating and visualizing Integrated Gradients is just as simple. The code is nearly identical to that for saliency maps, except that we use the IntegratedGradients class, which takes an optional resolution parameter that specifies the number of gradient samples to integrate.
from trulens.nn.attribution import IntegratedGradients
from trulens.visualizations import MaskVisualizer

# Create the attribution measure.
ig_computer = IntegratedGradients(model, resolution=10)

# Calculate the input attributions.
input_attributions = ig_computer.attributions(beagle_bike_input)

# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
Generalizing Beyond Input Attributions: Attribution Flexibility
To conclude this introduction, we’ll examine Internal Influence (Leino et al.), a powerful and general attribution method that admits fine-grained control over explanations. For example, it can calculate attributions not only for a network’s inputs, but for any internal neuron within the network. Internal Influence is implemented by the InternalInfluence class in the trulens.nn.attribution module, which is used in a similar fashion to InputAttribution and IntegratedGradients (above).
The InternalInfluence constructor takes a TruLens ModelWrapper and three special parameters: a slice, a quantity of interest (QoI), and a distribution of interest (DoI). These are instantiated with the Slice (in the trulens.nn.slices module), QoI (in the trulens.nn.quantities module), and DoI (in the trulens.nn.distributions module) classes, respectively.
The slice essentially defines a layer to use for internal attributions. A Slice object specifies two "cuts" (trulens.nn.slices.Cut) corresponding to two layers: (1) the layer of the variables that we are calculating attribution for (e.g., the input layer or an internal layer), and (2) the layer whose output defines our quantity of interest (e.g., the output layer or a specific neuron in the internal layer; see below for more on quantities of interest). The shape of the attributions will always match the shape of the first cut.
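The role of the two cuts can be illustrated with a toy two-layer network written in plain Python. Everything here (hidden_layer, output_layer, gradient, and the weights) is invented for the example and does not reflect how TruLens represents slices internally. The point is only that the number of attributions follows the first cut.

```python
def hidden_layer(x):
    """First half of a toy network: 2 inputs -> 3 hidden activations (ReLU)."""
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def output_layer(h):
    """Second half: 3 hidden activations -> 1 output score."""
    v = [0.5, -2.0, 1.0]
    return sum(vi * hi for vi, hi in zip(v, h))


def gradient(f, point, eps=1e-6):
    """Central-difference gradient of the scalar function f at `point`."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


x = [1.0, 2.0]

# First cut at the input, second cut at the output:
# one attribution per input feature.
input_attrs = gradient(lambda inp: output_layer(hidden_layer(inp)), x)

# First cut at the hidden layer, second cut at the output:
# one attribution per hidden unit.
hidden_attrs = gradient(output_layer, hidden_layer(x))

print(len(input_attrs), input_attrs)
print(len(hidden_attrs), hidden_attrs)
```

Moving the first cut from the input to the hidden layer changes the attributions from two values (one per input) to three (one per hidden unit), while the quantity being explained stays the same.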
The quantity of interest (QoI) essentially defines the model behavior we would like to explain using attributions. More specifically, the QoI is a function of the model’s output at some layer (specified by the second cut). For example, a QoI may select the output corresponding to a particular class, in which case our attributions would be interpreted as highlighting the features the model uses as evidence for choosing that class. In its most general form, the QoI can be specified by an implementation of the QoI class in the trulens.nn.quantities module; however, several common default implementations are provided in this module as well.
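Conceptually, a QoI is just a function that reduces a layer's output to the single number we want explained. The sketch below shows three such functions over a list of class scores. The helper names max_class_qoi, class_qoi, and comparative_qoi are hypothetical stand-ins for illustration and are not the classes in trulens.nn.quantities.

```python
def max_class_qoi(scores):
    """Score of whichever class the model ranks highest."""
    return max(scores)


def class_qoi(class_index):
    """Build a QoI that selects the score of one fixed class."""
    def qoi(scores):
        return scores[class_index]
    return qoi


def comparative_qoi(class_a, class_b):
    """Build a QoI measuring how strongly class_a is favored over class_b."""
    def qoi(scores):
        return scores[class_a] - scores[class_b]
    return qoi


# Made-up output scores for a 4-class model.
scores = [0.1, 2.5, 0.3, 1.7]

print(max_class_qoi(scores))
print(class_qoi(3)(scores))
print(comparative_qoi(1, 3)(scores))
```

Attributions computed against the second function would explain the evidence for class 3, while the third would explain why the model prefers class 1 over class 3.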
The distribution of interest (DoI) specifies points surrounding each record for which the calculated attribution should be faithful. The distribution can be specified via an implementation of the DoI class in the trulens.nn.distributions module, which is a function taking an input record and producing a list of input points to aggregate attribution over. A few common default distributions implementing the DoI class can be found in the distributions module. The DoI effectively generalizes the most common attribution types: Integrated Gradients utilizes a linear interpolation DoI, while Smooth Gradients utilizes a Gaussian-distributed DoI. You can also easily create your own DoI by extending the base class.
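Viewed as plain functions, the distributions mentioned above each turn one record into a list of points. The following sketch uses hypothetical helpers (point_doi, linear_doi, gaussian_doi) with assumed parameter names; it is not the TruLens DoI class hierarchy, only an illustration of the shape of the idea.

```python
import random


def point_doi(x):
    """Only the record itself: a purely local explanation."""
    return [list(x)]


def linear_doi(x, baseline=None, resolution=10):
    """Uniformly spaced points on the line from `baseline` to `x`."""
    if baseline is None:
        baseline = [0.0] * len(x)
    points = []
    for k in range(1, resolution + 1):
        alpha = k / resolution
        points.append([b + alpha * (xi - b) for xi, b in zip(x, baseline)])
    return points


def gaussian_doi(x, sigma=0.1, resolution=10, seed=0):
    """Random points drawn from a Gaussian centered on `x`."""
    rng = random.Random(seed)
    return [[xi + rng.gauss(0.0, sigma) for xi in x] for _ in range(resolution)]


x = [1.0, 2.0]
print(point_doi(x))
print(linear_doi(x, resolution=4))
print(len(gaussian_doi(x)))
```

An attribution method then averages its gradients over whichever list of points the chosen distribution returns.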
The parameters to Internal Influence allow us to formulate a wide range of general queries to probe an otherwise-opaque deep network’s behavior. In fact, under the hood, TruLens implements all attribution methods using Internal Influence. For example, IntegratedGradients can be expressed using InternalInfluence in the following way:
from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import LinearDoi
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import InputCut, OutputCut, Slice

# Create the attribution measure.
ig_computer = InternalInfluence(
    model,
    Slice(InputCut(), OutputCut()),
    MaxClassQoI(),
    LinearDoi(resolution=10))

# Calculate the input attributions.
input_attributions = ig_computer.attributions(beagle_bike_input)
InputCut and OutputCut are two special cuts that represent the first and last layer of a network, respectively. The MaxClassQoI selects the output corresponding to the class that the network predicted on the given input. LinearDoi is a DoI that corresponds to the linear interpolation performed by Integrated Gradients.
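The way the pieces combine can be sketched as a single generic function: average, over the points given by the distribution, the gradient of the quantity of interest. The code below is a toy reconstruction of that idea using finite differences, a fixed-class quantity instead of the max class, and input-level attributions only. All names (model, influence, point_doi, linear_doi, class_qoi) are hypothetical, and the final scaling by the distance from the baseline is left out.

```python
def model(x):
    """Toy two-class model returning a list of class scores."""
    return [x[0] * x[1], x[0] + x[1] ** 2]


def gradient(f, point, eps=1e-6):
    """Central-difference gradient of the scalar function f at `point`."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def influence(model, qoi, doi, x):
    """Average, over the points produced by `doi`, of the gradient of
    `qoi(model(.))` with respect to the input."""
    points = doi(x)
    total = [0.0] * len(x)
    for p in points:
        for i, g in enumerate(gradient(lambda z: qoi(model(z)), p)):
            total[i] += g / len(points)
    return total


def point_doi(x):
    return [list(x)]


def linear_doi(x, resolution=10):
    # Midpoints of `resolution` equal segments on the line from 0 to x.
    return [[(k + 0.5) / resolution * xi for xi in x] for k in range(resolution)]


def class_qoi(index):
    return lambda scores: scores[index]


x = [2.0, 3.0]
saliency_like = influence(model, class_qoi(1), point_doi, x)
ig_like = influence(model, class_qoi(1), linear_doi, x)
print("point distribution: ", saliency_like)
print("linear distribution:", ig_like)
```

Swapping only the distribution argument moves the result from a saliency-style gradient at the point to an integrated-gradients-style average along the path.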
Internal Influence
We will now demonstrate an example that goes beyond the functionality supported by saliency maps or Integrated Gradients. For this example, we will be calculating attributions for the feature maps in the layer of our model named 'block4_conv3'. First, we are interested in explaining the model's predicted class for our record. We specify this by using a MaxClassQoI, which sets the attributions to explain the model's output for its highest-confidence class. We will use the PointDoi, which specifies that we are only concerned with the model's behavior on one particular point, i.e., we want a very local explanation.
from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import PointDoi
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import Cut, OutputCut, Slice

# Create the attribution measure.
internal_infl_computer = InternalInfluence(
    model,
    Slice(Cut('block4_conv3'), OutputCut()),
    MaxClassQoI(),
    PointDoi())

# Get the attributions for the internal neurons at the
# 'block4_conv3' layer. Because 'block4_conv3' contains 2D feature
# maps, we take the sum over the width and height of the feature
# maps to obtain a single attribution for each feature map.
internal_attrs = internal_infl_computer.attributions(
    beagle_bike_input).sum(axis=(1,2))
Note that above we used the Slice, MaxClassQoI, and PointDoi classes to define the slice, QoI, and DoI. The TruLens API also offers several shorthands for defining these parameters more simply. For example, the above code could be written more succinctly as:
from trulens.nn.attribution import InternalInfluence

# Create the attribution measure.
internal_infl_computer = InternalInfluence(
    model, 'block4_conv3', 'max', 'point')

# Get the attributions for the internal neurons at the
# 'block4_conv3' layer. Because 'block4_conv3' contains 2D feature
# maps, we take the sum over the width and height of the feature
# maps to obtain a single attribution for each feature map.
internal_attrs = internal_infl_computer.attributions(
    beagle_bike_input).sum(axis=(1,2))
Now we can calculate the most important feature map towards the model’s top prediction, by taking the argmax over the internal attributions for this record. The most important feature map represents some type of learned feature that was the most important in the network’s decision to label this point as 'beagle'.
top_feature_map = int(internal_attrs[0].argmax())
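To see what the spatial sum and the argmax are doing, here is a small NumPy example on made-up attributions. It assumes a channels-last layout of (batch, height, width, channels), which is why the sum runs over axes 1 and 2; a channels-first layout would need different axes.

```python
import numpy as np

# Made-up attributions shaped (batch, height, width, channels):
# 1 record, 2x2 feature maps, 3 feature maps (channels).
attrs = np.array([[
    [[0.1, 0.5, -0.2], [0.0, 0.4, 0.1]],
    [[0.2, 0.6, 0.0], [0.1, 0.3, -0.1]],
]])

# Collapse height and width so each feature map gets one score.
per_channel = attrs.sum(axis=(1, 2))

# Index of the highest-scoring feature map for the first record.
top = int(per_channel[0].argmax())

print(attrs.shape, "->", per_channel.shape)
print("per-channel scores:", per_channel[0])
print("top feature map:", top)
```

Here the second feature map (index 1) collects the largest total attribution, so it is the one selected.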
Visualizing Important Internal Neurons
We would now like to visualize our identified feature map in a meaningful way. Since the feature map represents a learned feature, which is not readily interpretable, we will use a second set of attributions to identify the input features that are most important in defining this particular feature map. We will then use a visualizer to visualize these input features that relate to our identified important feature map. The code for this procedure could look like the following:
from trulens.nn.attribution import InternalInfluence
from trulens.visualizations import MaskVisualizer

input_infl_computer = InternalInfluence(
    model, (0, 'block4_conv3'), top_feature_map, 'point')

# The above is shorthand for:
# infl_input = InternalInfluence(
#     model,
#     (InputCut(), Cut('block4_conv3')),
#     InternalChannelQoI(top_feature_map),
#     PointDoi())

input_attributions = input_infl_computer.attributions(
    beagle_bike_input)

# Visualize the attributions as a mask on the original image.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)
The above procedure (using a second set of attributions to identify the input features that are most important in defining a particular feature map, and then using a visualizer on the resulting input attributions) is a common use case when dealing with internal attributions. This procedure can instead be done in a single step, using a ChannelMaskVisualizer, also found in the trulens.visualizations module, as demonstrated below.
from trulens.visualizations import ChannelMaskVisualizer

visualizer = ChannelMaskVisualizer(
    model,
    'block4_conv3',
    top_feature_map,
    blur=10,
    threshold=0.95)
visualization = visualizer(beagle_bike_input)
Other Quantities of Interest
We can also change the quantity that we want the attributions to explain. For example, our example image contains both a bike and a dog. Recall that while the top class predicted by our model was 'beagle', ImageNet also contains bike-related classes, e.g., 'mountain bike, all-terrain bike, off-roader' (class 671). We will use the ClassQoI to view the attributions towards the class 'mountain bike, all-terrain bike, off-roader'.
from trulens.nn.attribution import InternalInfluence
from trulens.visualizations import ChannelMaskVisualizer

BIKE_CLASS = 671

# Create the attribution measure.
internal_infl_computer = InternalInfluence(
    model, 'block4_conv3', BIKE_CLASS, 'point')

# The above is shorthand for
#
# infl_bike = InternalInfluence(
#     model,
#     Slice(Cut('block4_conv3'), OutputCut()),
#     ClassQoI(BIKE_CLASS),
#     'point')

# Get the attributions for each feature map.
internal_attrs = internal_infl_computer.attributions(
    beagle_bike_input).sum(axis=(1,2))

# Find the index of the top feature map.
top_feature_map_bike = int(internal_attrs[0].argmax())

# Visualize the top feature map in the input space.
visualizer = ChannelMaskVisualizer(
    model,
    'block4_conv3',
    top_feature_map_bike,
    blur=10,
    threshold=0.95)
visualization = visualizer(beagle_bike_input)