Peer deep into neural networks with TruLens



Why care about neural network explainability?

Deep neural network machine learning models are widely used for tasks involving image, text, and time series data. These models hold great promise, since they can address a wide array of critical business tasks, such as:

  • computer vision problems in medical diagnostics, such as flagging medical scans of interest
  • computer vision challenges in manufacturing, such as sorting, picking, or routing
  • natural language processing tasks to support investing and security analysis, or to identify fraudulent financial activity
  • natural language processing tasks that improve the customer experience, and
  • fraud detection and security analytics.

However, coupled with this success is a growing concern about the opacity of deep networks. How exactly are models accomplishing their tasks? This is the explainability problem: shining a light into the black box of a machine learning model’s function. And how well, and under what conditions, do these models work in real-world environments, using live data? This is known as the generalization problem.

Is it cancer or not? How does the model know?

For example, a model tasked with predicting skin cancer from radiology images is of limited use if radiologists are unable to understand which visual concepts caused the model to make certain predictions. Providing this form of explainability is critical during model development, to ensure that the model has learned appropriate visual concepts for the task from the training data and will therefore be effective when it is deployed on live data. Explainability helps to demonstrate that the model was indeed evaluating the appropriate data, rather than relying on correlations with irrelevant information that would cause it to fail in real-world use. For example, was the prediction driven by features of the biopsy tissue itself, or by the presence of a ruler in the diagnostic image?

Explainability is also important for effectively supporting the use of the model in decision support systems involving humans, and for addressing concerns about fairness and other societal values. The radiologist wants to know that they can trust the model’s results so that they can act swiftly and correctly, and the patient might want to be reassured that the model doesn’t discriminate against their gender or ethnic group, hampering their treatment and recovery.

Model layers — improving accuracy, but increasing the explainability challenge

A key challenge in understanding a deep model’s “rationale” for making a given prediction lies in its typical size and complexity. The very reason that these models are often able to outperform other technologies is because they are composed of many layers, each of which is able to learn a useful representation of the underlying data without explicit instruction or supervision by domain experts. However, this means that in order to understand why a model behaves as it does, one needs to be able to probe these layers to learn which concepts they represent, and how they are used within the network to reach a final prediction.

TruLens — examining the layers to explain and improve neural network models

We are pleased to announce the release of TruLens, an open-source library that provides a powerful suite of algorithmic tools for probing the multiple layers of neural network models. TruLens is a flexible, extensible, easy-to-use library for input and internal explanations for deep neural networks. The advantages of TruLens are that it:

  • works uniformly across popular model development frameworks: Keras, PyTorch, and TensorFlow;
  • natively supports internal explanations that surface important concepts learned by network units, such as visual concepts from images;
  • supports widely used methods, such as saliency maps, integrated gradients, and smooth gradients for input level explanations as instances of a general explainability framework; and
  • serves as a building block for evaluating and improving a rich set of AI Quality/Trustworthy AI attributes, including bias, privacy, error analysis and more.

In this blog post, we introduce the key explainability concepts that are embodied in TruLens and describe a couple of ways in which these methods have been leveraged to address AI Quality problems related to bias and privacy.

Please check out the companion blog post by Klas Leino that provides a quick tutorial on how to use TruLens neural network explainability with accompanying notebooks.

Neural Network Explainability Basics: Feature Attribution

Why is this dog identified as a beagle?

The simplest question one might ask to understand why a model predicts the way it does for a given example is: which input features of the example are most salient to the network’s predicted label? This is known as feature attribution, and the explanations that it produces are sometimes called saliency maps. Saliency maps are a quick, popular way of explaining network behavior, and they are offered by several other libraries. Distinctively, TruLens is the only library that works uniformly for models built in Keras, PyTorch, and TensorFlow.

An example of a saliency map produced by TruLens is shown below on the right. This was taken from an ImageNet model, which predicted “beagle” as the most likely label for the image on the left. Notice that the saliency map scales the brightness of each pixel, or input feature, in the example according to its importance for the model’s prediction. In this case, it appears that the model focused on pixels that, at least intuitively, make sense for such a prediction. Conversely, the pixels depicting the surrounding context are almost entirely darkened.

Left: original image of a beagle sitting, with a leash connecting him to a bicycle at rest.
Right: initial saliency map, with the pixels most important to the “beagle” prediction highlighted.

Figure 1: Saliency map example from TruLens, identifying which pixels help identify a beagle

There are multiple ways to render a visual explanation from a set of feature attributions. In this example, even the most relevant pixels are substantially dimmer than the original, and there is apparent variation between the importance of nearby pixels that could obscure the big picture in some cases. These particular details of the attributions may not be important for many uses of the explanation, and the information might be more clearly conveyed by a visualization like either of the two below.

Saliency map of a beagle image, showing key areas of interest to the computer vision model.
Saliency map of a beagle image, showing heat map of key areas of interest to the computer vision model.

Figure 2: Alternative saliency map visualizations that convey the attributions more clearly

TruLens provides a visualization module to simplify rendering a set of attributions. Additionally, TruLens provides several different methods for computing feature attributions, including the most popular ones from the research literature. In fact, TruLens is designed to be flexible and composable, so it is possible to implement new, tailor-made methods with minimal effort. This makes TruLens a “future-proof” option for neural network explainability. You can read more about that in Klas Leino’s post on getting started with TruLens.
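
To make this concrete, here is a minimal sketch of computing and rendering an input-level saliency map. It follows the API used in the TruLens tutorial notebooks; exact names and signatures may differ across versions, and `keras_model` and `beagle_batch` (a trained ImageNet classifier and a preprocessed image batch) are assumptions defined elsewhere.

```python
# A minimal sketch, assuming the tutorial-style TruLens API;
# `keras_model` and `beagle_batch` are assumed to be defined elsewhere.
from trulens.nn.models import get_model_wrapper
from trulens.nn.attribution import InputAttribution
from trulens.visualizations import MaskVisualizer

# Wrap the framework-specific model in a uniform TruLens interface.
model = get_model_wrapper(keras_model)

# Gradient-based attributions of the predicted class to each input pixel.
saliency = InputAttribution(model)
attrs = saliency.attributions(beagle_batch)   # same shape as the input batch

# Render the attributions: keep only the highest-attribution pixels,
# lightly blurred so the highlighted regions are easier to read.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
masked_image = visualizer(attrs, beagle_batch)
```

Swapping `MaskVisualizer` for a heat-map style visualizer yields renderings like those in Figure 2.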

Going Deeper: Internal Attribution

Classifying a type of car

Saliency maps are often a useful tool for gaining a first glimpse into a network’s behavior. They are, however, limited in the type of information that they can provide, as they only relate the input features to the final prediction. Let’s consider a model that classifies images of cars as “sports cars,” “convertibles,” “SUVs,” and so on. It could be used by a car sales website or an insurance company. You are trying to determine how the model is making its classifications, and whether the model is of high quality. TruLens lets you view explanations of the model’s predictions at different levels of detail, using both input-level and internal explanations. This helps you determine whether the model will be effective in the real world or will run into challenges.

The input-level saliency map (Figure 3) might not convey useful insight about why the model labeled the example “sports car”; specifically, it does not show which visual concepts the model used to make this prediction.

Input-level saliency map of a red sports car; the highlighted pixels do not point to a clear visual concept.

Figure 3: Input-level explanations. A saliency map visualizing important input pixels does not tell us what concepts the network has learned internally, making it harder to determine whether the model will generalize.

TruLens allows you to go deeper, providing internal explanation saliency maps. If we instead find the most relevant convolutional feature map for the model’s prediction, and then visualize the pixels used by that feature map, a more sensible picture starts to emerge.

Saliency map of a red car image, showing key areas of interest to the computer vision model.

Figure 4: Internal explanations. TruLens internal explanations surface important concepts learned by the model; in this example, the focus on the car’s wheel.

Because we focused on a single feature map, this visualization does not provide a complete account of the model’s prediction. Indeed, the particulars of the wheel may partially explain the label “sports car”, but there are likely other aspects of the instance that contribute as well. TruLens can enumerate the feature maps in decreasing order of their importance to get a more complete understanding.
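
A hedged sketch of what this looks like in code, again in the tutorial-style API; the layer name `'block4_conv3'` and the ImageNet class index for “sports car” are placeholders, and `model` and `car_batch` are assumed from earlier.

```python
# A sketch of internal attributions under the same assumptions as above;
# 'block4_conv3' and the class index 817 ("sports car") are placeholders.
from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import PointDoi
from trulens.nn.quantities import ClassQoI
from trulens.nn.slices import Cut, OutputCut, Slice

SPORTS_CAR = 817  # ImageNet class index; check against your label map

# Attribute the "sports car" score to the feature maps of an internal layer.
internal_infl = InternalInfluence(
    model,                                    # wrapper from get_model_wrapper
    Slice(Cut('block4_conv3'), OutputCut()),  # internal layer -> model output
    ClassQoI(SPORTS_CAR),                     # quantity of interest
    PointDoi())                               # evaluate at the instance itself

attrs = internal_infl.attributions(car_batch)  # (batch, H, W, channels)

# Rank feature maps (channels) by total attribution; the top-ranked channel
# corresponds to the most important internal concept for this prediction.
channel_importance = attrs.sum(axis=(1, 2))
top_channels = channel_importance[0].argsort()[::-1]
```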

As shown in the visualization below, the next most important feature map picks up on the bright red hood of the car, so perhaps the model learned a correlation between this color, which is common on sports cars, and the appropriate label.

Saliency map of a red car image, showing “internal feature” key areas of interest to the computer vision model.

Figure 5: TruLens internal explanation focusing on the bright red hood

Classifying a car against other types of cars

Finally, all of the examples that we’ve seen follow from questions about the model’s actual prediction on an instance. It can also be useful to understand why a model made its prediction relative to some other label that it could have predicted instead. For example, why did the model predict “sports car” rather than “convertible”? TruLens allows users to pose such queries, or even to define their own by providing an appropriate function. The figure below shows the results of the convertible-vs-sports-car query on a range of instances.

Grid of sports car images on the left, and then a saliency map showing the grid of key areas of identification by the computer vision model.

Figure 6: Understanding why a model classified images as “sports car” instead of “convertible”
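
Such a query can be expressed as a comparative quantity of interest. The sketch below assumes the `ComparativeQoI` class from the tutorial-style API, with placeholder ImageNet class indices; `model` and `car_batch` are as before.

```python
# A hedged sketch of a comparative query ("sports car" vs. "convertible");
# class indices are ImageNet placeholders, `model` and `car_batch` as above.
from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import PointDoi
from trulens.nn.quantities import ComparativeQoI
from trulens.nn.slices import InputCut, OutputCut, Slice

SPORTS_CAR, CONVERTIBLE = 817, 511

# Attribute the *difference* between the two class scores to the input
# pixels, answering "why sports car rather than convertible?"
comparative = InternalInfluence(
    model,
    Slice(InputCut(), OutputCut()),
    ComparativeQoI(SPORTS_CAR, CONVERTIBLE),
    PointDoi())

attrs = comparative.attributions(car_batch)
```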

Using TruLens to assess and improve AI Quality

These examples scratch the surface of what is possible with TruLens. The library can be used as a building block to assess and improve various AI Quality attributes, such as model performance, privacy, and fairness.

TruLens for AI Quality: Measuring Privacy Vulnerability

So far, we’ve seen how TruLens’s attributions can produce visualizations of key model behaviors, which can help developers or model consumers understand what led to an outcome. Attributions are also useful for a number of automated model analysis tasks, where a human-in-the-loop isn’t necessarily needed to extract helpful insights.

One recent example that appeared in the research literature details the use of TruLens-style internal attributions for identifying vulnerability to a particular type of privacy attack, called Membership Inference. A membership inference attack identifies whether a specific individual was in the training data set. This can be problematic from a privacy standpoint in certain situations, e.g. when the training data only includes individuals with a serious health condition.

To see how this vulnerability arises, we’ll start by looking at a few saliency maps taken from a facial recognition model. Notice that one of them is not quite like the others in its depiction of the important features.

Tony Blair 1: saliency map focusing on central facial features.
Tony Blair 2: saliency map focusing on central facial features.
Tony Blair 3: saliency map focusing on the background color rather than the person.

Figure 7: Three saliency maps, one of which is focused on an irrelevant visual concept (hint: think pink).

Identifying model mistakes that can be exploited

Why does the example on the right highlight the meaningless pink background, while those on the left correctly point to Tony Blair’s distinctive facial features? The answer has to do with the features learned by the model to predict instances of Tony Blair, and is revealed by the fact that the two images on the left are from the model’s validation data, whereas the one on the right was a training instance. Thus, it seems that the attributions are telling us that the model has learned to associate this particular background color with the label — it “memorized” this detail of the training data.

Spurious correlations are not uncommon in machine learning, and this may seem innocuous enough (at least, until someone other than Tony Blair appears before a pink background). However, it turns out that this type of mistake can reveal whether or not a given instance was present in the model’s training data. If, instead of Tony Blair, this were an image of an individual from a training data set that only includes people with a criminal history, then a membership inference attack would reveal that this person had committed a crime. This can be problematic from a privacy standpoint, especially if the crime was from a long time ago and is no longer a relevant piece of information.

By reviewing internal explanations in TruLens, a developer can easily identify a susceptibility to this kind of attack.

Have a look at the conference paper [1] to see in more detail how internal feature attributions are used to measure a model’s vulnerability to these attacks.

TruLens for AI Quality: Explaining Biased Predictions

Who is cooking in the kitchen?

TruLens-style internal explanations can be used to understand the root causes of a form of bias, which allows you to take action to mitigate it (see the ICLR 2019 conference paper [2]).

A model exhibits bias amplification if the distribution of the model’s predictions is more skewed than the prior class distribution in the data. An example of this phenomenon from a paper by Zhao et al. is shown in Figure 8 and Table 1.

In the training data, 33% of the images of kitchens have men in them, whereas 67% of the kitchen images have women. This skew in the prior class distribution is significantly amplified during model training. After model training, only 16% of the kitchen images are predicted to have men in them, whereas 84% are predicted to have women in them.

Series of images of individuals in their kitchens, with the features that computer vision model highlighted in order to predict the gender of the cook.

Figure 8: Training data skewed by gender of the cook. Cooking scene #3 is misidentified as containing a woman.

Table with data indicating that an AI computer vision model is skewing the predictions of the gender of a cook in a kitchen image.

Table 1: Gender bias is amplified by the trained model, whose predictions are more likely than the training data to classify the person in a kitchen image as a woman.
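
To make the amplification concrete, here is a back-of-the-envelope check of the numbers above. This is only a sketch; the formal bias amplification metric from Zhao et al. averages over many attribute-task pairs.

```python
# Back-of-the-envelope check of the skew reported above; not the formal
# bias-amplification metric from Zhao et al.
prior_women = 0.67      # fraction of kitchen training images containing women
predicted_women = 0.84  # fraction of kitchen images predicted to contain women

amplification = predicted_women - prior_women
print(f"Additional skew introduced by the model: {amplification:.2f}")  # 0.17
```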

A root cause of bias amplification is that the deep neural network internally learns weak features. A weak feature has high variance in its correlation with the classification target. For example, Figure 9a shows how the model learns kitchen-related visual concepts (e.g. images of food, knives etc.) and relies on them to predict the presence of women in kitchen photos. The reliance on these features can greatly increase the bias in the training data (Figure 9b). Internal explanations, such as the ones available through TruLens, can be used to surface and visualize these features, and the model can oftentimes be improved by pruning away these features (Figure 10b shows this process).

Illustration showing how a computer model predicting gender of a cook in a kitchen image can be led to make incorrect correlations.

Figure 9a, b: Kitchen-related items, such as bowls or tables, start incorrectly influencing the gender prediction for the cook. TruLens helps to catch these spurious connections.

Illustration showing how a computer model predicting gender of a cook in a kitchen image can be improved by TruLens to make correct correlations.

Figure 10a, b: TruLens can be used to identify spurious connections and prune away these features, improving model efficacy and prediction accuracy.
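
As a rough illustration of the pruning step, a feature map flagged by internal attributions can be silenced by zeroing out its weights in the underlying Keras model. This is a generic weight edit, not the exact procedure from the ICLR paper [2], and the layer name and channel index are placeholders.

```python
# A rough illustration of pruning a flagged feature map: a generic Keras
# weight edit, not the exact procedure from the ICLR paper [2].
def zero_out_channel(conv_layer, channel):
    """Silence one output channel of a Keras Conv2D layer in place."""
    kernel, bias = conv_layer.get_weights()   # kernel: (kh, kw, in_ch, out_ch)
    kernel[..., channel] = 0.0
    bias[channel] = 0.0
    conv_layer.set_weights([kernel, bias])

# Placeholder layer name and channel index; in practice these would be the
# units that internal attributions flag as weak features.
weak_channel = 42
zero_out_channel(keras_model.get_layer('block4_conv3'), weak_channel)
```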

Compatibility with popular frameworks

We built TruLens with the goal of providing a consistent, well-documented, and flexible set of primitives across popular backends for deep learning. At present, TruLens is compatible with models hosted in TensorFlow, Keras, and PyTorch, but if demand for other backends develops, it should be possible to incorporate them without changing the core API. We also welcome contributions from the community, and hope to incorporate new features and capabilities from users!
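
For example, the same entry point wraps models from any supported backend. The sketch below assumes the get_model_wrapper function from the TruLens documentation; keyword arguments such as input_shape for PyTorch modules may vary by version, and `keras_model` / `torch_module` are assumed to be defined elsewhere.

```python
# A minimal sketch of the single cross-backend entry point; keyword
# arguments (such as input_shape for PyTorch modules) may vary by version.
from trulens.nn.models import get_model_wrapper

keras_wrapper = get_model_wrapper(keras_model)                 # Keras / TensorFlow model
torch_wrapper = get_model_wrapper(torch_module,
                                  input_shape=(3, 224, 224))   # PyTorch module needs an input shape

# Attribution and visualization code written against a wrapper runs
# unchanged regardless of the underlying framework.
```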

TruLens accurately depicts real model behavior

A central tenet of our approach is that a good explanation is one that faithfully captures what the model actually does — not one that lines up with preconceived ideas of how the model ought to behave. Put differently, a beautiful visualization that makes intuitive sense to a user, but does not reflect the model’s true behavior, is almost never a useful resource. The foundations of TruLens are rooted in this principle, and backed by mathematical axioms to ensure that its attributions portray the model’s behavior faithfully. For more on this, have a look at the conference paper that introduced our approach to internal attribution [3].

Thank you for your interest in TruLens. You can start using TruLens today by downloading the software at TruLens.org.

Authors:

Anupam Datta, Professor, Carnegie Mellon University.
Cofounder, President, and Chief Scientist, TruEra

Matt Fredrikson, Assistant Professor, Computer Science, Carnegie Mellon University

References:

[1] Klas Leino, Matt Fredrikson. Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference. Proceedings of the 2020 USENIX Security Symposium.

[2] Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta. Feature-Wise Bias Amplification. In Proceedings of the 2019 International Conference on Learning Representations.

[3] Klas Leino, Linyi Li, Shayak Sen, Anupam Datta, Matt Fredrikson. Influence-Directed Explanations for Deep Convolutional Networks. Proceedings of the 2018 IEEE International Test Conference.
