Machine Learning Explainability and Robustness: Connected at the Hip
Last week, we presented a tutorial at KDD 2021 on the explainability and robustness of deep neural networks, and on the surprising relationships between these two concepts.
You can find the tutorial slides & outline here. Stay tuned for the video series that we will release shortly! I created the tutorial with Matt Fredrikson, Klas Leino, Caleb Lu, Shayak Sen, and Zifan Wang, a team that has been researching these topics for over five years.
Tutorial Highlights
1. Foundations of XAI: This section provides guidance on which explanation frameworks are appropriate and under what conditions. We highlight requirements for “good explanations”, in particular, explanation accuracy (correctly capturing drivers of model behavior), explanation generality (answering a rich set of queries about model behavior), and interpretation devices (e.g., visualization methods that make the explanations meaningful to humans), as well as how various approaches meet these requirements. We include a quick tour of gradient-based explanation methods, including Saliency Maps, Integrated Gradients, and Influence-directed Explanations (a minimal Integrated Gradients sketch appears after this list).
2. TruLens Open Source Library: We present TruLens, a new explainability library for deep neural networks. Go play with it! Distinctively, TruLens provides a uniform API to work with models built with PyTorch, TensorFlow, and Keras (a usage sketch follows this list). You can check out the following Colab notebooks to get started with TruLens:
   Building on TruLens, we also share a demo of Boundary Attributions, an explanation method that takes decision boundaries into account and is suitable for accurately explaining classification decisions. You can get started with the demo Colab notebook here:
3. Foundations of Adversarial Robustness: We introduce the key concepts of adversarial robustness, including state-of-the-art adversarial attacks as well as a range of corresponding defenses (the classic FGSM attack is sketched after this list). If you are working with deep neural networks and would like to make your models robust, you will find this section useful.
4. Connecting Explainability & Adversarial Robustness: We present key insights from the recent literature on the surprising connections between explainability and adversarial robustness: (a) we show that many commonly perceived issues with explanation methods are actually caused by a lack of robustness in the model; (b) we also show that a careful study of adversarial examples and robustness can lead to models whose explanations better match human intuition and domain knowledge. This section highlights the importance of making models robust to ensure that you get meaningful explanations (a small saliency-comparison sketch closes out the list below). As the tutorial title suggests, explainability & robustness are connected at the hip!
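
To make the gradient-based methods in item 1 concrete, here is a minimal PyTorch sketch of Integrated Gradients, which averages input gradients along a straight-line path from a baseline to the input. The zero baseline and 50 steps are illustrative defaults, not settings from the tutorial:

```python
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    """Riemann-sum approximation of Integrated Gradients along the
    straight-line path from `baseline` to the input `x`."""
    if baseline is None:
        baseline = torch.zeros_like(x)  # a common (but not the only) baseline choice
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[:, target].sum()          # logit of the class to explain
        total += torch.autograd.grad(score, point)[0]  # gradient at this path point
    # Average path gradient, scaled by each input's distance from the baseline.
    return (x - baseline) * total / steps
```

Calling `integrated_gradients(model, images, target=predicted_class)` yields per-pixel attribution scores, which an interpretation device such as a heatmap overlay can then make meaningful to a human.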
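Item 2's main selling point is the uniform API across frameworks. The sketch below shows roughly what usage looks like; the module and class names here are assumptions on our part, and the linked Colab notebooks remain the authoritative reference:

```python
# Assumed TruLens usage sketch; defer to the official notebooks for exact names.
from trulens.nn.models import get_model_wrapper
from trulens.nn.attribution import IntegratedGradients

wrapper = get_model_wrapper(model)         # same call whether `model` is PyTorch,
                                           # TensorFlow, or Keras
infl = IntegratedGradients(wrapper)        # choose an attribution method
attributions = infl.attributions(x_batch)  # per-input attribution scores
```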
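For item 3, the classic Fast Gradient Sign Method (FGSM) attack fits in a few lines of PyTorch. The epsilon of 8/255 is a conventional choice for images scaled to [0, 1], not a value prescribed by the tutorial:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One-step L-infinity attack: nudge each pixel by +/- epsilon in the
    direction that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()
```

Stronger attacks such as PGD iterate this step with a projection back into the epsilon-ball, and adversarial training, one of the defenses covered, simply trains on such perturbed examples.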
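Finally, a quick way to see the connection described in item 4 for yourself: compute plain input-gradient saliency for a standard and a robustly trained classifier and compare. `standard_model` and `robust_model` below are hypothetical placeholders for two trained models:

```python
import torch
import torch.nn.functional as F

def input_gradient_saliency(model, x, y):
    """Magnitude of the loss gradient with respect to each input pixel."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, x)[0].abs()

# Typical finding from the literature: the standard model's map looks noisy,
# while the robust model's map concentrates on object-relevant pixels.
# noisy_map   = input_gradient_saliency(standard_model, x, y)
# cleaner_map = input_gradient_saliency(robust_model, x, y)
```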