Summary: Interpreting Neural Networks With Nearest Neighbors (EMNLP-ws 2018)

Zhengli Zhao
Nov 6, 2018

Authors: Eric Wallace, Shi Feng, Jordan Boyd-Graber

They discuss a number of limitations of saliency-based interpretations. In particular, a neural network’s confidence can be unreasonably high even when the input is devoid of any predictive information. Therefore, when features are removed with a method like leave-one-out, the change in confidence may not properly reflect whether the “important” input features have been removed. Consequently, interpretation methods that rely on confidence may fail due to issues in the underlying model.
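The leave-one-out scheme they critique can be sketched as follows. This is a minimal illustration, not the paper’s code; `predict_proba` is a hypothetical stand-in for any classifier that maps a token list to class probabilities.

```python
def leave_one_out_importance(tokens, label, predict_proba):
    """Leave-one-out importance: each token's score is the drop in the
    model's confidence for `label` when that token is removed.

    `predict_proba` is a hypothetical callable (not from the paper)
    mapping a list of tokens to a list of class probabilities.
    """
    base = predict_proba(tokens)[label]
    importances = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        importances.append(base - predict_proba(reduced)[label])
    return importances
```

The paper’s point is that these confidence drops can be misleading: the model may stay highly confident even after every predictive token is gone, so the scores need not track true feature importance.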

They address this by changing the test-time behavior of neural networks with Deep k-Nearest Neighbors (DkNN), which provides a more robust uncertainty metric, conformity, without harming classification accuracy. They then use conformity in place of confidence to generate feature importance values.
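Roughly, conformity measures how strongly the training data supports a prediction: the fraction of a test point’s nearest training neighbors, gathered at each layer’s hidden representation, that share the candidate label. The sketch below is a simplified brute-force version under that reading; the real DkNN implementation uses approximate nearest-neighbor search and calibration, which are omitted here.

```python
import numpy as np

def conformity(test_reps, train_reps, train_labels, label, k=5):
    """Simplified DkNN-style conformity: the fraction of the k nearest
    training neighbors, pooled over layers, whose label matches `label`.

    test_reps  -- list of per-layer hidden vectors for one test input
    train_reps -- list of per-layer arrays, one row per training example
    """
    matches, total = 0, 0
    for layer, reps in enumerate(train_reps):
        # Euclidean distance from the test representation to every
        # training representation at this layer (brute force).
        dists = np.linalg.norm(reps - test_reps[layer], axis=1)
        nearest = np.argsort(dists)[:k]
        matches += int(np.sum(train_labels[nearest] == label))
        total += k
    return matches / total
```

Feature importance then follows the same leave-one-out recipe as before, but measuring the drop in conformity rather than the drop in softmax confidence when a token is removed.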

They find that the resulting interpretations align better with human perception than two baseline methods, leave-one-out and gradient-based feature attribution. They also use their interpretation method to analyze model predictions on annotation artifacts in the SNLI dataset.

References:

https://arxiv.org/abs/1809.02847
https://sites.google.com/view/language-dknn/
https://zerobatchsize.net/2018/09/11/dknn.html
https://github.com/Eric-Wallace/deep-knn

UCI NLP

Posts by authors affiliated with the UC Irvine Natural Language Processing group
