Interpretable Machine Learning

Machine learning can be really impressive when it works. For example, Google Photos can now filter my entire photo collection down to just those that include a particular loved one, almost without regard for when in their life the photo was taken. Manually labeling photos is so two years ago.

However, it can go horribly wrong. Shortly after its release into the wild, the machine learning algorithm powering that new feature labeled some black people ‘gorillas,’ a longstanding racial epithet. And nearly all the ‘cute babies’ found by Google Image Search are white. Unlike the famously faulty software in the Therac-25 radiation therapy machine, these mistakes are not physically injuring or killing anyone. But when ad-choosing algorithms show women fewer ads for high-paying jobs than they show men, people do get hurt.

Mistakes are inevitable. No software is without bugs. I will give the engineers behind Google Photos the benefit of the doubt and assume they tested their algorithms on a very diverse dataset of human faces. However, it is also appropriate to ask whether our existing testing and debugging tools are up to the task when working with algorithms, like neural networks, that are famously opaque.

What do humans need in order to understand a machine learning algorithm’s output? Are there better visualizations and interfaces to present the results? Should the algorithms themselves be changed to increase humans’ ability to understand their results? Do we have to trade off between interpretability and performance? There are so many questions. Let’s get to work.