Slack Chat: Model interpretability and liability

yevgeni Debaters, gather up! Time to start the second in our Slack chat series on the ethics of AI. Last time we discussed the topic in a broad sense. This time we’ll focus on the more specific topic of algorithmic interpretability.

tyler Reporting for duty!

yevgeni Right on time Ty! Awesome!

kathryn Roger!

yevgeni Shall we jump right in?

kathryn Absolutely!

(a few minutes pass…)

kathryn (We have a soldier down) Maybe our neural network got hacked and this is Shadow Tyler! Have you looked into Ian Goodfellow’s work on AI security issues, where a hacker could come in and generate a fake data set that skews end results?

yevgeni OMG that would be a totally ironic representation of what this series of debates is about!

kathryn Exactly! I mean, one of the key issues with interpretability is that we expect systems (or people) to behave in a certain manner, and get confused when they don’t.

yevgeni Tyler does love performance and Gricean maxims. Wouldn’t put it past him to be doing this intentionally to illustrate the discomfort created when systems don’t behave like we expect them to — I mean, his Zoom ID shows up as Niels! We never know who the hell it is!

tyler Ack, sorry another Slack channel called. I’m back!

kathryn (And Tyler had an explanation for his behavior…)

tyler And yes I was trying to be uninterpretable. Did it work?

yevgeni You’re a true black box!

tyler One of my favorite professors would do this test in undergrad classes: He would try hard to give a non-sequitur and not have people make it make sense. In one case, someone asked when a quiz was happening and he looked out the window and said, “My, the sky is blue today.”

kathryn LOL

tyler His intention was to just say something random but all the students assumed what you and I might: that he was saying “look at the f’ing syllabus.” The fun stuff here is Gricean maxims (good call out yevgeni), and “Be Relevant” is one of the top ones.

yevgeni So you’re saying that he was interpretable, but the exact interpretation was user specific?

tyler That’s right, they conformed him to conventions, “People don’t do non sequiturs” and put an interpretation on him. Usually we can count on calculating other people’s intentions. In an AI sense, the question about a black box is “is there an intention?”

kathryn This reminds me a bit of the film Ross Goodwin generated using a recurrent neural network.

The script is like TOTAL NONSENSE. But the combination of the acting and our interpretation made sense of it. We impose meaning where it might not exist (fancy Greek word for this is apophenia, perceiving connections between apparently random things).

(But I want to make sure we give Yev a chance to ask questions!)

tyler I trained a character-based RNN and turned the temperature up high and it invented this phrase: “amplusiatial underward”
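
(Editor's aside: "temperature" in this setting usually means dividing the model's output scores by a constant before turning them into probabilities. What follows is a minimal sketch of that idea, not Tyler's actual setup. The alphabet, the logits, and the function names are all invented for illustration.)

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_char(alphabet, logits, temperature, rng):
    """Draw one character according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(alphabet, weights=probs, k=1)[0]

# Hypothetical scores a character model might assign to the next character.
alphabet = ["e", "t", "q"]
logits = [2.0, 1.0, 0.1]

cool = softmax_with_temperature(logits, temperature=0.5)  # peaked: favors "e"
hot = softmax_with_temperature(logits, temperature=5.0)   # flatter: "q" gets a real chance

rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled = "".join(sample_char(alphabet, logits, 5.0, rng) for _ in range(10))
```

(At high temperature the unlikely characters are drawn much more often, which is one plausible route to invented words like the one above.)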

yevgeni HAHA I suppose I could, but this is a great side path that we fell into!

tyler “You see? What about your home,” said Erica, and looked at the coat behind him.

(Okay, I’ll stop)

kathryn (interpretable)

yevgeni So, before Shadow Tyler hijacked our convo, I was going to ask what it means for a model to be interpretable?

kathryn Good question, Yev! In my mind, interpretable models are not black box models, which means we can articulate which input led to which output. That said, we have to be careful not to jump to the conclusion that, say, a logistic regression is ENTIRELY transparent.

tyler How important is consistency in that, kathryn? That is, let’s imagine a pretty simple logistic regression model, where you can say what features are important and how important they are relative to each other…

kathryn Go on, go on

tyler If it’s a static model, every time I put in xyz, I get out the same thing. But we live in a world of feedback loops. So, is a model less interpretable as I update it? At every point it is able to be interrogated BUT it’s going to be inconsistent from model release to model release.

kathryn A couple of points! Let’s step back and imagine that we don’t have a feedback loop, but just a static model. The machine learning community often considers logistic regressions to be more interpretable than, say, convolutional neural networks, which learn higher-order features from the input data themselves rather than working from features we hand them, as a regression does. So, with a regression, we do have the ability to see how input XYZ leads to output JKL.

But what we often overlook is the human mind work that goes into selecting those features: we bring with us all sorts of prior knowledge and assumptions. As such, I see the higher order work in regressions occurring in a different place of the data science process than it does in many neural network models today.

Second point, is consistency the right framework for an AI-first world? After all, as my friend Scott Penberthy at Google says, these are systems that get better with use.
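
(Editor's aside: the two points above, readable weights and answers that shift between releases, can be sketched with a toy logistic regression. This is a minimal illustration under invented assumptions: the two features, every data point, and the applicant are hypothetical, and the hand-rolled gradient descent stands in for whatever a real lender would use.)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Probability of approval; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit a logistic regression with plain batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical features per person: [income_score, debt_score], both in 0..1.
rows_v1 = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.3, 0.8], [0.2, 0.6], [0.1, 0.9]]
labels_v1 = [1, 1, 1, 0, 0, 0]  # 1 = repaid, 0 = defaulted

# Release 2 sees more data about OTHER people: middling applicants who defaulted.
rows_v2 = rows_v1 + [[0.6, 0.5], [0.55, 0.5], [0.6, 0.4]]
labels_v2 = labels_v1 + [0, 0, 0]

weights_v1 = train_logistic(rows_v1, labels_v1)
weights_v2 = train_logistic(rows_v2, labels_v2)

applicant = [0.55, 0.45]  # the same person, unchanged between releases
p_v1 = predict(weights_v1, applicant)
p_v2 = predict(weights_v2, applicant)

print("release 1 weights (bias, income, debt):", weights_v1)
print("release 2 weights (bias, income, debt):", weights_v2)
print("applicant approval probability:", p_v1, "then", p_v2)
```

(Within one release the same input always gives the same output, and the signs of the weights can be read directly. Across releases the applicant's score moves even though the applicant did not, because the extra rows describe other people.)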

tyler Yeah, but if *I* stay the same (or think I do) and a system treats me differently each time I use it, that would likely be confusing. I guess the question is “interpretable to whom?”

yevgeni Can that not be treated simply as a system learning and getting better over time? My answer to the same question would be different today than it was a year ago.

kathryn Ty, can you give a concrete example of where this would be strange in a use case?

tyler I’m thinking of loans — I may seem to qualify (or not) at Time 1, but then the financial institution gets a bunch more data and I look quite different to them.

kathryn The “to whom” raises a related question: new data about whom? Does the system change its answer because your behavior has changed over the past year, or because they have learned more about other people and now situate you differently in a big statistical pool?

tyler I mean, let’s ground this, right? The most important thing for interpretability is being able to prove that you’re not secretly building a terribly racist/sexist awful model of people.

kathryn I think that’s right. Or that an autonomous system does not go rogue and harm people without our being able to stop it (ahem, DARPA…).

(But I’d prefer to stay away from DOD use cases.)

(or owls like this one…and paper-clip annihilation scenarios…)

yevgeni (Although the topic of autonomous weapons is super interesting and important).

kathryn Ty, are there certain use cases where interpretability matters more than others? So, is it more important to avoid racist/sexist/awful shit for important rights like loans/education, or are marketing and consumer products just as important?


tyler Sure, I mean I go where Cathy O’Neil goes first: places that significantly change people’s lives: education, housing, work, justice, finance/credit. The higher the stakes, the more it matters.

kathryn It makes sense that interpretability need not be a universal model criterion. Like, it’s more important that a self-driving car do the right thing immediately! And if an uninterpretable neural network can drive quick, accurate results, that seems more important than knowing why it did what it did.

tyler “I want to make new delicious baked goods”, “I want a new poem” feel free to blackbox those.

kathryn The poem one is interesting, as the logical extreme of shaping taste is the fake news disaster.

yevgeni Kathryn, you’re right that the vehicle needs to make *a* decision ASAP. But later on should we be able to go back and investigate its “thought process”?

tyler That’s what you’d do with a human driver who hits people.

kathryn What if we really HAVE to make a tradeoff to support innovation and adoption? We can’t have it both ways, today?

Yev, your question is fundamentally about liability. Whom do we blame if something goes wrong?

yevgeni Definitely!

kathryn We tend to evaluate computers with superhuman standards, wanting to have some sort of proof that a system is perfect and infallible before we bother to adopt it. Most of the time this is because we overestimate how well we, as humans, actually perform. So like, there are tons of car accidents caused by human stupidity and tiredness and distraction that could be prevented by autonomous vehicles. There will indeed be accidents, but fewer than there are with us dangerous humans at the wheel.

So, what if we take a statistical approach to liability? Stop looking for causes and stick with correlations. Not answering why, but showing what and that. And bring liability up a level of abstraction, to show that fewer accidents occur in a self-driving world, and therefore humanity wins! I’m generally a fan of Dan Dennett’s notion of competence without comprehension.

We discussed this in the In Context podcast.

tyler How does that work?

kathryn Dan talks about systems, like our own brains or evolutionary systems, that carry out millions and millions of little random operations. They appear to be the work of intelligent designers who can say why something occurs, but it’s just the compound effect of tons of random shit.

tyler Right, I mean, there’s that old experiment where I poke your exposed brain and you’ll move your arm and if I ask why, you’ll say there was a fly (even though there was no fly).

yevgeni Kathryn, I’d like to challenge your idea of averaging. It would work if all autonomous vehicles worked with the same software, and the question was “should cars be autonomous?” But what if each car has its own software? How do we evaluate a specific iteration of it?
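
(Editor's aside: a small worked example of the averaging objection, with every number made up purely for illustration. The vendor names, mileage, accident counts, and human baseline are hypothetical. The sketch only shows that a fleet-wide rate can look good while one software version looks bad.)

```python
def accidents_per_million_miles(accidents, miles):
    """Accident rate normalized to one million miles driven."""
    return accidents * 1_000_000 / miles

# Hypothetical fleet: software version -> (accidents, miles driven).
fleet = {
    "vendor_a": (20, 40_000_000),
    "vendor_b": (30, 10_000_000),
}

# Hypothetical human-driver baseline, in accidents per million miles.
human_baseline = 2.0

per_version = {
    name: accidents_per_million_miles(accidents, miles)
    for name, (accidents, miles) in fleet.items()
}

total_accidents = sum(accidents for accidents, _ in fleet.values())
total_miles = sum(miles for _, miles in fleet.values())
fleet_rate = accidents_per_million_miles(total_accidents, total_miles)

# Versions that do worse than humans even though the fleet as a whole does better.
worse_than_humans = [name for name, rate in per_version.items() if rate > human_baseline]

print("fleet-wide rate:", fleet_rate)
print("per-version rates:", per_version)
print("worse than the human baseline:", worse_than_humans)
```

(With these invented numbers the fleet as a whole beats the baseline, yet one version does not, which is the gap between asking “should cars be autonomous?” and asking how to evaluate a specific piece of software.)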

kathryn So do you think individual software makers should be held to standards of accountability? What would that look like?

yevgeni That is a complicated question. I think that a human has to be held liable. Perhaps this means that the maker, or better yet the trainer of the RL system (for example), simply needs to demonstrate they did everything reasonable to train the system.

tyler People hide behind all kinds of things to avoid culpability. It concerns me if we just absolve people/companies that build things by saying that technology gives them a free pass.

kathryn I’ll play devil’s advocate. Our friend Joanna Bryson recently wrote a paper about the potential havoc caused when we try to assign “legal personhood” to a computer system. So she says, it can’t be the AI system. Yev, you say it should be the developer. I’m certainly NOT saying we should just let companies do whatever they want, but I do feel like the interpretability framework imposes deterministic criteria on probabilistic systems.

tyler Hm…the stats argument across the entire system of autonomous vehicles reminds me of public health. Here, I make recommendations for oodles of people that may produce some individual bad/neutral outcomes, but I’m counting on having a broader effect that makes it “worth it”.

kathryn Totally! Actually, a former student of mine is proposing something like this for her master’s thesis.

tyler I want to come back to something about interpretation and models. It seems to me that interpretation is not merely backwards looking: That is, we do need to understand that our training data tells us what we’re doing.

kathryn oo I like this…

tyler But models (well, systems) can have other elements added into them to try to *change* the future, not just predict it from the past. One thing that tends to happen in the models that need interpretability the most is that they assess risk. And they typically assign greater risk to people who are in lousy social positions. You could say that what we want of our systems is to make things better.

kathryn This is actually a decent preview for our upcoming AI in the 6ix meetup. One of the panelists, Nick Frosst, believes interpretability is meaningful because we want to intervene to change the results that systems give. It’s important we are empowered to change our past to make a better future. And if we don’t have traction, we have a greater likelihood of perpetuating our past. I see this as the core issue that arises when data science shifts from sociology to products. Feedback loops don’t just show that X injustice exists, they perpetuate it through action and interaction.

I’m late for a meeting. Have to jet!

tyler Bye, everyone!

yevgeni Thanks guys! See you at AI in the 6ix!