How to prevent embarrassment in AI

The must-have safety net that’ll save your bacon

Cassie Kozyrkov
Apr 15 · 5 min read

How will you prevent embarrassment in machine learning? The answer is… partially.

Expect the unexpected!

Wise product managers and designers might save your skin by seeing some issues coming a mile off and helping you cook a preventative fix into your production code. Unfortunately, AI systems are complex and your team usually won’t think of everything.

There will be nasty surprises that force you into reactive mode.

Real life is like that too. I’m meticulous when planning my vacations, but I didn’t consider the possibility that I’d miss my train to Rome thanks to a hospital tour sponsored by shellfish poisoning. True story. It taught college-age me never to repeat the words “I’ve thought of everything.”

Speaking of things nobody expects

When the unexpected rears its ugly head, the best we can hope for is infrastructure that reduces the burden of reacting effectively. Let’s talk about building that infrastructure for AI.

Chatbots from hell

The internet loves chatbots gone wild, so let’s look at an example based on Microsoft’s chatbot Tay.

Imagine that you want to make an AI system that tweets like a human. Naturally, you’ll train it on real human tweets. Now, fast forward to it working perfectly.

Ta-da! You now have a chatbot that has learned to use profanity plus other undesirable behaviors you get when you give anonymity to the eternally-adolescent. (I see you, my trolls, this one’s for you.)

While the result is working as intended, your users are upset because your bot just cursed at them (or worse). Clearly, you didn’t do a very good job of intending. Your mistakes were twofold: the objective you picked (“tweet like a human” as opposed to “tweet like a well-mannered and wise human that my business would be proud to be associated with”) and the data you used (“uncurated examples from free-range humans”).

Obvious in foresight

Seasoned machine learning practitioners can predict this punchline — it’s amateurish to be surprised by it. Common sense compels you to understand that asking a chatbot to tweet like a human is practically begging it to start swearing.

AI systems don’t think for themselves. In fact, they don’t think at all.

Humans really do speak this way, so expect your system to reflect that unfortunate truth: for whatever miserable reason, those spicy words and trollish behaviors are what’s in your data. ML/AI systems don’t think for themselves, no matter how much you try to anthropomorphize them. They just turn the patterns you show them into recipes for creating more of the same.

Analytics is about getting your eyes on your data. It helps you discover issues and plan for them.

When a behavior is undesirable and you’ve anticipated it, you can write code to make it impossible right off the bat. There’s even protection if you lack a priori thoughtfulness: well-run teams deploy analytics on the training data long before launch and they know exactly what’s in the ‘textbooks’ they’re asking their machine student to study from. If the team didn’t see it coming by meditating, they could hardly have missed it when analysts dunked them in examples of the gorgeous lyricism of our species.
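To make “getting your eyes on your data” concrete, here is a minimal sketch of the kind of pre-launch check an analyst might run over the training set. Everything in it is an assumption for illustration: the `FLAGGED_TERMS` list holds mild placeholder words, the function name `flagged_term_counts` is made up, and the three sample tweets are invented. Real analytics would go far beyond word counting.

```python
import re
from collections import Counter

# Hypothetical placeholder terms; a real list would be curated by your team.
FLAGGED_TERMS = {"darn", "heck"}


def flagged_term_counts(examples, flagged_terms=FLAGGED_TERMS):
    """Count how many training examples contain each flagged term."""
    counts = Counter()
    for text in examples:
        # Lowercase and split into rough word tokens.
        words = set(re.findall(r"[a-z']+", text.lower()))
        # Each example counts at most once per term.
        for term in words & flagged_terms:
            counts[term] += 1
    return counts


# Invented sample data standing in for a real training set.
training_tweets = [
    "What a lovely day",
    "Darn, I missed my train to Rome",
    "Heck yes! Darn right.",
]
report = flagged_term_counts(training_tweets)
for term, n in report.most_common():
    print(f"{term}: appears in {n} of {len(training_tweets)} examples")
```

If a report like this comes back non-empty, nobody on the team gets to act surprised when the system repeats what it studied.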

Ideally. But what if the leaders didn’t think of it up front and the analysts snoozed past it?

Hindsight back-up plan

Don’t put all your eggs in the foresight basket. Relying on nothing but your ability to anticipate all problems is unwise. A way to stay safe(r) is to use protection: a policy layer.

Don’t put all your eggs in the foresight basket. Use policy layers!

A policy layer is a separate layer of logic that sits on top of the ML/AI system. It’s a must-have AI safety net that checks the output, filters it, and determines what to do with it. For example, your policy might say, “No word in the output is allowed to match our profanity blacklist.”

Potential output that violates the policy then triggers a fallback of your choosing, like “Clip the offending word out of the sentence.” Or you could replace the offending word with something friendlier, like a smiley.
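Here is a rough sketch of what such a policy layer could look like, assuming the ML/AI system hands you its output as a plain string. The names `BLOCKLIST`, `violates_policy`, `clip_fallback`, and `policy_layer` are hypothetical, and the listed words are mild stand-ins for a real profanity list. Treat it as an illustration of the check-then-fallback idea, not a production filter.

```python
import re

# Hypothetical stand-in terms; a real list would be curated and maintained.
BLOCKLIST = {"darn", "heck"}

WORD = re.compile(r"[A-Za-z']+")


def violates_policy(text, blocklist=BLOCKLIST):
    """Policy: no word in the output may match the blocklist."""
    return any(word.lower() in blocklist for word in WORD.findall(text))


def clip_fallback(text, blocklist=BLOCKLIST):
    """Fallback: clip offending words out of the sentence."""
    kept = [tok for tok in text.split() if not violates_policy(tok, blocklist)]
    return " ".join(kept)


def policy_layer(model_output, fallback=clip_fallback):
    """Check the model's output and apply the fallback only on a violation."""
    if violates_policy(model_output):
        return fallback(model_output)
    return model_output


print(policy_layer("Have a nice day"))
print(policy_layer("Well, darn it"))
```

The layer sits outside the model, so you can change the policy or the fallback without retraining anything. Swapping in a different reaction is one argument away, for example `policy_layer(text, fallback=lambda _: "Let's change the subject.")`.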

Etiquette for AI

If enough readers find this article interesting, I’ll write a follow-up explaining policy layers in more detail. In the meantime, you can intuit some of the key points if you think of policy layers as the AI version of human etiquette.

Policy layers are the AI equivalent to human etiquette.

To understand why they’re a better option than the naïve approach of sheltering your system from undesirable data, ponder how I manage the neat trick of not swearing at you. I happen to be aware of some very pungent words across several languages, but you don’t hear me uttering them on stage. That’s not because they fail to occur to me. It’s because I’m filtering myself. Society has taught me good(ish) manners. Luckily for you (and your users!) there’s an equivalent fix for machine learning… that’s exactly what policy layers are.

Now that you know they exist and they’re easy to build, it would be mighty rude of you not to incorporate them into your AI systems immediately.

Further etiquette wisdom from the internet.


Written by Cassie Kozyrkov

Head of Decision Intelligence, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own.
