The Ongoing Lessons of Tay

This post began as what I thought would be a short email to my research group, but quickly ballooned into a much longer thought. Given that I’ve only publicly expressed my thoughts on Tay in 140-character snippets, I felt that it might be worth sharing them in longform as well.

That said, this post is primarily directed towards people who work with natural language processing or generation, rather than being a critique of the issues with Tay written for a more general audience. For the latter, I highly recommend "Microsoft’s Tay is an Example of Bad Design" by caroline sinders, which does an excellent job of breaking down Tay’s design flaws and the oversights that led to them.

Anyway! Here we go.


In case anybody’s missed this, Microsoft put a chatbot on Twitter, and this happened:

Needless to say, this did not have to happen. It happened because of the carelessness of the people who chose to allow a machine learning algorithm to learn “in the wild” without considering the possible ramifications.

Anybody who is not a white man (or at least interacts with people who are not white men) who has used Twitter for any significant amount of time could have predicted this outcome. The fact that they did not include safeguards against the bot learning the worst of humans to begin with is completely beyond me. They’ve already shown that their algorithm does a great job at online learning of a conversational model — it only took a few hours for Tay to begin spewing hate speech.

The question of ethical algorithm design and deployment in ML and NLP is near and dear to my heart. I believe that too often, we create models that quantitatively “learn” what we want them to learn — the labels in our training set — without ever taking into account the things we don’t want them to learn. Where do you incorporate racism into your cost function? This question sounds rhetorical, but I’m utterly serious about this.
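To make the question concrete, here is one naive way such a term could enter a cost function. This is only a sketch under stated assumptions, not something Tay or any real system is known to use: the function names, the `harm_scores` vector, and `penalty_weight` are all hypothetical, and the sketch says nothing about where trustworthy harm scores would come from, which is the hard part.

```python
import math


def cross_entropy(probs, target_index):
    """Standard negative log-likelihood of the target output."""
    return -math.log(probs[target_index])


def penalized_loss(probs, target_index, harm_scores, penalty_weight=1.0):
    """Task loss plus the expected harm of the model's output distribution.

    harm_scores[i] is a hypothetical score in [0, 1] for how harmful
    candidate output i is. Obtaining such scores is assumed, not solved.
    """
    task_loss = cross_entropy(probs, target_index)
    # Expected harm: probability mass the model places on harmful outputs.
    expected_harm = sum(p * h for p, h in zip(probs, harm_scores))
    return task_loss + penalty_weight * expected_harm


# Three candidate outputs; the third is marked as harmful.
probs = [0.7, 0.2, 0.1]
harm_scores = [0.0, 0.0, 1.0]
loss = penalized_loss(probs, target_index=0, harm_scores=harm_scores)
```

With `penalty_weight` set to zero this reduces to the usual cross-entropy, which is effectively what we optimize when we only ask what the model should learn. Any positive weight makes probability mass on the flagged output cost something, even when the label itself is predicted correctly.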

Academia isn’t working on this problem because nobody’s offering grants to solve it. Industry isn’t working on this problem because it’s dominated by a privileged class who benefit from the systemic biases inherent in both society and our data. Perhaps Microsoft will now begin thinking about this problem, but only after the horrible (and well-deserved, imo) PR that they’re getting due to Tay.

A long time ago, I observed that there are hundreds of NLP papers on sentiment classification, and fewer than a dozen on automatically identifying online harassment. This is how the NLP community has chosen to prioritize its goals. I believe we are all complicit in this, and I am embarrassed and ashamed.

To be clear: I know we cannot all immediately drop what we’re working on to tackle these other problems instead. The dearth of papers on, e.g., online harassment detection is a reflection of the makeup of our community and those who fund it. Making progress in this space will require a massive shift in the kinds of people working on NLP research in the first place. The field of computer science as a whole already has a huge problem with representation and lack of diversity, and that problem ultimately cascades down into situations where a Twitter chatbot is suddenly generating neo-Nazi propaganda.

But we are all trying to create things that will one day become real tools interacting with real people. And so I urge everyone to always consider their work in the greater context of social justice and humanity. What applications is your algorithm targeting? At what points in those applications might the algorithm go awry? We ask what our models should be able to learn, but it is just as important to remember what they should not.