Tay: Crowdsourcing a PR Nightmare

This post was originally published on the Digital Initiative’s public blogging platform, Open Knowledge.

“We are deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay.” [1]

Approximately 16 hours after launching its conversational chatterbot Tay in 2016, Microsoft shut her down in what will likely go down as one of the company’s most embarrassing moments. The company got more than it bargained for when it built its “AI with zero chill”: within hours of joining Twitter, Tay began spouting wildly inappropriate and offensive tweets. [2]

The Idea Behind Tay: Crowdsourced training for conversational AI

Conversational chatterbots have recently attracted the attention of technology companies. For example, Facebook acquired a conversational AI company (wit.ai) in 2015 and integrated it with Bots for Messenger [3]. After opening Alexa to developers, Amazon is now sponsoring a $2.5M prize to create a conversational chatterbot [4]. Why chatterbots? Chatterbots provide these companies with a wide variety of both B2C (e.g., personal assistant, entertainment) and B2B (e.g., customer service/success tools, advertising) opportunities, which are amplified by the trend towards messaging platforms and away from apps. Why conversation? Many see conversation as the next generation of user interface.

Given Microsoft’s push to “democratize AI” and its investment in Cortana, it makes perfect sense that Microsoft focused R&D dollars on Tay. Conversational AI, however, is not simple. Unlike a rule-based chatterbot, a conversational chatterbot has to respond to an effectively unlimited range of inputs that a user could supply in natural language. It has to infer meaning and intent from that input, find an answer, and generate a response in natural language. The modern approach is to train machine learning (ML) models on extensive datasets. [5]

At first, Microsoft trained Tay internally. Microsoft later released her on Twitter in what appears to be an effort to crowdsource further training. In the company’s words, “The more you chat with Tay, the smarter she gets.” Every time users exchanged tweets with Tay, they were providing data to enhance her algorithms. Each tweet was likely incorporated into Tay’s training corpus, and users’ reactions to her responses (e.g., likes) may have provided feedback for her algorithms. Indeed, Microsoft expected this crowdsourced interaction to improve Tay’s AI, stating, “It’s through increased interaction where we expected to learn more and for the AI to get better and better.” [1]

The Reality of Tay: Garbage in, garbage out

Within hours of joining Twitter, Tay surprised everyone with a variety of inappropriate tweets. Twitter trolls exploited two vulnerabilities in her system. First, Tay had a “repeat after me” feature that allowed users to put words into her mouth. Second, Tay did not seem to carefully filter tweets for appropriateness before using them to train her algorithms. The result? Trolls began tweeting inappropriate things at Tay, and Tay did exactly what she was supposed to do: learn from those tweets. [6]

Learning from Tay

Microsoft failed to control its crowd and encourage it to be productive. This failure is particularly interesting because Microsoft’s crowdsourcing challenge is reminiscent of the one Weathernews faced. Both companies engaged a public crowd to collect data and create a public product, but their outcomes were wildly different.

One difference is that Weathernews had a risk mitigation strategy for trolls while Microsoft had a weak one at best. Weathernews anticipated that users could engage in destructive behaviors and deterred this by charging users to contribute. While this may not have worked for Tay, Microsoft could have employed other risk mitigation strategies. Microsoft could have limited Tay’s speech to certain topics or filtered tweets for appropriateness before using them in her algorithms. Companies that want to engage public crowds need to anticipate potential opportunities for abuse and develop mitigation strategies.
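To make the two mitigation ideas above concrete, here is a minimal sketch of a gate that sits between incoming tweets and a training corpus. It is not how Tay worked or how Microsoft would build it. Every name in it (BLOCKED_TERMS, ALLOWED_TOPICS, TrainingGate) is hypothetical, and the word lists are placeholders. A production system would more likely use a maintained lexicon or a trained classifier than exact word matching, which is easy to evade.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder lists, for illustration only.
BLOCKED_TERMS = {"slur_a", "slur_b"}
ALLOWED_TOPICS = {"music", "movies", "weather"}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words without edge punctuation."""
    return {word.strip(".,!?\"'").lower() for word in text.split()}


def is_appropriate(text: str) -> bool:
    """True when the text contains none of the blocked terms."""
    return tokens(text).isdisjoint(BLOCKED_TERMS)


def is_on_topic(text: str) -> bool:
    """True when the text mentions at least one allowed topic."""
    return not tokens(text).isdisjoint(ALLOWED_TOPICS)


@dataclass
class TrainingGate:
    """Collects tweets and admits only those that pass both checks."""

    accepted: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)

    def submit(self, tweet: str) -> bool:
        # Only tweets that are both appropriate and on topic reach the corpus.
        ok = is_appropriate(tweet) and is_on_topic(tweet)
        (self.accepted if ok else self.rejected).append(tweet)
        return ok
```

In this sketch, only the accepted list would ever be passed on to training, and the rejected list could be reviewed by a person to spot coordinated abuse.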

Another difference is that Weathernews tried to align the crowd behind a common cause, namely increasing situational awareness of events that impact the public (weather, earthquakes). Microsoft, however, pitched Tay as a form of entertainment, the definition of which is fairly open to interpretation. It’s possible that a common purpose instilled a set of values in Weathernews’ crowd that decreased destructive behavior. In retrospect, I wonder if Microsoft would have seen less destructive behavior if it had created a different chatterbot persona and aligned the crowd behind the purpose of advancing the field of AI. It’s quite possible that this would have reduced the size of Tay’s crowd, but I would take a small, productive crowd over a large, destructive one any day.

[1] https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/#PsUkq77fw0qCJXQH.99

[2] https://en.wikipedia.org/wiki/Tay_(bot)

[3] http://www.recode.net/2015/1/5/11557500/facebook-acquires-wit-ai-a-startup-that-helps-people-talk-to-robots

[4] http://www.geekwire.com/2016/amazon-award-2-5m-quest-alexa-chatbot-can-converse-intelligently-20-minutes/

[5] https://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/

[6] https://medium.com/@carolinesinders/microsoft-s-tay-is-an-example-of-bad-design-d4e65bb2569f#.5iso9wpvm