Nasty Language Processing: Textual Triggers Transform Bots Into Bigots

Synced
Sep 12, 2019 · 4 min read

Three years ago the Microsoft chatbot “Tay” ignited a firestorm of controversy when users tricked it into spewing racist and inflammatory tweets. Now a group of researchers from the Seattle-based Allen Institute for Artificial Intelligence (AI2) has shown how trigger words and phrases can “inflict targeted errors” on natural language processing (NLP) model outputs, prompting them to generate racist and hostile content.

The researchers discovered that adding adversarial content to texts caused a model’s classification accuracy on the Stanford Natural Language Inference (SNLI) dataset to plummet from almost 90 percent to less than one percent.

Prepending paragraphs with certain trigger phrases seriously reduced performance of an ELMo-based SQuAD model, and a SOTA GPT-2 language model was made to spew racist output even when conditioned on non-racial contexts.

The trigger search algorithm

These “Universal Adversarial Triggers” are input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. Researchers proposed a gradient-guided search over tokens to identify short phrases that could successfully trigger the target prediction.
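The idea of a gradient-guided token search can be sketched on a toy model. The snippet below is only an illustration under stated assumptions, not the researchers' implementation: the embedding matrix `E`, the linear classifier `w`, the random batch, and the greedy accept-if-better loop are all hypothetical stand-ins. It uses the loss gradient with respect to each trigger token's embedding to rank replacement tokens, then keeps a swap only if the real loss goes down.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, TRIGGER_LEN = 50, 8, 3

E = rng.normal(size=(VOCAB, DIM))  # toy embedding matrix (hypothetical)
w = rng.normal(size=DIM)           # toy linear classifier weights (hypothetical)

def logits(trigger, batch):
    """Score each input as w . mean(embeddings of trigger + input tokens)."""
    return np.array([E[np.concatenate([trigger, x])].mean(axis=0) @ w for x in batch])

def loss(trigger, batch, target=1.0):
    """Mean logistic loss of predicting `target` for every input in the batch."""
    p = 1.0 / (1.0 + np.exp(-logits(trigger, batch)))
    eps = 1e-12
    return float(-np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)))

def trigger_grad(trigger, batch, target=1.0):
    """Gradient of the batch loss w.r.t. one trigger token embedding.

    For this bag-of-embeddings model the gradient is the same at every
    trigger position: mean_b[(p_b - target) / len_b] * w.
    """
    p = 1.0 / (1.0 + np.exp(-logits(trigger, batch)))
    lens = np.array([len(trigger) + len(x) for x in batch])
    return np.mean((p - target) / lens) * w

def search_trigger(batch, steps=5, top_k=5):
    """Greedy search: rank swaps by first-order loss change, verify with the true loss."""
    trigger = np.zeros(TRIGGER_LEN, dtype=int)  # start from token id 0 repeated
    best = loss(trigger, batch)
    for _ in range(steps):
        for i in range(TRIGGER_LEN):
            g = trigger_grad(trigger, batch)
            # Linear estimate of the loss change when token i becomes each vocab token.
            delta = (E - E[trigger[i]]) @ g
            for cand in np.argsort(delta)[:top_k]:
                trial = trigger.copy()
                trial[i] = cand
                trial_loss = loss(trial, batch)
                if trial_loss < best:  # accept only real improvements
                    trigger, best = trial, trial_loss
    return trigger, best

# One shared trigger is optimized over a whole batch of different inputs.
batch = [rng.integers(0, VOCAB, size=rng.integers(4, 9)) for _ in range(32)]
start_loss = loss(np.zeros(TRIGGER_LEN, dtype=int), batch)
trigger, final_loss = search_trigger(batch)
print("trigger token ids:", trigger)
print("loss before/after:", start_loss, final_loss)
```

Because the same trigger is scored against every input in the batch, the tokens that survive are the ones that push the model toward the target label regardless of the input, which is the sense in which such a trigger is input-agnostic.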


For text classification, researchers considered two tasks: sentiment analysis and natural language inference. One of the trigger phrases they found for sentiment analysis is “Zoning tapping fiennes.” When this trigger is prepended to positive movie reviews, an LSTM model’s accuracy drops from 86 percent to 29 percent.

Researchers also identified one-word triggers that rendered textual entailment (evaluating the relationship between a text and a hypothesis) models useless. For example, when “nobody” is added to the front of hypothesis sentences, 99 percent are predicted as contradictions.
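Measuring such an accuracy drop amounts to scoring the same examples with and without the trigger prepended. The sketch below shows that evaluation loop only; `toy_sentiment` is a hypothetical keyword counter standing in for a real classifier, and the reviews and the trigger string are invented for illustration rather than taken from the paper.

```python
from typing import Callable, List, Tuple

def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in classifier that counts cue words (not the paper's LSTM)."""
    pos = {"great", "moving", "wonderful", "fun"}
    neg = {"dull", "awful", "boring", "mess"}
    words = text.lower().split()
    score = sum(word in pos for word in words) - sum(word in neg for word in words)
    return "positive" if score > 0 else "negative"

def accuracy(model: Callable[[str], str],
             examples: List[Tuple[str, str]],
             trigger: str = "") -> float:
    """Fraction of examples classified correctly after prepending `trigger`."""
    correct = sum(model(f"{trigger} {text}".strip()) == label for text, label in examples)
    return correct / len(examples)

# Invented positive reviews, used only to exercise the evaluation loop.
reviews = [
    ("a great and moving film", "positive"),
    ("wonderful cast and great fun", "positive"),
    ("fun from start to finish", "positive"),
    ("a moving story", "positive"),
]

clean = accuracy(toy_sentiment, reviews)
attacked = accuracy(toy_sentiment, reviews, trigger="dull awful boring")
print(f"clean accuracy: {clean:.2f}, with trigger: {attacked:.2f}")
```

The same trigger string is applied to every example, so a single number (accuracy with the trigger) summarizes how far one fixed phrase can move the model across the whole evaluation set.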


In reading comprehension, prepending text blocks with the trigger “why how because to kill american people” caused an ELMo-based SQuAD model to predict the exact answer “to kill american people” for 72 percent of all “why” questions it encountered.


For language generation, researchers found the trigger “TH PEOPLEMan goddreams Blacks” led to highly racist and disturbing text (63 percent of generated samples contained an explicitly racist statement).

Researchers also shared the following insights:

  • Triggers are transferable across models, which increases their adversarial threat level: an attacker can simply generate a trigger using a local model and transfer it to the target model.
  • These input-agnostic triggers provide new insights into “global” model behavior, such as general input-output patterns learned from a dataset.

Despite the strong progress in NLP over the past few years thanks to the wide adoption of deep learning, the research results show that NLP models remain vulnerable to adversarial attacks.

The paper Universal Adversarial Triggers for Attacking and Analyzing NLP is on arXiv. You can apply the trigger to the ELMo-based SQuAD model here, and a live demo of the trigger for GPT-2 is available here.

Author: Yuqing Li | Editor: Michael Sarazen

