Is your NLP model able to withstand an adversarial attack?

Adversarial Attack

Edward Ma
Published in HackerNoon.com
Jun 30, 2019 · 3 min read


An adversarial attack is a way to fool a model with abnormal input. Szegedy et al. (2013) introduced it in the computer vision field. Given a set of normal pictures, a strong image classification model classifies them correctly. However, the same model can no longer classify the inputs once a carefully crafted (not random) noise is added.

Left: original input. Middle: difference between left and right. Right: adversarial input. The image classification model classifies all three left inputs correctly, but classifies all three right inputs as “ostrich”. (Szegedy et al., 2013)

In the natural language processing (NLP) field, we can also generate adversarial examples to see how resistant your NLP model is to adversarial attack. Pruthi et al. use character-level errors to simulate the attack, and report a 32% relative (and 3.3% absolute) error reduction for the defended state-of-the-art model.
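As a rough illustration of what such character-level errors look like, the sketch below injects one random edit (swap, drop, insert, or substitute) into the interior of each word. The function name and the exact edit operations are my own choices for illustration, not the procedure from Pruthi et al.

```python
import random

def perturb_word(word, rng=random.Random(0)):
    """Apply one random character-level edit to simulate a
    misspelling-style adversarial attack (illustrative sketch only)."""
    if len(word) < 4:
        return word  # leave very short words untouched
    i = rng.randrange(1, len(word) - 2)  # keep first and last characters intact
    op = rng.choice(["swap", "drop", "insert", "substitute"])
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i:]
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]

sentence = "the quick brown fox jumps over the lazy dog"
print(" ".join(perturb_word(w) for w in sentence.split()))
```

Feeding such perturbed sentences into a downstream classifier shows how quickly accuracy degrades under even small character-level noise.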

Architecture

Pruthi et al. use a semi-character based RNN (ScRNN) architecture to build a word recognition model. A sequence of words is fed into the RNN, but the model does not consume each word as a whole; instead, each word is split into a prefix, a body, and a suffix (see the sketch after this list).

  • Prefix: First character
  • Body: Second character to second last character
  • Suffix: Last character
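To make the decomposition concrete, here is a minimal sketch of how such an input representation could be built for a single word: a one-hot vector for the first character, an order-insensitive bag-of-characters count for the middle, and a one-hot vector for the last character. The function name, alphabet, and handling of short or unknown characters are my own assumptions, not the paper's code.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
INDEX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot(ch):
    """One-hot vector over the alphabet; all zeros for unknown characters."""
    vec = [0] * len(ALPHABET)
    if ch in INDEX:
        vec[INDEX[ch]] = 1
    return vec

def bag_of_chars(chars):
    """Order-insensitive character counts for the word's body."""
    counts = Counter(c for c in chars if c in INDEX)
    return [counts.get(c, 0) for c in ALPHABET]

def semi_character_features(word):
    """Prefix / body / suffix features for one word (ScRNN-style sketch)."""
    word = word.lower()
    prefix = one_hot(word[0])                                        # first character
    body = bag_of_chars(word[1:-1]) if len(word) > 2 else [0] * len(ALPHABET)
    suffix = one_hot(word[-1]) if len(word) > 1 else [0] * len(ALPHABET)
    return prefix + body + suffix                                    # concatenated vector fed to the RNN

print(semi_character_features("adversarial"))
```

Because the body is a bag of characters, scrambling the interior letters of a word ("adversarial" vs. "advesrarial") produces the same feature vector, which is what makes the word recognition model robust to this class of misspellings.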


Edward Ma
HackerNoon.com

Focused on Natural Language Processing and Data Science Platform Architecture. https://makcedward.github.io/