Reliability of Neural Classifiers and Adversarial Attacks!

Ijaz Khan
Published in unpack · Nov 28, 2020
Adversarial attacks confound machine learning algorithms by manipulating their input data [3]

Are neural models reliable? Are they robust enough to be used in production? These days, such questions about the robustness of neural models are circulating on the web and in artificial intelligence news.

An unreliable neural model can harm society. Take self-driving cars as an example: they are trained on specific data to recognize traffic signs and follow traffic rules. But what if the model sees a stop sign with a missing or half-erased letter? If the model is not robust to such changes in its input, it may fail to recognize the stop sign and make a mistake, which is dangerous.

Researchers tricked self-driving systems into identifying a stop sign as a speed limit sign [3]

A few years ago, researchers started to evaluate neural models to check whether they are robust enough to be used in the real world. Szegedy et al. [1] first found that neural models are vulnerable to adversarial inputs: inputs that differ only slightly from their original counterparts but still fool the model. This drew researchers' attention to the security of neural models, and many later began evaluating neural classifiers by generating adversarial inputs to fool them.

Image classifiers were fooled by changing a few pixels of real images. To the human eye the change was imperceptible, yet it reduced the accuracy of the models below 10% in most cases.
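One common way to craft such pixel-level perturbations is the fast gradient sign method (FGSM). Below is a minimal sketch, assuming a PyTorch classifier `model` that returns logits and a batched input `image` with pixel values in [0, 1]; the function name and the epsilon value are illustrative choices, not a fixed recipe.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Sketch of FGSM: nudge each pixel in the direction that
    increases the classification loss (illustrative helper)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # One signed gradient step, then clamp back to the valid pixel range.
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0, 1).detach()
```

The smaller epsilon is, the less visible the perturbation, which is why a change that humans barely notice can still flip the model's prediction.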

Similarly, text classifiers were fooled by adversarial inputs with modifications at the sentence, word, or character level, and these inputs fooled the classifiers successfully.
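As a toy illustration of a character-level modification, here is a small sketch that swaps adjacent characters in some words of a sentence; the helper names are hypothetical, and real attacks choose the words and edits that most change the classifier's prediction rather than picking them at random.

```python
import random

def swap_adjacent_chars(word, rng=random):
    """Character-level edit: swap two neighbouring characters in a word."""
    if len(word) < 3:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb_sentence(sentence, rate=0.3, rng=random):
    """Apply the character-level edit to a fraction of the words."""
    return " ".join(
        swap_adjacent_chars(w, rng) if rng.random() < rate else w
        for w in sentence.split()
    )

print(perturb_sentence("the movie was absolutely wonderful"))
# e.g. "the movie was absolutely wondreful"
```

A human still reads the perturbed sentence correctly, but a classifier that has never seen such misspellings may change its prediction.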

This process of generating adversarial inputs to fool neural models is called an adversarial attack, and such attacks are used to evaluate the robustness of neural classifiers.

Why does a model that claims more than 98 percent accuracy on a specific task get fooled by such a minor change in the input?

One reason is that these neural models are usually trained on data that does not cover every kind of expected input. In the real world, it is very possible that the model is fed inputs that are a little different from what it was trained on. For example, a small spelling mistake in the input to a text classifier is easy for a human to handle, but the neural model may fail on it. In such cases, defence strategies should be planned to make these models more robust to such inputs.

One defence is adversarial training. The generated adversarial inputs are mixed into the training dataset, and the model is re-trained on this combined dataset. Research has shown that adversarial training can make neural models robust to some extent.
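As a rough illustration, here is a minimal sketch of one adversarial training epoch that mixes clean and adversarial examples in each batch. It assumes the hypothetical `fgsm_attack` helper from the earlier sketch and a standard PyTorch DataLoader yielding (image, label) batches; this is just one of several common variants of adversarial training.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    """Sketch of one adversarial training epoch on clean + FGSM examples."""
    model.train()
    for images, labels in loader:
        # Craft adversarial versions of the current batch
        # (uses the illustrative fgsm_attack helper from above).
        adv_images = fgsm_attack(model, images, labels, epsilon)
        # Mix clean and adversarial inputs and train on both.
        inputs = torch.cat([images, adv_images])
        targets = torch.cat([labels, labels])
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
```

Generating the adversarial examples on the fly, as above, keeps them matched to the model's current weaknesses instead of a fixed pre-generated set.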

One tip from this article for developers of datasets and neural classifiers: keep the robustness of these models in mind while developing a model or creating a new dataset.

There are many good repositories on GitHub that collect the relevant papers on image-based and text-based adversarial attacks and defence strategies. I am adding a very useful link to a GitHub repository for textual adversarial attacks in the references section below [2].

Hope you found the article interesting. And yes, do not forget to clap; it motivates us to keep writing useful articles for you :)

References:

  1. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. CoRR abs/1312.6199 (2013). http://arxiv.org/abs/1312.6199
  2. https://github.com/thunlp/TAADpapers
  3. https://portswigger.net/daily-swig/adversarial-attacks-against-machine-learning-systems-everything-you-need-to-know
