IBM and MIT Researchers Find a New Way to Prevent Deep Learning Hacks
Deep learning may have revolutionized AI — boosting progress in computer vision and natural language processing and impacting nearly every industry. But even deep learning isn’t immune to hacking.
Specifically, it’s vulnerable to a curious form of hacking dubbed ‘adversarial examples,’ in which an attacker very subtly changes an input in a specific way (such as imperceptibly altering the pixels of an image or the words in a sentence), forcing the deep learning system to fail catastrophically.
AI has to be robust to withstand such attacks. Adversarial robustness also extends to defenses against ‘natural’ adversaries, be it white noise, black-outs, image corruption, text typos or unseen data. While computer vision models are advancing rapidly, it’s possible to make them more robust by exposing them to subtly altered images through adversarial training. But this process is computationally expensive and imperfect; there will always be outlier images that can trip the model up.
And this is what recent research described in a paper presented at this year’s NeurIPS conference aims to change.
In the study, a team of neuroscientists from MIT and the MIT-IBM Watson AI Lab investigated how neuroscience and AI can inform one another. They’ve explored whether the human brain can offer clues on how to make deep neural networks (DNNs) even more powerful and secure. Turns out it can.
The paper describes a new biology-inspired model, dubbed VOneNet (after V1, a specific region of the brain), that can help defend AI models against malicious adversarial attacks.
The research was led by Harvard graduate student Joel Dapello, James DiCarlo, head of MIT’s Department of Brain and Cognitive Sciences, and MIT postdoc Tiago Marques. They worked together with MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Co-director David Cox to gain insight from the brain’s truly mysterious ways.
Understanding the brain
By its very nature, deep learning, or deep neural networks (DNNs), is loosely modeled on the functioning of the brain, inspired by the structure of biological nervous systems. Deep neural networks are composed of individual ‘cells’ (neurons) connected to each other by ‘synapses.’ “Like in the brain, organizing these elements in a ‘deep’ hierarchy of successive processing stages gives the artificial deep neural networks much of their power,” says IBM researcher David Cox.
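To make that analogy concrete, here is a minimal sketch of such a hierarchy in NumPy. The layer sizes and weights are illustrative choices, not anything from the paper: each weight matrix plays the role of the ‘synapses’ feeding a layer of ‘neurons,’ and stacking the layers gives the ‘deep’ succession of processing stages.

```python
import numpy as np

# Illustrative-only deep network: three weight matrices ("synapses")
# connecting four layers of "neurons" in a deep hierarchy.
rng = np.random.default_rng(0)
sizes = [64, 32, 16, 10]              # input -> two hidden stages -> output
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    for W in weights:
        x = np.maximum(W @ x, 0)      # synaptic weighted sum, then nonlinearity
    return x

out = forward(rng.normal(size=64))
```

Each stage transforms the output of the one before it, which is what “successive processing stages” means in practice.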
However, adversarial attacks highlight a big difference in how deep neural networks and our brains perceive the world. Humans are not fooled at all by the subtle alterations that are able to trick deep neural networks, and our visual systems seem to be substantially more robust. Animal camouflage and optical illusions are probably the closest equivalent to adversarial examples against our brains.
But with a machine, it’s possible to carefully perturb the pixels in the image of a stop sign to trick a deep learning-based computer vision system into misclassifying it as a speed limit sign or anything else the adversary chooses, even though the image looks unchanged to the human eye. It is even possible to create physical objects that will trick AI-based systems, irrespective of the direction the object is viewed from, or how it is lit.
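To illustrate the mechanics of such a perturbation, here is a hedged toy sketch in the spirit of the well-known fast gradient sign method. The linear “classifier,” the dimensions, and the step size are all invented for illustration; this is not the attack or the models studied in the paper.

```python
import numpy as np

# Toy linear "classifier": logits = W @ x. We craft a small, uniformly
# bounded per-pixel perturbation that flips the predicted class.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 64))             # 2 classes, 64 "pixels"
x = rng.normal(size=64)

def predict(v):
    return int(np.argmax(W @ v))

true_class = predict(x)
other = 1 - true_class

# Gradient of the true-class margin w.r.t. the input; the attack steps
# against its sign, so every pixel moves by at most eps.
grad_margin = W[true_class] - W[other]
eps = (grad_margin @ x) / np.abs(grad_margin).sum() + 1e-3
x_adv = x - eps * np.sign(grad_margin)   # tiny per-pixel change
```

Because each pixel changes by no more than `eps`, the perturbed input stays close to the original, yet the classifier’s decision flips.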
While researchers have made some progress in defending against these kinds of attacks, first discovered in 2013, they remain a serious barrier to the wide deployment of deep learning-based systems. The current approach, called adversarial training, is also extremely computationally expensive. And this is exactly what the new research paper is trying to address.
Learning from biology
The MIT-IBM collaboration has been uncovering useful tricks from neuroscience to infuse into our AI systems for years. Recently, the DiCarlo Lab has developed metrics for comparing data collected from the human brain with artificial neural networks, to understand which systems are closer or further away from biology.
In the latest study, the team explored the adversarial robustness of different models and studied if that was related to how similar they were to the brain. “To our surprise, we have found a strong relationship,” says Cox. “The more adversarially robust a model was, the more closely it seemed to match a particular brain area — V1, the first processing stage of visual information in the cerebral cortex.”
So the team decided to add some well-known elements of V1 processing at the input stage of a standard DNN. They found that this addition made any model substantially more robust. On top of that, including this block adds no extra complexity or training cost to the models. It’s computationally much cheaper than typical adversarial training, and surprisingly effective. It also confers robustness against other kinds of image degradation, such as added noise.
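As a rough illustration of what a fixed, V1-style input block can look like, here is a hedged NumPy sketch: an untrained bank of oriented Gabor filters, a simple-cell nonlinearity, and stochastic noise on the responses. The filter count, sizes, frequencies and noise level are illustrative assumptions, not the paper’s actual VOneBlock parameters.

```python
import numpy as np

def gabor(size, theta, freq, sigma):
    """One oriented Gabor kernel (size x size), a classic V1 filter model."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr)

def v1_front_end(img, n_orient=4, size=7, rng=None):
    """Fixed Gabor bank + ReLU + Gaussian response noise (illustrative)."""
    rng = rng or np.random.default_rng(0)
    half = size // 2
    H, W = img.shape
    out = np.empty((n_orient, H - 2 * half, W - 2 * half))
    windows = np.lib.stride_tricks.sliding_window_view(img, (size, size))
    for k in range(n_orient):
        kern = gabor(size, theta=k * np.pi / n_orient, freq=0.2, sigma=2.0)
        resp = np.einsum('ijkl,kl->ij', windows, kern)  # 'valid' correlation
        out[k] = np.maximum(resp, 0)                    # simple-cell ReLU
    out += 0.1 * rng.normal(size=out.shape)             # neural-noise term
    return out

feats = v1_front_end(np.random.default_rng(1).normal(size=(32, 32)))
```

The resulting channel-by-space feature maps would then feed the rest of a standard DNN; because the filters are fixed, the block itself requires no training.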
Their brain-inspired model, VOneNet, outperforms state-of-the-art models against white-box attacks, where the attacker has access to the model architecture. It also outperforms them against black-box attacks, where the attacker has no visibility inside. And it does so with little added cost.
While impressive, “there’s certainly more work to be done to ensure models are invulnerable to adversarial attacks,” says Cox. And it’s not just a problem for computer vision. What’s clear, Cox adds, is that this research shows the need to keep learning from neuroscience to further boost adversarial robustness — and vice versa, to understand why something works in an artificial system, and how it can possibly help improve our still limited understanding of the human brain.
This story was first published on the IBM Research blog