Ian Goodfellow on pushing machine-learning to the limits

by Chloé Braithwaite

Machine-learning has come far in recent years, thanks in part to the leaps and strides deep neural networks have taken.

And of course, if you know anything about deep machine-learning, you’ve heard of Dr Ian Goodfellow, inventor of generative adversarial networks, or GANs.

“Now that machine-learning works really well, we’re trying to make it very robust and reliable, and able to resist anything that people can do to try to interfere with its operation,” he says.

One way researchers are tackling this problem is by using adversarial examples.

“An adversarial example is an example that has been intentionally modified to cause the machine-learning system to make a mistake.”
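Goodfellow and colleagues introduced the fast gradient sign method (FGSM), a standard way of constructing such an example. The sketch below applies the idea to a toy logistic-regression model rather than a real neural net; all weights and inputs are synthetic, and the setup is purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=100)   # stand-in for a trained model's weights
x = rng.normal(size=100)   # a "clean" input
y = 1.0                    # its true label

# Gradient of the logistic loss with respect to the input x is (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

# FGSM: nudge every input dimension a tiny amount in the direction
# that increases the loss.
epsilon = 0.1
x_adv = x + epsilon * np.sign(grad_x)

# The model's confidence in the true label drops on the perturbed input.
print(sigmoid(w @ x), sigmoid(w @ x_adv))
```

Even though no single dimension moves by more than 0.1, the model's output changes dramatically, which is exactly the phenomenon Goodfellow goes on to explain.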

But why the sudden interest in studying neural networks from this perspective?

“Well the reason is that it has only been about four years now since neural networks became really good at solving lots of different tasks. Now that we’ve managed to get machine learning to perform about as well as humans do on naturally occurring data, now we can start studying the problem of how we make sure neural networks work as well as humans do, even when someone is trying to trick them, even when someone is trying to force the neural net to make a mistake.”

There are very real applications to understanding how machines can be reliably tricked, like in the finance industry.

“They say that when you actually have an algorithmic trading procedure, you need to be very careful that other people can’t infer the way that your trading algorithm works, otherwise they can trick you into making trades that lose money for you and make money for them.”

There are also real-life applications within the realm of cyber-security, such as crafting malicious PDFs that are nonetheless recognized as safe, or modifying the manifest of an Android app to fool a neural net trained to recognize malware.

The vulnerability being exploited here comes from our neural nets being too linear.

“A lot of you are probably thinking, wait! Neural nets are meant to be extremely non-linear! First off, neural nets are very non-linear if you look at them as a function of their parameters.

“All the different weights in the neural net get multiplied together at different layers, and that means that if you look at the function that the neural net represents, in terms of how a single parameter influences that function, you do get a very non-linear mapping. But if you look at the way inputs are propagated to outputs, we actually see something that looks like a piecewise linear function, you know, rectified linear units and max out units are literally piecewise linear.”

Basically, as an input moves further and further in one direction, the network’s output keeps changing linearly: it extrapolates.

“While as human beings, we know it’s not reasonable to extrapolate linearly forever, the machine hasn’t figured that out yet.”
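This piecewise-linear behaviour is easy to demonstrate. The sketch below builds a randomly initialized one-hidden-layer ReLU network (an illustrative stand-in, not any model from the talk) and evaluates it along a straight line in input space: once the line has moved far enough that no ReLU changes sign, the output grows exactly linearly, forever.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(50, 10))   # hidden-layer weights
b1 = rng.normal(size=50)         # hidden-layer biases
w2 = rng.normal(size=50)         # output weights

def net(x):
    # One hidden ReLU layer, linear output.
    return w2 @ np.maximum(0.0, W1 @ x + b1)

x0 = rng.normal(size=10)  # starting point
d = rng.normal(size=10)   # a fixed direction to walk along

# Far along the ray x0 + t*d, every hidden unit is locked on or off,
# so the output is an affine function of t: equal steps in t give
# equal steps in the output.
ts = [1000.0, 1001.0, 1002.0]
vals = [net(x0 + t * d) for t in ts]
slopes = np.diff(vals)
print(slopes)  # the two successive slopes are numerically identical
```

A human looking at an image pushed that far in one direction would say it no longer resembles anything, but the network keeps applying the same linear rule.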

And that’s where a lot of adversarial examples come from.

“For example, if you change every pixel in a single image just a tiny amount, you actually move really far in input space, just because there are so many different pixels.”
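A quick back-of-the-envelope calculation makes this concrete. The image size below is an arbitrary choice for illustration; the point is that Euclidean distance grows with the square root of the number of dimensions.

```python
import numpy as np

# Perturbing every pixel of a 224x224 RGB image by just 0.01 still moves
# the input a long way in L2 distance, because there are ~150,000 dimensions.
n_dims = 224 * 224 * 3
per_pixel = 0.01
l2_distance = per_pixel * np.sqrt(n_dims)
print(l2_distance)  # roughly 3.9, i.e. hundreds of times the per-pixel change
```

An imperceptible per-pixel change is thus a very large step for a model that responds linearly to its input.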

So if these linear models are the problem, what about non-linear models?

“It turns out that extremely quadratic models are not nearly as vulnerable to adversarial examples as linear models are. The good news is there are some models that are less vulnerable, the bad news is that the quadratic models are very shallow so they don’t perform very well at recognizing normal data.”

Trying to make the quadratic model deep in order to improve its accuracy makes it hard to train.

“So it seems like one way we might be able to overcome adversarial examples is to design a much better training algorithm that can train more highly non-linear algorithms.”

Researchers have also, in a terrifying twist, found that adversarial examples can be used to fool many different models, not just a particular network.

“That means you can train your own model and then make adversarial examples for it, and send them to fool a different model.”

Goodfellow’s colleague, Nicolas Papernot, also developed a way of sending inputs to a remotely hosted model in order to observe the output, and “using the input-output pairs to reverse engineer the model, figure out how to fool it.”
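The black-box idea can be sketched with a toy example. Here the “remote” model is a hidden linear classifier standing in for a real service; everything below is illustrative and much simpler than Papernot’s actual attack, but it shows how input-output pairs alone can reveal a model.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
hidden_w = rng.normal(size=d)  # the remote model's weights, unknown to the attacker

def remote_predict(x):
    # The only thing the attacker can do: send an input, get a probability back.
    return 1.0 / (1.0 + np.exp(-(hidden_w @ x)))

# Attacker side: query on chosen inputs, record the outputs, and fit a
# substitute model from the input-output pairs (here via least squares
# on the logits, since logit(sigmoid(z)) = z).
X = rng.normal(size=(500, d))
p = np.array([remote_predict(x) for x in X])
logits = np.log(p / (1.0 - p))
w_sub, *_ = np.linalg.lstsq(X, logits, rcond=None)

print(np.allclose(w_sub, hidden_w, atol=1e-6))  # the substitute matches the hidden weights
```

Once the attacker has a faithful substitute, adversarial examples crafted against it transfer to the remote model.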

With all this work going into learning to fool these neural networks, what have we learned about defending against adversarial examples?

“There’s a lot of different defenses that people have tried, like regularization things such as weight decay, generative pre-training, adding lots of noise at test time. There’s also more unusual things that I’ve tried doing, like error-correcting codes, where I have several different pieces of the model output a code, and the idea is that all of the different pieces of the code have to agree with each other… So far, none of these different defenses have worked all that well. The best thing we can do is to actually train on adversarial examples themselves.”
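Adversarial training is simple to sketch: at every step, craft adversarial versions of the current batch against the current model, and train on those alongside the clean examples. The toy below uses FGSM on a logistic-regression model with synthetic data; it is a minimal illustration of the idea, not a production defense.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w > 0).astype(float)  # synthetic labels

w = np.zeros(d)
lr, epsilon = 0.1, 0.1
for step in range(500):
    # Craft FGSM adversarial examples against the *current* model.
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w[None, :]   # dLoss/dX for logistic loss
    X_adv = X + epsilon * np.sign(grad_x)

    # One gradient step on the combined clean + adversarial batch.
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w)
    w -= lr * X_all.T @ (p_all - y_all) / len(y_all)

clean_acc = np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5))
print(clean_acc)
```

Because the adversarial examples are regenerated against the model at every step, the model is continually forced to be correct in a small neighbourhood around each training point, not just at the point itself.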

Note: This article is based on a presentation Ian Goodfellow gave at AI With The Best 2017. This was his third time talking for the online developer conference.