What an old German horse has to do with AI security

Casey Stegman
Apr 6, 2017 · 5 min read

Clever Hans

(Here’s a snippet from a new longform article that Brent Simoneaux and I wrote for Red Hat’s Open Source Stories. It’s a behind-the-scenes look at the researchers and engineers at OpenAI and how they’re revolutionizing machine learning.)

Ian Goodfellow focuses on something people care a great deal about: adversarial training, or — to put it another way — AI security.

“In the past, security has revolved around application-level security, where you try to trick an application into running the wrong instructions,” he explains. “Or network security, where you send messages to a server that can get misinterpreted. Like you send a message to a bank saying, ‘Hey, I’m totally the account owner, let me in,’ and the bank gets fooled into doing it, even though you’re not actually the account owner.”

But with AI, and specifically machine learning, security is a different animal.

“With machine learning security, the computer is running all the right code and knows who all the messages are coming from,” he says. “But the machine learning system can still be fooled into doing the wrong thing.”

Goodfellow equates this with phishing. With standard phishing, the computer isn’t tricked, but the person operating the computer is.

It’s the same for AI. Its code remains uncorrupted. But it is tricked into doing different tasks than it was trained for.

We’ve all heard stories about someone’s grandfather getting a Nigerian-prince-scam-style phishing email, promising untold riches in exchange for sending $1,000 or $2,000. The grandfather, of course, ends up losing the money and gets nothing in return.

Well, it turns out AI is even more vulnerable than someone’s grandfather.

“Machine learning algorithms are really, really gullible, compared to people,” Goodfellow says.

To make things worse, AI has the potential to be more powerful than anyone’s grandfather. This is no knock against your or anyone else’s elder patriarch. It’s just that Gramps falling for the Nigerian Prince scam is not as problematic as, say, a machine learning algorithm used for the financial services sector being tricked into helping hackers defraud a major bank or credit card company.

“If you’re not trying to fool a machine learning algorithm, it does the right thing most of the time,” Goodfellow says. “But if someone who understands how a machine learning algorithm works wanted to try and fool it, that’d be very easy to do.”

Furthermore, it’s very hard for the person building the algorithm to account for the myriad ways it might be fooled. The builder could maybe account for one specific type of attack, like an AI-focused Nigerian Prince scam. But that doesn’t mean the AI can’t be fooled by another, even simpler con.

Goodfellow’s research focuses on using adversarial training on AI agents. The approach is a “brute force solution”: generate a huge number of examples designed to fool the AI, then feed them to the agent and train it not to fall for them.

For example, you might train the AI used in a self-driving car not to fall for a fake sign telling the AI to halt in the middle of the highway.
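
The article doesn’t spell out the mechanics, but one common recipe, the fast gradient sign method that Goodfellow helped introduce, is to perturb each input in the direction that most increases the model’s loss and then train on both the clean and the perturbed copies. Here is a minimal PyTorch sketch along those lines; the model, optimizer, and epsilon value are placeholders rather than anything from the original piece.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.1):
    """Fast gradient sign method: nudge each input in the direction
    that most increases the loss, within an eps-sized box."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed gradient step, then clamp back to a valid input range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, eps=0.1):
    """One 'brute force' step: train on the clean batch plus an
    adversarially perturbed copy of the same batch."""
    x_adv = fgsm_perturb(model, x, y, eps)
    optimizer.zero_grad()  # clears gradients left over from the attack
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The defense here is only as broad as the attacks you remember to generate, which is exactly the limitation described above: guarding against one con says nothing about the next one.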

Goodfellow has developed (along with Nicholas Papernot) cleverhans, a library for adversarial training.
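
For a rough sense of what the library offers, generating adversarial examples with cleverhans looked something like the following in its TensorFlow 1.x era. The module paths and arguments have changed across releases, so treat this as an illustrative sketch rather than the current API.

```python
# Illustrative only: this follows the TensorFlow-1.x-era cleverhans
# interface, which has since been reorganized.
from cleverhans.attacks import FastGradientMethod

# `model` is a cleverhans-wrapped classifier, `x` a batch of inputs,
# and `sess` an open TensorFlow session; all three are placeholders.
fgsm = FastGradientMethod(model, sess=sess)
adv_x = fgsm.generate(x, eps=0.3, clip_min=0.0, clip_max=1.0)
```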

The name comes from a German horse who became famous in the early 20th century for his ability to do arithmetic.

A German math teacher (also a self-described mystic and part-time phrenologist) bought the horse and claimed that he had taught it to add, subtract, multiply, divide, and even do fractions. People would come from all over and ask Clever Hans to, for example, divide 15 by 3. The horse would then tap his hoof 5 times. Or people would ask it what number comes after 7. The horse would tap his hoof 8 times.

The problem was, Clever Hans wasn’t that clever — at least not in the way his teacher thought.

A psychologist named Oskar Pfungst discovered that the horse wasn’t actually doing math. Rather, he was taking his cues from the people around him. He’d respond to these people’s body language, tapping his hoof until he got a smile or a nod. Pfungst illustrated this by having the horse wear a set of blinders. When asked a question, the horse began tapping his hoof. But, unable to see the person who’d asked the question, he just kept tapping indefinitely.

“Machine learning is a little like Clever Hans,” Goodfellow says, “in the sense that we’ve given the AI these rewards for, say, correctly labeling images. It knows how to get the rewards, but it may not always be using the correct cues to get to those rewards. And that’s where security researchers come in.”

Goodfellow’s cleverhans library has been open sourced.

“With traditional security, open source is important because, when everybody can see the code, they can inspect it and make sure it’s safe,” he says. “And if there’s a problem, they can report it relatively easily or even send the fix themselves.”

A similar dynamic holds for machine learning security. Generally speaking, that is.

“For machine learning, there isn’t really a fix yet,” Goodfellow says. “But we can at least study the same systems that everybody is using and see what their vulnerabilities are.”

When asked if anything about doing machine learning research has surprised him, Goodfellow talks about the time he ran an experiment to see whether a machine learning algorithm could be trained to correctly classify adversarial examples.

He had just read a research paper that made some claims he thought were questionable. So he decided to test them. While his experiment was running, Goodfellow decided to step out to grab some lunch with his manager.

“I told him,” Goodfellow recalls, “‘When we get back from lunch, I’m not sure the algorithm’s going to correctly classify these examples. I bet it will be too hard. And, even after this training, it will still misclassify them.’”

But when he came back, Goodfellow found that the algorithm not only recognized the adversarial examples, it had also set a record for accuracy in classifying the normal ones.

“The process of training on the adversarial examples had forced it to get so good at its original task that it was a better model than what we had started with,” Goodfellow says.

At that moment, Goodfellow realized that, for AI, adversarial training wasn’t just important for finding vulnerabilities.

“By thinking about security,” he says, “we could actually make everything better across the board.”

(To read the full article, head over here.)

Casey Stegman is a writer at Red Hat, working on stories about open source technology, culture, and history.