How We Might Protect Ourselves From Malicious AI

New research could make deep-learning models much harder to manipulate in harmful ways

MIT Technology Review


Images: Ian Goodfellow et al/OpenAI

By Karen Hao

We’ve touched previously on the concept of adversarial examples: inputs that have been subtly altered so that they cause a deep-learning model to misbehave. In March, we covered UC Berkeley professor Dawn Song’s talk at our annual EmTech Digital conference about how she used stickers to trick a self-driving car into reading a stop sign as a 45-mile-per-hour speed limit sign, and how she used tailored prompts to make a text-based model spit out sensitive information like credit card numbers. In April, we similarly reported on how white-hat hackers used stickers to confuse Tesla Autopilot into steering a car toward oncoming traffic.

In recent years, as deep-learning systems have become increasingly pervasive in our lives, researchers have demonstrated how adversarial examples can affect everything from simple image classifiers to cancer diagnosis systems, with consequences that range from the benign to the life-threatening. Despite the danger they pose, adversarial examples remain poorly understood, and researchers have worried about how, or even whether, the problem can be solved.
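For readers curious what crafting such an input actually involves, here is a minimal, illustrative sketch in PyTorch of the fast gradient sign method, one of the simplest published attacks. It is not the specific technique used in the research described in this article; the model, image, and epsilon budget are placeholders.

```python
# Illustrative sketch only: the fast gradient sign method (FGSM),
# a basic way to turn an ordinary input into an adversarial example.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Nudge `image` by at most `epsilon` per pixel so `model` is
    more likely to misclassify it, while it still looks unchanged to a person."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # how wrong the model is now
    loss.backward()                               # gradient of loss w.r.t. the pixels
    # Step each pixel a tiny amount in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()         # keep values in valid image range
```

The key point the sketch makes is how small the change is: a per-pixel shift of a few percent, invisible to a human viewer, can be enough to flip the model’s prediction.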
