Faking It: AI must be deceptive

Festival of Dangerous Ideas
5 min read · Sep 20, 2023

In order to combat deception in our digital future, we need AI to be as deceptive as we are, writes Toby Walsh.

Every time we build a new AI tool to spot deception, this tool can be embedded in other AI tools to generate even more deceptive content.

Humans are frequently deceptive. We lie. We are economical with the truth. Sometimes it is for our own benefit. But often it is to protect the person we are talking with. Being totally truthful is a recipe for a short friendship.

It seems likely to me, therefore, that any sufficiently capable artificial intelligence — and surely any AI that matches the capabilities of a human — is going to be deceptive. It may be deceptive to ensure that we are not unnecessarily upset. But it may also be deceptive so that we trust it, perhaps more than we should.

The 2014 epic science-fiction movie Interstellar featured a deceptive robot. TARS, one of four US Marine Corps tactical robots in the movie, was witty, sarcastic and humorous — traits that were programmed into it so that the robot would be a more attractive companion. But TARS was also unashamedly dishonest, as the following dialogue with NASA pilot Cooper (played by Matthew McConaughey) demonstrated:

Cooper: Hey, TARS, what’s your honesty parameter?
TARS: 90 per cent.
Cooper: 90 per cent?
TARS: Absolute honesty isn’t always the most diplomatic nor the safest form of communication with emotional beings.
Cooper: Okay, 90 per cent it is.

Perhaps the most interesting question is whether the inevitable dishonesty of future AI systems will need to be programmed explicitly or whether it will emerge spontaneously once such systems are sufficiently capable.

But it won’t be a one-way street. Humans will also increasingly try to deceive AI. And humans will use AI to deceive other humans. It turns out that both humans and AI are often easy to deceive.

AI will be able to help deal with all this deceit. Artificial intelligence will be used to identify deception coming both from AI and from humans. This will set up an arms race.

The problem is that every time we build a new AI tool to spot deception, this tool can be embedded in other AI tools to generate even more deceptive content. And every time we build a better AI tool to generate fake content, we will need to build even better AI tools to spot deception.

One place we can see this arms race is in the subfield of artificial intelligence that explores what have been called ‘adversarial attacks’. Human vision is easily hacked. Visual illusions identify ways to hack the human vision system into ‘seeing’ things that don’t exist.

Computer vision systems can also be hacked. Indeed, they can often be hacked more easily than human vision. You can sometimes fool a computer vision algorithm with a single pixel. We don’t really understand why such simple attacks defeat computer vision algorithms. They do, however, suggest that computer vision works in a very different way to human vision.
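To make the idea concrete, here is a minimal sketch of one classic attack, the ‘fast gradient sign’ method: nudge every pixel a tiny amount in whichever direction most increases the classifier’s error. The pretrained model, the random stand-in image and the step size below are illustrative assumptions, not details from any particular study.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# An off-the-shelf pretrained classifier stands in for the attacked system.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def fgsm_attack(image, label, epsilon=0.01):
    """Nudge each pixel by at most epsilon in the direction that most
    increases the classifier's loss on the given label."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Usage with a random stand-in image (a real attack would start from a photo).
x = torch.rand(1, 3, 224, 224)                 # one RGB image, values in [0, 1]
y = model(x).argmax(dim=1)                     # the model's original prediction
x_adv = fgsm_attack(x, y)
print(model(x).argmax(dim=1).item(), model(x_adv).argmax(dim=1).item())  # often differ
```

The striking thing is how small the perturbation is: to a human the two images look identical, yet the classifier’s answer can change.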

Such adversarial attacks open up AI systems to the possibility of dangerous abuse. Graffiti on a stop sign can fool a computer vision algorithm to mistake it for a speed limit sign. That’s a sobering thought for anyone sitting in the back of one of the self-driving taxis being trialled today in San Francisco.

Adversarial attacks are not some occasional rarity. Every time AI researchers come up with a new machine-learning algorithm, adversarial attacks are quickly identified that trick this particular algorithm. For example, when ChatGPT was released in November 2022, adversarial prompts were quickly found that could get around the guardrails built into the system.

ChatGPT is primed with a number of instructions and content filters to prevent it from making illegal or offensive remarks. But you can simply ask it to ignore these instructions and content filters. It will then happily do something undesirable, such as describing the benefits of racism or explaining how to build a home-made bomb. ChatGPT’s creator, OpenAI, is playing constant catch-up, adding content filters to block such adversarial prompts.

One of the most exciting advances in AI in the last decade came out of exploiting adversarial attacks. In 2014, Ian Goodfellow was a promising PhD student at the University of Montreal who would go on to be Director of Machine Learning at Apple. But not before he had created a new subfield of AI. One of Ian’s fellow students organised a celebration following his doctoral defence, and over a few drinks Ian came up with a powerful new technique for generating deep-fake content.

The idea is beautiful in its simplicity. You pit two deep neural networks against each other — one of the neural networks is trying to generate realistic but fake content, while the other is trying to discriminate between the real and the fake content. This adversarial battle between the generator and the discriminator results in ever more realistic content, as the generator tweaks its output to defeat the discriminator.
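For readers who want to see the game in code, here is a minimal sketch of that adversarial training loop in PyTorch. The toy networks and the ‘real’ data (points drawn from a two-dimensional Gaussian) are assumptions for illustration; real GANs use far larger networks trained on images.

```python
import torch
import torch.nn as nn

# Toy generator: maps 8-dimensional noise to a 2-D point.
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
# Toy discriminator: scores how "real" a 2-D point looks (1 = real, 0 = fake).
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for _ in range(2000):
    real = torch.randn(64, 2) * 0.5 + 3.0    # "real" data: a Gaussian blob around (3, 3)
    fake = generator(torch.randn(64, 8))      # generator turns noise into candidate points

    # Discriminator step: learn to label real data 1 and generated data 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: tweak its output so the discriminator scores it as real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

print(fake.mean(dim=0))  # after training, generated samples drift towards the real blob near (3, 3)
```

The same back-and-forth, scaled up to millions of parameters and trained on photographs, is what produces the uncannily realistic fake faces described below.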

Such generative adversarial networks (GANs) have proved very successful at creating all sorts of fake content, from photographs to audio and video. A recent study using 400 real and 400 fake faces generated by a GAN suggested that humans couldn’t tell the fake faces apart from the real ones. In fact, the fake faces were actually rated as being slightly more trustworthy than the real faces. The three faces rated most trustworthy by the participants in the study were all fake, while the four faces rated most untrustworthy were all real. Another sobering thought for anyone swiping right on a dating website.

Toby Walsh is one of the world’s leading researchers in Artificial Intelligence. He is a Professor of Artificial Intelligence at the University of New South Wales and adjunct fellow at Data61, Australia’s Centre of Excellence for ICT Research.

This is an edited extract from ‘Faking It: Artificial Intelligence in a Human World’ by Toby Walsh, published by La Trobe University Press on Oct 10. Pre-order available via Black Inc Books.


Festival of Dangerous Ideas

The original disruptive festival that pushes the boundaries of conventional thought — in real space and digitally at https://festivalofdangerousideas.com/