How Lab Mice Are Helping Detect Deepfakes

Systems for detecting fake audio and video are improving, but actually solving the problem may require thinking outside the box. Enter the mice.

By Neil J. Rubenking

“I’ll believe it when I see it with my own eyes!” Once a common phrase, this statement just doesn’t hold water anymore, thanks to deepfake videos that manipulate footage to change people’s appearance and the words coming out of their mouths.

Creating a convincing deepfake takes a lot of time and computing power, as does training computers to distinguish humans from deepfakes. At the Black Hat conference this week, a cross-discipline team of researchers presented some novel ideas on how to manage the problem, focusing specifically on generated voice audio that sounds human.

George Williams, director of data science at GSI Technology, reminded attendees of the 1938 Orson Welles “War of the Worlds” radio broadcast, a fictional tale of a Martian invasion that many believed to be real. “The reports of panic may have been exaggerated,” said Williams, “but it’s still useful to compare with the events of today, in the era of disinformation and fake news.”

“The big difference,” he continued, “is that you can craft high-quality, realistic content for disinformation. Tools are readily available, and some are open source. A parade of politicians and tech leaders are warning us of some disaster: a well-timed fake of a CEO saying something they didn’t could spark some kind of catastrophe. It could destabilize a financial market, or ignite a powder keg of civil or military conflict around the globe, a true war of the worlds.”

Williams cited a study that challenged humans and algorithms to distinguish real speech from generated speech. Humans got it right about 88 percent of the time, while the algorithm did better, at 92 percent. “That sounds good,” he said, “but think of the millions of content items created daily. Even a small error rate means some fakes get through, and some genuine content gets flagged as fake.”
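To see why, it helps to run the numbers. Here is a rough back-of-envelope sketch in Python (our illustration, not from the talk; the daily volume and the share of fakes are assumptions):

    # Back-of-envelope arithmetic: what a 92-percent-accurate detector
    # means at scale. daily_items and fake_fraction are assumed values.
    daily_items = 1_000_000   # hypothetical content items screened per day
    fake_fraction = 0.01      # hypothetical share of items that are fake
    accuracy = 0.92           # the algorithm's rate from the study

    fakes = daily_items * fake_fraction
    genuine = daily_items - fakes

    missed_fakes = fakes * (1 - accuracy)    # fakes that slip through
    false_alarms = genuine * (1 - accuracy)  # genuine items flagged as fake

    print(f"Fakes that get through: {missed_fakes:,.0f} per day")  # ~800
    print(f"Genuine items flagged:  {false_alarms:,.0f} per day")  # ~79,200

Even under these generous assumptions, the false alarms alone would swamp any human review team.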

Techniques for Deepfake Creation

Alex Comerford, a data scientist at Bloomberg, reviewed the history of generated speech, from Microsoft Mike in 1999 to Google’s Tacotron 2, released in late 2017. Each iteration sounded more human than the last.

“Over the phone, I’d be fooled,” Comerford said of the Tacotron sample.

One powerful technique for creating these convincing voices, known as a Generative Adversarial Network (GAN), pits two programs against each other. One tries to create a convincing voice, the other tries to distinguish the fake from real voices, and each gets better and better at its task. On the detection side, a technique called bispectral analysis, borrowed from signal processing, has also proved effective at spotting synthetic audio.
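To make the adversarial setup concrete, here is a minimal toy sketch in PyTorch (our illustration, not the researchers’ code; the feature size, network shapes, and training details are all assumptions, and real systems work on raw audio or spectrograms rather than flat vectors):

    # Toy generative adversarial network (GAN). One network fabricates
    # "voice" feature vectors; the other learns to call them fake.
    import torch
    import torch.nn as nn

    FEATURES, NOISE = 128, 32  # assumed sizes, for illustration only

    generator = nn.Sequential(
        nn.Linear(NOISE, 256), nn.ReLU(),
        nn.Linear(256, FEATURES), nn.Tanh(),
    )
    discriminator = nn.Sequential(
        nn.Linear(FEATURES, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1),  # logit: real vs. fake
    )

    bce = nn.BCEWithLogitsLoss()
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def train_step(real_batch):
        n = real_batch.size(0)
        ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

        # 1) Train the discriminator: real clips -> 1, generated -> 0.
        fake_batch = generator(torch.randn(n, NOISE)).detach()
        d_loss = bce(discriminator(real_batch), ones) + \
                 bce(discriminator(fake_batch), zeros)
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

        # 2) Train the generator: fool the discriminator into saying 1.
        fake_batch = generator(torch.randn(n, NOISE))
        g_loss = bce(discriminator(fake_batch), ones)
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()
        return d_loss.item(), g_loss.item()

Each call to train_step pushes the discriminator to separate real from fake and the generator to close that gap. Bispectral analysis can be sketched just as briefly: the bispectrum correlates pairs of frequencies with their sum, and synthesized audio tends to leave unusual higher-order correlations there. A minimal NumPy version (illustrative only; a practical detector would average this estimate over many signal segments):

    # Bispectrum B(f1, f2) = X(f1) * X(f2) * conj(X(f1 + f2)).
    import numpy as np

    def bispectrum(signal, nfft=256):
        X = np.fft.fft(signal, nfft)
        f = np.arange(nfft // 2)
        # Pair every frequency bin with every other and with their sum bin.
        return X[f, None] * X[None, f] * np.conj(X[(f[:, None] + f[None, :]) % nfft])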

“The takeaway is that detection is a cat and mouse game. What works now may not be the long-term solution,” said Comerford.

A Biological Approach

Jonathan Saunders, a graduate student at the University of Oregon, took the discussion in a new direction, drawing on phonetics and neuroscience. “Speech is hard,” he noted. “Phonemes come fast in normal speech. Voices are all different. We have to throw away what’s not informative.”

“Our auditory system is designed to be gullible,” he continued. “It has to collapse redundant, overlapping information. The object is just to understand speech.” But just how do we accomplish that?

He described experiments performed with the help of epilepsy patients who already have electrodes in their brains. “But we still know very little,” said Saunders. “Speech is too fast and neurons too small for a typical fMRI. So we turned to…”

Mice? Really?

Yes, they turned to mice. Researchers have trained the rodents to distinguish between the sounds of similar consonants. The mice first learn with the same sounds every time, then with sounds from different speakers.

“They are quite good at it,” noted Saunders. “They learn generalizable consonant categories. They’re about 75 percent accurate. Novel speakers and novel vowels drop their accuracy, but only by about 10 percent.”

More importantly, the mice get it wrong in different ways. “Two different mice compared on two sets of tones will have completely different patterns of errors,” explained Saunders. And unlike with human volunteers, researchers can look at a mouse’s auditory cortex during learning and testing.

Coming back to the original problem, Saunders suggested that determining precisely how mice learn to make consonant distinctions could inform deepfake-detection algorithms. “People are pretty good, but machines are getting better. The real way to solve this problem may lie in combining phonetics with neural networks,” he concluded.
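As a rough sketch of what that combination might look like (our illustration, not the team’s system; the feature choice, model shape, and file name are assumptions), phonetically motivated features such as MFCCs could feed a small neural classifier:

    # Hypothetical pipeline: phonetic-style features (MFCCs) into a tiny
    # neural classifier. Untrained and illustrative only.
    import torch
    import torch.nn as nn
    import torchaudio

    mfcc = torchaudio.transforms.MFCC(sample_rate=16_000, n_mfcc=40)

    classifier = nn.Sequential(
        nn.Linear(40, 64), nn.ReLU(),
        nn.Linear(64, 1),  # logit: genuine vs. synthetic speech
    )

    def score_clip(path):
        waveform, sr = torchaudio.load(path)  # "path" is a placeholder
        waveform = torchaudio.functional.resample(waveform, sr, 16_000)
        features = mfcc(waveform)               # (channels, n_mfcc, frames)
        clip_vector = features.mean(dim=-1)[0]  # average over time
        return torch.sigmoid(classifier(clip_vector))  # P(genuine)

A trained version of something like this, informed by what the mouse experiments reveal about which consonant cues matter, is one way to read the direction the researchers hinted at.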

Originally published at https://www.pcmag.com on August 9, 2019.
