2018-04-04
I Can See Your Lips Moving: Why What You Hear Is Affected by What You See
From ventriloquists to crime witnesses, the story of how your brain combines visual and auditory signals to make sense of the world
When we see an event, we start to make assumptions. We use the context in which we find ourselves to make sense of what we see, but some researchers have suggested that context can also affect what we hear. It’s a psychological phenomenon that is no stranger to the courts.
Early in the morning of 21st February 2010, Officer Wes Thompson spotted a couple huddled in a porch. The man was yelling at the woman. The woman was crying. Something was clearly amiss, so Officer Thompson went over to find out exactly what was going on.
On closer inspection he saw that the woman, later identified as Angel Vanarman, was bleeding from the mouth. The man, Gerald Sandefur, told the woman to say nothing and explained to the officer that she'd been attacked. Officer Thompson looked around but could see no one else nearby. He asked for more details, and as Sandefur continued his explanation, Vanarman, out of Sandefur's view, pointed at him and mouthed the words, 'He hit me.' Sandefur was promptly handcuffed and arrested.
What seemed to be a standard case of assault took a different turn when it came to court. Officer Thompson’s statement said that Vanarman had mouthed words that he understood to be ‘He hit me.’ But he didn’t actually hear those words.
The Defence argued that this was hearsay and wanted the evidence excluded. But the court instructed the jury that it was up to them to decide whether Officer Thompson's lip reading skills were good enough to interpret Vanarman's facial movements accurately. If they weren't, then Officer Thompson's evidence could indeed be disregarded, something that would greatly assist the Defence's case.
Sue Thomas is one of the most famous professional lip readers. Deaf from the age of eighteen months, she went on to become the first deaf person to work for the FBI. Like many law enforcement agencies, the FBI uses lip readers to interpret speech caught on surveillance cameras. It actively recruits people, often deaf, for their lip reading skills, and Sue Thomas became one of its most successful analysts. Her work led to a book, Silent Night, and later to a television drama series, Sue Thomas: F.B.Eye.
Sue Thomas is not alone. There are many expert lip readers, and when not working for the FBI or other security services, they might find themselves called on by newspapers to discover what this celebrity or that politician was saying when they thought they were having a private conversation. The Daily Telegraph, for instance, asked professional lip readers to work out what Andy Murray's fiancée, Kim Sears, said during the Australian Open in 2015. Five professional lip readers gave five different interpretations, showing that lip reading is difficult even for experts.
Forensic lip reading is admissible in court in the UK and US, albeit with caveats similar to those made in the case of Sandefur vs State. According to some estimates, only 30 to 40% of speech is actually lip-readable.
But lip reading may be about to become obsolete, because audio surveillance has recently taken a dramatic twist. No longer do we need to read the lips of those we spy on. We don't even have to have them on camera. All we need to do is focus a high-speed camera (at least 2200 frames per second) on some object in the room, a potted plant for instance. The camera records the tiny vibrations of the plant's leaves as they respond to sound in the room, and these vibrations are then computationally decoded back into audio.
Developed by a team including Abe Davis, then a PhD student at MIT, this technique is known as the Visual Microphone. It sounds almost like science fiction (passively recovering sound from video of vibrating plant leaves), but it works, and it uses relatively simple and accessible technology.
Even without technology in the mix, interpreting sounds can be fraught with difficulty. Consider the following. You are at a party. There is the usual hubbub and noise. And yet you are instantly able to pick out the mention of your name by someone in the room.
This is an example of selective attention. We are attuned to our own name from an early age, and I imagine that everyone has had the experience of hearing their name called in the street, only to find that it was meant for someone else with the same name.
Then there is the phenomenon commonly known as the cocktail party effect: the ability to focus on the one person speaking to us amid the din and clatter of our noisy surroundings, something that voice interfaces like Siri or Alexa would certainly benefit from. From these two examples you might think that our sense of hearing is pretty spectacular. But it's really not.
There is some truth in the idea that we hear what we want to hear. I saw evidence of this some years ago when working on a television series about psychic phenomena. During the course of the research I met a practitioner of Electronic Voice Phenomena, or EVP as it is known in the paranormal trade.
The practitioner began her demonstration by taking me to a location believed to be haunted and, with a cassette recorder in hand, she began calling to the spirits, literally saying, ‘Is there anyone there?’ Together we walked around and occasionally the EVP lady would plead again for the spirits to make their presence known — ‘If there is anyone there, would you please communicate with us?’
After thirty minutes or so she switched off the cassette recorder. I hadn't heard a peep from the spirits and neither had she. But the EVP analysis had yet to begin. After our ghostly expedition she took the tape home, boosted the volume in some unexplained way, and then listened to the hiss and rumble of the tape between her questions, hoping that these noises would contain the spirits' messages. She sent edited selections of the tape back to me with written interpretations of what the spirits were saying. And sure enough, when I read the text and then listened to the background noise on the tape, the noises did sound weirdly like the words she had written.
The idea that you can record the sound of spirits dates back to the earliest days of recording technology, and it has captured the imaginations of many parapsychologists and sound technicians. Recording innovator Joe Meek, the much-troubled writer of the pop tune Telstar, regularly visited a graveyard with his audio equipment to record the voices of the dead. There are even stories of him having conversations with a cat, presumably inhabited by a spirit, and somewhere there is an archive of Joe Meek’s EVP recordings that is yet to be made public.
For psychologists, these interpretations of noise are a trick of the brain: the written text, or simply your own expectation, primes you to make sense of nonsense. The illusion is very strong, and while you can find many dodgy examples of EVP online, the Franklin Institute Science Museum offers a particularly good non-paranormal demonstration.
The word 'mondegreen' might be new to you, but the phenomenon almost certainly won't be: a mondegreen is a misheard lyric. The term was coined by former editor of Harper's magazine Sylvia Wright, who misheard a line of the 16th-century Scottish ballad The Bonnie Earl of Moray, so that 'and laid him on the green' was transformed in her mind into 'and Lady Mondegreen.'
One reason for these misinterpretations is that when we speak a line, one word quickly follows another without a break. If the speech is indistinct in some way — because it is mumbled, sung, spoken in an unfamiliar accent, or the context is unclear — we do the best we can to fit the sounds into a familiar pattern. But we don’t always get it right.
In the case of song lyrics, the wrong interpretation can be very hard to shift. Once lodged in the brain, it stays there until someone points out your error at the next karaoke session. Even when you are aware of an audio illusion you can still be fooled by it, sometimes to humorous effect, as in the 'bad lip reading' parodies that dub invented dialogue over real footage, such as one made of Donald Trump's inauguration.
Ventriloquism is entirely founded on these powerful audio illusions. We see the dummy's lips move, albeit rather woodenly, and are quite prepared to believe that the dummy is talking while its very human partner remains silent. The ventriloquist Terri Rogers performed at many stag events in raucous clubs. Because of this, her dummy, Shorty, had an incredibly abrasive character and took great pleasure in insulting members of the audience. Terri, on the other hand, played the demure lady, quietly standing by Shorty's side.
On one occasion a member of the audience, fed up with the insults and, to be fair, incredibly bad language, climbed onto the stage and punched Shorty. Terri Rogers, who was a fine inventor of magical illusions as well as a ventriloquist, died in 1999, but recordings of her mastery of this illusory audio art can still be found on YouTube.
These are the kinds of auditory illusions that gave the jury in the case of Sandefur vs State such a dilemma. Could they trust Officer Thompson's evidence that he saw a woman mouth the words 'He hit me' while pointing at her assailant? When even professional lip-readers can reach different conclusions about what someone is saying, could they expect a police officer to do any better?
The jury decided that his testimony was worth taking into consideration and duly found Sandefur guilty of battery. But the decision came with an important caveat, and created a precedent that lip reading, whether by experts or everyday folk, is only admissible in court with a warning to the jury about its accuracy.
There has been some interesting research into what might be called 'earwitnesses.' Did someone at the scene of a crime say, 'He's got your boot,' or 'He's gonna shoot'? That was the question asked by Daniel Wright and Gary Wareham in the Psychology Department of the University of Sussex. Subjects were shown a video of a man following a woman and asked what the man said, choosing between two phrases. Responses were generally split between the two, but in some cases the earwitnesses claimed to have heard a third phrase, fused from the two on offer.
Their 2005 research paper cited a famous case from 1952 in which the criminals Derek Bentley and Chris Craig were confronted by the police. Craig was carrying a gun, and, according to the police, Bentley said, 'Let him have it.' Craig fired the gun and killed a police officer. Bentley was accused in court of inciting Craig to fire. At the trial the defence put an entirely different meaning on the same four words. 'Let him have it,' they said, meant 'hand the gun over.' Bentley, they argued, played no part in the murder. The court decided otherwise, and Bentley was hanged in January 1953. But the argument did not die with him, and following a long campaign his murder conviction was quashed by the Court of Appeal in 1998.
Daniel Wright and Gary Wareham had been inspired by a strange phenomenon that beautifully illustrates the bizarre perceptual connection between hearing and sight. It was discovered by accident in the lab of another British researcher, Harry McGurk. While McGurk and his research assistant, John MacDonald, were preparing videos to show how infants acquire speech, they mistakenly dubbed the sound 'ba-ba' over a video of someone saying 'ga-ga'. When they played it back, the conflict between what they were seeing and what they were hearing created the perception of a third, completely different sound: 'da-da'. It seems that the link between what we hear and what we see is formed at the very beginning of our acquisition, and understanding, of language.