Did Google Duplex just pass the Turing Test?
This conversational AI changes everything
I think it was the first “Um.” That was the moment when I realized I was hearing something extraordinary: A computer carrying out a completely natural and very human-sounding conversation with a real person. And it wasn’t just a random talk. This conversation had a purpose, a destination: to make an appointment at a hair salon.
The entity making the call and appointment was Google Assistant running Duplex, Google’s still experimental AI voice system and the venue was Google I/O, Google’s yearly developer conference, which this year focused heavily on the latest developments in AI, Machine- and Deep-Learning.
Google CEO Sundar Pichai explained that what we were hearing was a real phone call made to a hair salon that didn’t know it was part of an experiment or that they were talking to a computer. He launched Duplex by asking Google Assistant to book a haircut appointment for Tuesday morning. The AI did the rest.
Duplex made the call and, when someone at the salon picked up, the voice AI started the conversation with:
“Hi, I’m calling to book a woman’s hair cut appointment for a client, um, I’m looking for something on May third?”
When the attendant asked Duplex to give her one second, Duplex responded with:
The conversation continued as the salon representative presented various dates and times and the AI asked about other options. Eventually, the AI and the salon worker agreed on an appointment date and time.
What I heard was so convincing I had trouble discerning who was the salon worker and who (what) was the Duplex AI. It was stunning and somewhat disconcerting. I liken it to the feeling you’d get if a store mannequin suddenly smiled at you.
It was easily the most remarkable human-computer conversation I’d ever heard and the closest thing I’ve seen a voice AI passing the Turing Test, which is the AI threshold suggested by Computer Scientist Alan Turing in the 1950s. Turing posited that by 2000 computers would be able to fool humans into thinking they were conversing with other humans at least 30% of the time.
He was right. In 2014, a chatbot named Eugene Goostman successfully impersonated a wise-ass 14-year old programmer during lengthy text-based chats with unsuspecting humans.
Turing, however hadn’t necessarily considered voice-based systems and, for obvious reasons, talking computers are somewhat less adept at fooling humans. Spend a few minutes conversing with your voice assistant of choice and you’ll soon discover their limitations.
Their speech can be stilted, pronunciations off and response times can be slow (especially if they’re trying to access a cloud-based server) and forget about conversations. Most can handle two consecutive queries at most and they virtually all require a trigger phrase like “Alexa” or “Hey Siri.” (Google is working on removing unnecessary “Okay Googles” in short back and forth convos with the digital assistant).
Google Assistant running Duplex didn’t exhibit any of those short comings. It sounded like a young female assistant carefully scheduling her boss’s haircut. In addition to the natural cadence, Google added speech disfluencies (the verbal ticks, “ums,” “uhs,” and “mm-hmms”) and latency or pauses that naturally occur when people are speaking. The result is a perfectly human voice produced entirely by a computer.
The second call demonstration, where a male-voiced Duplex tried to make restaurant reservations, was even more remarkable. The human call participant didn’t entirely understand Duplex’s verbal requests and then told Duplex that, for the number of people it wanted to bring to the restaurant, they didn’t need a reservation. Duplex handled all this without missing a beat.
“The amazing thing is that the assistant can actually understand the nuances of conversation,” said Pichai during the keynote. That ability comes by way of neural network technology and intensive machine learning,
For as accomplished as Duplex is in making hair appointments and restaurant reservations, it might stumble in deeper or more abstract conversations. In a blog post on Duplex development, Google engineers explained that they constrained Duplex’s training to “closed domains” or well-defined topics (like dinner reservations and hair appointments) This gave them the ability to perform intense exploration of the topics and focus training. Duplex was guided during training within the domain by “experienced operators” who could keep track of mistakes and worked with engineers to improve responses.
In short, this means that while Duplex has your hair and dining-out options covered, it could stumble in movie reservations and negotiations with your cable provider.
Even so, Duplex fooled two humans. I heard no hesitation or confusion. In the hair salon call, there was no indication that the salon worker thought something was amiss. She wanted to help this young woman make an appointment. What will she think when she learns she was duped by Duplex?
Obviously, Duplex’s conversations were also short, each lasting less than a minute, putting them well-short of the Turing Test benchmark. I would’ve enjoyed hearing the conversations devolve as they extended a few minutes or more.
I’m sure Duplex will soon tackle more domains and longer conversations, and it will someday pass the Turing Test.
It’s only a matter of time before Duplex is handling other mundane or difficult calls for us, like calling our parents with our own voices (see Wavenet technology). Eventually, we’ll have our Duplex voices call each other, handling pleasantries and making plans, which Google Assistant can then drop in our Google Calendar.
But that’s the future.
For now, Duplex’s performance stands as a powerful proof of concept for our long-imagined future of conversational AI’s capable of helping, entertaining and engaging with us. It’s the first major step on the path to the AI depicted in the movie Her where Joaquin Phoenix starred as a man who falls in love with his chatty voice assistant played by the disembodied voice of Scarlett Johansson.
So, no, Duplex didn’t pass the Turing test, but I do wonder what Alan Turing would think of it.