The Terminator is (Robo) Calling

Howard "Bart" Freidman
Rule the Robots
May 30, 2018

In the AI arms race, Google’s I/O conference is akin to Russia’s Victory Day parade: an annual, carefully orchestrated pullback of the curtain on otherwise-secret projects, designed to showcase technical prowess and achievements.

This year’s I/O highlight was Duplex, a Google Assistant feature that uses recurrent-neural-network-based AI (and a telephone) to schedule appointments for you. To demonstrate, Google had Duplex call two businesses: a hair salon and a restaurant. With lifelike voices and natural conversational flow, including appropriate disfluencies (the technical term for “hmms” and “uhs”), Duplex appeared to fool both humans into thinking a living, breathing potential customer was on the line.

Because Duplex displayed behavior indistinguishable from a human’s, some think it could pass the Turing test. Google says it could not: “Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations.”

Outside Google, nobody can say what the system is capable of: the calls were edited, and the call recipients may have been in on it (Google won’t comment). We can say that Google engineers overcame enormous technical challenges to achieve intelligent, free-flowing, natural-sounding conversation, and that by japing other humans, they exhibited quintessentially human behavior.

Google continues to blaze new trails in AI. With Duplex, they’ve opened a new frontier: prank phone calls. It’s the latest salvo in Google’s ongoing effort to kill off the telephone.

Google grew up hating on the telephone because phone calls divert attention from its cash cow: web search. Plus, telcos are obstinate, superfluous corporate layers between Google and humans. Before 4G, searching the web from a phone meant dialing GOOG-411 or a similar “voice portal” like Quack, Audiopoint, Yap, or Tellme. Or the most famous one of all, Siri. Tellme lives on in Cortana (aka Louise aka Bing Mobile), Alexa started life as Yap, and the learnings from GOOG-411 spawned Duplex. Mated with machine learning and massive cloud computing power, speech recognition no longer frustrates, and it is a button-push away (or completely hands-free on smart speakers and certain phones that are always listening). So voice portals are back, and ComScore says 50% of all searches will be voice searches by 2020.

According to a survey of 2,000 active voice users, the driver of this shift to voice control is… driving. It’s by far the top use case, cited by over 50% of users; number 2, the catch-all “other activity,” comes in at only 20%. Nick Unger, who founded the Audiopoint voice portal, said these survey answers align with actual usage data. Cars are the perfect place for voice control:

Mobile Voice: When & Why (source: www.highervisibility.com/blog/how-popular-is-voice-search/)

Jim Kenefick of Clique Capital, a venture fund focused solely on voice control, says: “voice is the interface of tomorrow®” (yes, he trademarked the phrase). Indeed it is, and has been for over 50 years: IBM’s 1962 “Shoebox” voice recognition system predated the keyboard-and-screen terminal as the standard way to use a computer. Yet, finally, tomorrow has arrived.

The IBM Shoebox printed out responses to voice commands

Google’s Duplex changed the conversation (ahem) around voice control. Until the Duplex demo, the 2018 buzz was around Amazon and Alexa, and that buzz was mostly about the huge third-party Alexa Skills ecosystem and Amazon’s integration of shopping and home automation. Duplex’s intelligent robotic conversations are a new twist: outbound voice over the PSTN (Public Switched Telephone Network). In other words, phone calls.

Millennials in particular abhor phone calls. Paradoxically, they consider mobile phones indispensable for everything except talking. Speaking for the entire generation, one blogger says it’s because they’re too busy: they want results, not a lengthy discussion. They’ve been conditioned to expect instant results with minimal effort, as delivered by Twitter and other social networks (see “The Millennial’s Fear of Conversation”).

Yes, transactional phone conversations are inefficient and time-consuming — but fear of the telephone runs deeper:

  • Need for synchrony: multiple parties must meet at the same spot in the time-space continuum. Messaging sits between synchronous voice and asynchronous email, where reply expectations are measured in hours or days. Our Aptela directory included one-click options for voice, email, and messaging. Whenever companies operationalize messaging, the same behavior emerges: employees start using “semi-synchronous” messages to ask permission to call or meet. Eventually, messaging becomes the standard (see Slack), with its own conversational rhythm.
  • Risk aversion: potential misunderstanding and rejection make conversations emotionally risky. Just interrupting someone risks ill will. Messaging also risks emotional misinterpretation (emoticons help), but when a misreading does occur, it’s less apt to end in traumatic confrontation.
  • Attentional investment: multitasking during a conversation is rude. Conversely, it’s expected while messaging, and hidden anyway.

Eliminating conversations averts these risks, and it appeals to humans’ innate, deeply ingrained desire to minimize effort (explained by Dr. Adam Gazzaley and Professor Larry Rosen in their book The Distracted Mind: Ancient Brains in a High-Tech World).

The Distracted Mind explains in detail what Staples figured out 15 years ago, and CEB eventually quantified: delighting customers feels good, but has poor ROI. Consumers’ hot button is an Easy Button. The only thing better than an Easy Button is telling someone (or something) to press it for you. “Thanks for the yummy cookies in my hotel room, but if you really want to delight me, just please don’t make me call customer service.”

End-to-end automation has tremendous upside and downside, both of which voice control amplifies. Repeating yourself is frustrating, with either people or machines. Frustration escalates to anger when someone feels ignored. AI-driven voice control is the next generation of Interactive Voice Response (IVR). Of all humanity’s inventions, Automatic Speech Recognition (ASR)-driven IVR gone wrong is unsurpassed at raising customer ire.

Companies can’t afford to ignore digital assistants: they’re the next manifestation of web search. Today, pages that don’t turn up in Google search might as well not exist. Voice will be the same, requiring a new riff on SEO that I’ll call DADO: Digital Agent Discovery Optimization. Digital assistants are also transactional: they both find pizza and order it. Eventually they’ll check on delivery and resolve any issues. That requires what I’ll call CASE: Customer Agent Service Enablement.

Duplex escalated the debate about AI’s role in society. One obvious downside is a deeper retreat into the individual echo chambers built by social media. Anyone can call you on the phone; only invited friends can Snapchat.

Rob High, CTO of IBM Watson, says “AI and humans perform best when they can work together and trust each other.” That’s especially true in our post-truth world. Upcoming posts will explore methods to build digital trust, and then use it to sell more and enhance customer experience.


Revenue accelerator: distributes growth hockey stick. Futurist & pastist. Loved by both Rick and Morty.