Siri & The Suspension of Disbelief
When I was in graduate school I was fascinated by the lies you tell yourself in order to use a touchscreen — as you touch and manipulate images under glass, you forget that none of it is real. I think the best way to think of this is as a suspension of disbelief — once you let go of the knowledge that this isn’t exactly real, everybody has a better time.
At the time I was working with researchers at Microsoft Research to study their high-performance touch device (here’s a link to our paper). I had a working theory that latency was a big factor in helping or hindering this suspension of disbelief when using a touchscreen. As latency increased, your touchscreen actions were likely to look and feel less and less real. (I think the liberal use of skeuomorphism in early iOS releases was also — in part — to support the Big Lie that everything under the glass was really real.)
We’re now a lot more comfortable using touchscreens, and we need fewer cues that reinforce the realness of the experience. Smartphones have gotten more responsive, and we’ve grown accustomed to using a touchscreen.
The same can’t be said for Voice Assistants. It’s a supremely weird experience to speak to a computer in natural language and expect an answer — as weird as it was to touch a piece of glass and expect something to happen ten years ago. I’d argue that it requires a similar suspension of disbelief, and every time it fails you catch a glimpse of the rigging offstage and you trust the whole thing a little less. Voice Assistants today are comparable to touchscreens a decade ago — fascinating and great at times, but not entirely ready for primetime.
Walt Mossberg published a great roundup of his frustrations with Siri in the 5 years since it launched — “Why does Siri seem so dumb”:
So why does Siri seem so dumb? Why are its talents so limited? Why does it stumble so often? When was the last time Siri delighted you with a satisfying and surprising answer or action?
For me at least, and for many people I know, it’s been years. Siri’s huge promise has been shrunk to just making voice calls and sending messages to contacts, and maybe getting the weather, using voice commands. Some users find it a reliable way to set timers, alarms, notes and reminders, or to find restaurants. But many of these tasks could be done with the crude, pre-Siri voice command features on the iPhone and other phones, albeit in a more clumsy way.
Mossberg details a litany of questions that you’d (wrongly) expect Siri to be able to answer. A few nights ago, I spent some time with Pádraig trying to find the magic words to get Siri to give us the time of the upcoming US Presidential debate:
“When does the presidential debate start tonight?”
I couldn’t find any movies matching ‘the presidential debate’ playing nearby tonight.
“What time does the united states presidential debate start tonight?”
I don’t know what you mean by ‘what time does the united states presidential debate start tonight’.
“When is the presidential debate?”
The second presidential debate will take place from 9:00pm to 10:30pm ET on Sunday October 9 at Washington University in St. Louis, Missouri.
There we go. At this point we’re getting closer to verbally brute-forcing an unknown set of command-line parameters.
In its response to Mossberg, Apple noted that it’s interested primarily in the most common day-to-day questions that Siri gets asked, and not the “long tail” of less-common questions:
Apple stressed to me that it’s constantly improving Siri, and also stressed that it focuses its Siri efforts on the kinds of tasks that it says millions of people ask every day: placing phone calls, sending texts, and finding places. It puts much less emphasis on what it calls “long tail” questions, like the ones I’ve cited above, which in some cases, Apple says, number in only the hundreds each day.
Here’s the problem: Apple doesn’t seem to be factoring in the cost of a failed query, which erodes a user’s confidence in the system (and makes them less likely to ask another complex question). Apple’s Siri homepage cites a wide range of questions you can ask Siri, from the weather to sports scores to when the sun sets in Paris. The overall effect is to imply that you can simply ask Siri a question and get an answer, but in practice that almost never works outside of a few narrow domains (sports, weather, math and unit conversions).
In 2007 Apple made a lot of very smart choices around prioritizing touchscreen latency above nearly everything else — one of the cleverest things that iOS did while scrolling a webpage was to show the user a very responsive scrolling grid if you scrolled past the region of the page that your phone had rendered. Where other devices lagged and hiccuped as the device struggled to render new content while scrolling (which every smartphone had trouble doing in 2007), the iPhone made sure your scrolling still felt right.
Apple’s high-level goal here should be to include responses that increase your faith in Siri’s ability to parse and respond to your question, even when that isn’t immediately possible. Google Search accomplishes this by explaining what it’s showing you, and by asking questions like “Did you mean ‘when is the debate’?” when it thinks you’ve made an error. Beyond increasing your trust in Siri, including questions like this in the responses would also generate a torrent of incredible data to help Apple tune the responses that Siri gives.
Apple has a bias towards failing silently when errors occur, which can be effective when the error rate is low. With Siri, however, this error rate is still quite high and the approach is far less appropriate. When Siri fails, there’s no path to success short of restarting and trying again (the brute force approach). A failure response that remains conversational would have a few very important effects:
- it helps Apple learn about what the user was trying to ask Siri
- it maintains the suspension of disbelief — you’re carrying on a real conversation
- it provides the user a less painful path to their final answer
If Siri asked you to restate your question when it failed to find an answer, it would maintain the illusion of a conversational assistant — real people ask questions like this all the time. Siri could also ask clarifying questions (“Are you asking me about movies?”) to help zero in on a domain area.
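To make the idea concrete, here’s a minimal sketch of that fallback behavior. Everything in it is hypothetical — the `KNOWN_DOMAINS` table, the keyword matching, and the `respond` function are stand-ins for whatever real intent-classification machinery an assistant would use. The point is only the shape of the responses: a confident guess gets an answer, an ambiguous one gets a clarifying question, and a total miss asks the user to restate rather than dead-ending.

```python
# Hypothetical sketch: a conversational fallback instead of a silent failure.
# Domain names and keywords here are illustrative, not a real assistant's.
KNOWN_DOMAINS = {
    "movies": ("movie", "playing", "showtime"),
    "weather": ("weather", "rain", "temperature"),
    "politics": ("debate", "election", "president"),
}

def respond(query: str) -> str:
    words = query.lower().split()
    # Guess which domains the query might belong to via keyword overlap.
    guesses = [domain for domain, keywords in KNOWN_DOMAINS.items()
               if any(kw in word for word in words for kw in keywords)]
    if len(guesses) == 1:
        # One confident guess: answer (or hand off to that domain's backend).
        return f"Searching {guesses[0]} for your answer..."
    if guesses:
        # Ambiguous: ask a clarifying question to zero in on a domain.
        return f"Are you asking me about {guesses[0]}?"
    # No guess at all: ask the user to restate, keeping the conversation alive.
    return "I didn't catch that. Could you ask it another way?"
```

Even this toy version never produces a dead end: every branch either answers or hands the turn back to the user with a question, which is the conversational property the real system is missing.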
Perhaps Conversational Assistant is really the best way to think about it here. People ask follow-up questions to clarify, to get more context, to understand the query better, and there’s no reason why Siri should be any different.
A superficial suggestion here is to say “get it right all the time” — but that’s like suggesting that the iPhone in 2007 solve the problem of scrolling web content by always rendering it in realtime. Until Apple gets there, we need creative ways to help us suspend our disbelief.
Hey thanks to Adam Cohen, Arwen Giel, Ash Furrow, Justin Williams, Loren Brichter, and Pádraig Ó Cinnéide for reading drafts of this.