The voice interface and disruption
The experience, the tipping-point, and the new world order (sans Apple)
The biggest problem standing in the way of the voice epoch is that nobody has established a user experience that’s native to the format. How do you find what you’re looking for, among a universe of infinite information, without your hands, and without your eyes, in an audio-first paradigm? How do you perform fundamental tasks like browsing the web or scrolling through feeds in a voice interface?
Nobody has a silver bullet yet, but that shouldn’t come as a surprise. UX paradigms often aren’t brute-forced via a single breakthrough so much as they are accumulated gradually. These UX conventions are often residuals imparted by the platforms that define the epoch. In the meantime, the voice interface is pervading the marketplace: from early adopters to the mainstream, from podcasts to audiobooks, from Echos to AirPods. New consumer habits are already forming under our noses.
If you accept the superiority of voice-first relative to today’s text/touch paradigm — due in part to the ergonomic potential of hands-free and eyes-free — all media will have to converge upon voice by definition. Therefore, whether text, communications, podcasts, or videos, all media creators will have no choice but to optimize for format fungibility. That grassroots buy-in will accelerate adoption. In fact, it’s already happening, as I pointed out in “End-to-end audio”:
“For example, text is the primary means of media interaction today, but audio/voice are almost always available as secondary alternatives. Driving in the car and can’t read your phone? You have your voice assistant read it aloud to you. The prevalence of these backup options means media is already fungible, allowing us to consume or produce anything, in any format — text, audio, and even video — jointly and severally, with relative ease. We can already choose how we want to interact with every piece of content, depending on situational context. When we invert users’ habits by prioritizing audio over text, text will still be there as a backup when necessary.”
Regardless, the tipping-point will occur even sooner...
It’s not inconceivable that incumbents, like Apple, are defenseless in the face of this disruption. Right now, we’re already undergoing a transition from text to voice, as discussed. Amidst this sea change, Apple has not only managed to preserve its relevance, but also increased ARPU, gross margins, and revenue with complementary products like Apple Watch and AirPods.
But that device ecosystem is only a foothold in the transition from text to voice. In that transition phase, “voice-first” is a limbo in which text and touch survive only as increasingly irrelevant safety nets: fallbacks we resort to when voice can’t get the job done.
In the strong-form version of the voice interface, devices are abstracted away and interaction is entirely ambient. (How is that not desirable — even optimal?) En route to that strong-form future, the next stop is a tipping-point at which the phone becomes obsolete, because it’ll be too big, too visual, too manual, and/or too expensive to serve as a redundant backup to improving voice and smartwatch technology.
That tipping-point occurs when the majority of our device interactions are denominated in voice/audio. At that point, consumer preference will subconsciously, habitually, inertially resort to those chosen mediums first. We’re dangerously close to that fulcrum today, as I alluded to in “Apple’s HomePod Threading a Needle”:
“Voice abstracts-away the physical device… You might have a phone in your hand when you’re watching TV; you might have a smartwatch on your wrist when you’re running; headphones when you’re at work; a sound system in the car; or a home speaker in your kitchen. No single device can handle all of those use cases. It’s a matter of which gets the job done with the least friction, in each different context [but we] don’t even interact directly with these devices — they’re just a back-end, delivering and receiving the audio that’s really our primary touch-point [with] the voice assistant as the common thread woven through our every interaction…”
That tipping-point is when Apple will be most vulnerable. The iPhone is its sacred cash cow, which presents a number of classic incumbent risks, financial and otherwise. Perhaps the most important of those risks is this: Apple Watch is not designed to supplement AirPods; it’s designed to supplement (and ultimately succeed) iPhones. Apple Watch’s raison d’être is to adapt the mobile UX for a wearable standard. It’s an evolutionary iteration on a bygone era, one based on physical, visual, and touch interaction. Yes, that sounds like a good thing for Apple, since Watch will be ready to take the handoff when iPhone is obsoleted, whether the smartphone is laid to rest by Apple’s own hand or a competitor’s. But it’s a losing strategy in the grand scheme of things…
First off, doing a phone’s job is a broad mandate, and that comprehensiveness makes Apple Watch a necessarily expensive device, particularly were it to replace iPhone’s revenue footprint. Of course, Apple has always earned its premium price tag by delivering a superior user experience. But if the phone gets obsoleted by a voice/watch combo, the best UX at that time will, by definition, be voice. At that tipping-point, the “pleasing experience” of using an Apple Watch will be irrelevant; all that will matter is the voice UX and the Watch’s ability to support it with as little friction as possible. Sure, AirPods could deliver the ultimate, killer voice experience you’d expect of Apple, but the Watch complement is still a necessary ingredient to make the combo “good enough” to top the iPhone in the first place.
The problem is that Apple Watch has competing priorities: supplanting the iPhone and supporting voice are at odds with one another…
- A Watch that does too much will be gratuitous in a voice-first world (i.e. customers would be “overserved” by incumbents, as discussed in Clay Christensen’s The Innovator’s Dilemma);
- A Watch that doesn’t do enough will cannibalize two revenue streams (iPhone and Watch);
- Finally, a Watch that strikes a perfect balance between these two will always be inferior to the iPhone.
It’s a catch-22 for Apple’s financials, principles, and logistics. Never mind Apple’s obligation to the third-party developers who sustain its mission-critical platform. Never mind Apple’s additional, structural, competitive disadvantages.
Sure, as the primary transmission mechanism for end-to-end audio, superior headphones would warrant a proportional price tag. AirPods are (and will remain) best-in-class, but they could never replace the lost iPhone/Watch revenue.
Furthermore, for voice interface accessories to be pleasing, they need to be simple, as simple as possible, because the point is to return you to voice as quickly as possible. That means Apple Watch won’t even need touch capabilities — it should be a dumb screen, as dumb as possible. After all, “voice-first” is the next big thing because, again, it’s the most frictionless, ergonomic medium at our disposal. Therefore, time spent doting on your phone or on your watch is, by definition, an inferior experience.
This isn’t to say Apple is doomed; it’s just ill-suited to win the voice interface as definitively as it won prior epochs. This is new-market disruption theory in action (as opposed to low-end disruption). Were it to sit out this wave, Apple would either find a different wave to ride, or bide its time gearing up for the next new wave (AR?).