Smart Speakers in 2020: Why a Lockdown-Filled Year’s Been Gloomy for Voice Assistants Too

Rob Leyland
Published in The Startup
10 min read · Dec 31, 2020
Photo by Thomas Kolnowski on Unsplash

While 2020 has tragically spelled doom for so many industries that thrive on footfall and operating a physical space, there’ve been some huge winners in tech. The likes of Zoom, Netflix, and Amazon have all seen big growth as we’ve tried to live our lives remotely. And with so many people stuck indoors this year, it might be expected that among 2020’s biggest pandemic beneficiaries we’d see Alexa, Google Assistant, Siri and friends following suit.

Yet although these devices are tailor-made to weather endless days sat on their arse in living rooms and kitchens across the globe, 2020 hasn’t exactly delivered the voice assistant boom we might have expected. Quite the opposite: year-on-year sales growth slowed and in some cases sales dropped, and developer enthusiasm for building Alexa skills (Amazon’s term for voice apps) has fallen off a cliff. And while monthly smart speaker usage is estimated to have grown 11% year-on-year, in a year when the audience for these devices has quite literally been captive, that doesn’t exactly set pulses racing.

2019 New Year’s resolution — start making money!

Let’s flash back to a simpler time. At the end of 2019, smart speakers were flying high, with mass adoption triumphantly declared as they became more popular in UK homes than cats and carbon monoxide alarms.

Common household contents at start of 2020 — broken down by % of UK households containing each

The rapid growth of smart speakers to date in the UK has been impressive, and even more so in the US, where monthly usage of smart speakers has been estimated in 2020 to be as high as 39% of the population. Certainly the relentless, heavy-handed above-the-line advertising of voice assistants has taken a noticeable back seat this year, and it might be fair to think that as Bezos and co. were popping their Champagne corks to welcome in the New Year, customer growth may have finally become less of a priority for 2020. Deliberate or not, sales growth is indeed cooling: Q3 sales showed year-on-year growth of just 2.6% across all smart speaker devices, with the market leader, Amazon, even seeing a small drop compared to 2019 figures.

A change of focus is somewhat understandable. Once a period of rapid loss-leading growth reaches a certain point, the thorny question of how to turn a profit arises, and that was very much the case for voice assistants in 2019. As a voice assistant developer working in the charity sector at the time, I could clearly see a push from Amazon for Amazon Pay integration into voice skills, reflected in many developer rewards being geared much more to those skills that integrated some sort of payment mechanism. Indeed, every conversation around voice with Amazon at the time focused on charities getting Alexa Donate skills made (Amazon’s mechanism for not-for-profit use of Amazon Pay).

While others played adoption catch-up, Amazon wanted to make sure its market-leading ascent into mainstream smart speaker adoption paid off. A thriving development community was viewed as a key instrument in generating the same sort of mass profit that Apple and Google see from taking cuts of App Store and Play Store purchases respectively. Having failed fairly miserably to infiltrate the mobile ecosystem with the Amazon Appstore and Fire devices, Amazon may have hoped that being the frontrunner this time with a new technology could be more fruitful.

The ‘what’s in it for me?’ problem

Going into 2020, Amazon had been directly fueling the healthy developer growth it was seeing with regular perks and developer challenges targeted at growing out the categories of skills on offer, providing free device and cash incentives for the best participant entries. That led to over 100,000 unique skills being added to the Alexa store globally. Over the last 6 months that approach has completely dried up, with the English-language perks page, at least, sitting completely barren.

Amazon do still offer an Alexa Developer Rewards scheme, but I think opaque would be putting it mildly when the rewards are based on ‘customer engagement’ and, at time of writing, there’s no info at all on what their monetary value actually is (at least none that this author could find).

So if you, budding Alexa developer, now go looking for reasons to get involved and hit the Alexa developer homepage, you’ll find yourself greeted by the following 3 reasons (none of which are ‘win a free Echo’…or even a pair of Alexa developer socks):

  • ‘Serve customers naturally’
  • ‘Expand your reach’
  • ‘Make money in your skills’

Having initially appealed to every side-hustler and hobbyist with a bit of JavaScript about them with some fun freebies to grow skill coverage, Amazon are now targeting brands that can use Alexa to broaden their channel offering and developers who want to make a serious go of it generating revenue through in-built Amazon Pay; the market for that fun throwaway skill that tells you a joke or an inspirational quote is well and truly saturated.

The obvious issue with asking those brands to ‘expand their reach’? Even for the biggest companies, a new tech offering with a ceiling on potential user coverage of 20% of your overall customer base isn’t anywhere near enough to warrant spinning up a bespoke team to support it, particularly when the platform requires specialist skills and a different design approach from the familiar screen-based development that a vast swathe of big brands already struggle to get right. As a voice tech enthusiast working as product manager on a product with daily calendar management functionality tailor-made for voice, I find it kind of heartbreaking that the market penetration of the devices themselves makes a business case for this sort of thing a non-starter. For most businesses it just doesn’t add up.
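To put rough numbers on that, here’s a back-of-the-envelope sketch in Python. The 20% reach ceiling is the figure quoted above; the customer count and the share of speaker owners who would actually enable a brand’s skill are made-up placeholders, and the function name is purely illustrative.

```python
def addressable_users(customers: int, speaker_reach: float, skill_uptake: float) -> int:
    """Estimate how many customers a voice skill could realistically serve.

    customers     -- total customer base (hypothetical input)
    speaker_reach -- share of customers who could be reached on a smart speaker
    skill_uptake  -- share of those who would enable the skill (hypothetical)
    """
    for name, value in (("speaker_reach", speaker_reach), ("skill_uptake", skill_uptake)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return round(customers * speaker_reach * skill_uptake)


if __name__ == "__main__":
    # 20% is the ceiling mentioned in the article; the other two numbers are invented.
    users = addressable_users(customers=1_000_000, speaker_reach=0.20, skill_uptake=0.10)
    print(f"Addressable voice users: {users:,}")
```

With those placeholder inputs, a brand with a million customers ends up with around 20,000 potential voice users, which is a hard audience to justify a dedicated team for.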

And as for ‘make money in your skills’ (the Amazon Pay bit), making a standalone Alexa skill that is compelling enough for people to pay for in and of itself is…well…really, really bloody hard! So the resulting state of play on the skills store has been a mass of free-to-use, generally independently developed, and generally fairly shallow third-party apps, in a menagerie of skills where one that literally gives stuff away for free remains the only real runaway success story going. It says a lot about the prospect of making money from your skills.

What the third-party voice market absolutely needs to light a fire under it is that killer app: one that sets a shining example and North Star for what great innovation looks like with these high-potential pieces of technology, and for what a business model and pricing approach look like to match. A voice app that fundamentally resets people’s attitudes to the standalone value of this medium. This is no small ask.

Why’s making a great voice app so hard?

If you’re looking for evidence that Amazon want more than anything to make this killer voice app and voice-based business happen, check out the Alexa Fund. Nothing says ‘go big or go home’ like a $200m VC funding challenge. Suddenly, budding Alexa developer, you aren’t yearning so hard for that free Echo or £100 of Amazon store vouchers, are you? And Amazon are smart to adopt this strategy: they need talented people devoting their lives to making an Alexa hit, and they need those people to be outside Amazon doing the most radical stuff possible to make it happen.

While the funding infrastructure might to some extent have been put in place, it’s hard to ignore the drag factors that the platform itself presents. Although I’m a massive advocate for voice tech, there are some major issues right now that need addressing:

  1. Latency — talking to smart speakers right now is like talking to a human who pauses for several seconds before saying something back to you, every time. Imagine how annoying you’d find that friend. Then also imagine if that friend misheard you 20–30% of the time and liked to speak in ambiguous error messages on occasion — fuuuuuun.
  2. No interruptions — you know that amiable chum from point 1? Yeah, they also have this thing where, unless you say their name at the start of the sentence or wait for them to totally finish what they were currently saying they will ignore any attempts made to converse and carry on talking. And they also have no awareness of any physical signalling you might be giving throughout. Shall we start calling them your BFF already?
  3. “WTF is a skill? What do you mean enable it?” — all of the new vocabulary that has been introduced by big tech companies has not exactly caught on with the public. I’m sure years of research went into this but would calling it an ‘app’ and telling people to ‘download’ really have been such a bad idea?
  4. “Oh and there’s a store? Really?” Go on, survey 3 smart speaker-owning friends who aren’t voice developers; I’d be surprised if 1 of them had visited the store for their device. Just 3 days into Apple’s launch of the App Store there had already been 10 million downloads, with ‘app’ becoming word of the year just 2 years on. The plug-and-play separation of native and third-party functionality on smart speakers, and the difficulty of browsing by voice, have almost certainly hindered the skills store from reaching such heights.
  5. Developer control (or lack thereof) — some really important parts of the voice app experience are still heavily locked down from third party developers. Discoverability through voice (i.e. matching someone’s spoken intent to the best app to carry out the action) is still completely at the mercy of the voice assistant parent company, who regularly experiment and change how this works on their voice assistants, yet lack the traffic and list interface that make this work so well on a search engine. And monetization options are also limiting for developers and fiddly for users, with Amazon and Google keen to push their proprietary payment options on these devices that a majority of users are unlikely to have set up.

All this is on top of the basic requirement that the voice recognition algorithm converting what’s been said to text, and the Natural Language Processing algorithm identifying the intent and directing it to an appropriate answer, both work with high accuracy. Neither of those is anywhere near a sure thing right now.
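A quick sketch shows why accuracy at each stage matters so much: if both voice recognition and intent matching can fail, those failures compound over a multi-turn conversation. The 0.75 recognition figure below is the midpoint of the 20–30% mishearing range from point 1; the 0.90 NLP figure is a placeholder I’ve made up, and treating every stage and every turn as independent is a simplifying assumption rather than a measured property of any real assistant.

```python
def end_to_end_success(asr_accuracy: float, nlu_accuracy: float, turns: int = 1) -> float:
    """Chance that every turn is both transcribed and understood correctly.

    Assumes, for illustration only, that the speech recognition stage,
    the intent-matching stage, and each turn all fail independently.
    """
    if not (0.0 <= asr_accuracy <= 1.0 and 0.0 <= nlu_accuracy <= 1.0):
        raise ValueError("accuracies must be between 0 and 1")
    if turns < 1:
        raise ValueError("turns must be at least 1")
    per_turn = asr_accuracy * nlu_accuracy  # both stages must succeed on a turn
    return per_turn ** turns                # and every turn must succeed


if __name__ == "__main__":
    # 0.75 comes from the article's mishearing estimate; 0.90 is hypothetical.
    for turns in (1, 3, 5):
        p = end_to_end_success(asr_accuracy=0.75, nlu_accuracy=0.90, turns=turns)
        print(f"{turns} turn(s): {p:.1%} chance of a clean exchange")
```

Under those assumptions a single request succeeds roughly two times in three, while a five-turn exchange goes cleanly only about 14% of the time, which is one way of seeing why longer voice interactions feel so fragile.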

Are these showstoppers to a breakthrough hit coming along? Probably not; constraint is a parent of innovation after all. But if serious progress were made to correct some of these (a next-level update to smart speaker hardware and advancements in edge computing could well be an answer to the first few at least), we’d be a heck of a lot more likely to get that first-of-its-kind voice app to herald a public opinion shift and gear change for voice-enabled tech.

Light at the end of the tunnel

Photo by Ivan Bandura on Unsplash

For all its disappointments, 2020 has still presented some reasons to believe. For one, it’s been a year when the QR code has defied all odds and become a linchpin in mankind’s technological response to a global pandemic, so I don’t think we should be writing off any faltering new technologies just yet. And if the multiple headaches I’ve seen suffered by co-workers faced with 8-hour days glued to a screen are anything to go by, there are genuine needs emerging for user interfaces that don’t require us poor humans to adapt ourselves to methods of interaction at complete odds with our biology. Our interactions with technology are simply becoming too commonplace and essential for us to afford not to build those interfaces, and 2020 has been a wake-up call to that.

Until Elon Musk gets his brain chip in all of us, and we can visualise who we’re talking to through nerve impulses on our own personal neurological Virtual Reality, it’s hard to see a better bet than voice tech to take us off our keyboards and away from our mice and screens. Perhaps it’s difficult to see how audio can be a wholesale, preferable replacement for the website infrastructure you’re used to using to serve your every need today. But if we look beyond the many hurdles bogging down voice experiences right now, and believe in a world where improved AI removes a lot of that clunkiness, it’s actually a decent bet to succeed before most humans are prepared to go full cyborg. Although Google haven’t made it a reality beyond early betas, their Duplex demo at Google I/O 2019 really did demonstrate the heights of user convenience this technology can reach and the powerful experiences it could accomplish.

So while this year’s been, on all surface indicators, a big fat fail for smart speaker tech, that doom and gloom could well prove to be as temporary for voice assistants as it hopefully will be for the rest of us. The big players need to do more than the audio and aesthetic updates to flagship devices they’ve managed this year if they want to push those devices beyond glorified radios, and the development community and businesses need a shining-light third-party voice app. But the growth of remote living and working this year, presenting problems that voice tech should eventually solve very well, can be seen as a silver lining going into the next decade and beyond. Voice tech’s just not as ready as Zoom has been to do that right now, ok?

Rob Leyland
The Startup

Product manager and writer on all things product development, emerging technology, innovation and ethics