Alexa, Where Are We Going With This?

Evaluating the voice assistant and conversational interface space in mid-2018

Published in IPG Media Lab, Aug 23, 2018

Ever since Amazon surprised the world with Amazon Echo and Alexa in late 2014, we have seen significant growth in voice computing, led by the wide adoption of voice-activated smart speakers and a renewed interest in voice assistants on mobile devices. Although there is still plenty of room for user growth and improvement in voice assistant accuracy, voice remains an exciting innovation territory for consumers and marketers alike to explore.

Alexa, are you everywhere yet?

Going by the numbers alone, it would appear that Amazon’s previous dominance of the smart speaker market is slipping. According to the latest data from market research firm Canalys, the worldwide smart speaker market grew 187% in the second quarter of 2018. Among vendors, Google leads with 5.4 million Homes shipped, and Amazon follows closely with 4.1 million Echoes. Overall, Amazon still dominates the U.S. smart speaker market with a 62% share, but that is a considerable drop from the beginning of the year, when that number was 71%. As new smart speaker entrants like Apple’s HomePod erode Echo’s market share and new Echo sales appear to be going predominantly to existing users, Amazon Echo’s share of U.S. users will likely continue to fall.

But there’s no need to sound the alarm for Amazon just yet. As troubling as the declining market share may seem, it is declining in a rapidly growing market, and market share won’t really matter that much in the long run. After all, volume alone is never everything. Far more valuable is capturing the high-value consumers and keeping their loyalty. Just ask Apple how it feels about being surpassed by Huawei in global smartphone market share.

At this point in time, Alexa is undoubtedly winning the cultural war among the voice assistants, spawning popular memes and a star-studded Super Bowl spot that went viral, thus creating valuable mindshare among U.S. consumers that equates Alexa with voice command on smart speakers the same way we equate Google with search. Simply put, Alexa’s cultural cachet is now Amazon’s biggest advantage in pushing Alexa-enabled devices deeper into the smart home ecosystem of their users.

Amazon is showing no sign of slowing down the pace on its Alexa Everywhere initiative, a strategy that itself nicely mirrors the ecommerce giant’s ambition to become the Everything Store and the ecommerce layer that could take a cut out of all activities. In the past months, it has introduced a Show Mode that essentially transforms the humble Fire HD tablets into an Echo Show for a hands-free Alexa experience, and it has unveiled Fire TV Cube, a new TV set-top streaming device that comes with a full Echo inside. It has also started to roll out a special version of Alexa modified for the hospitality industry, hoping to make Alexa the default voice interface that powers the modern guest experience. Marriott is among the first hotel chains to get on board and integrate Alexa as a digital butler catering to its guests.

All these efforts point to Amazon’s relentless focus on building its user base and establishing Alexa as part of the fabric of everyday consumer life in a smart home environment. Users will play music via Alexa, get their weather forecast with Alexa, call their friends via Alexa, and, perhaps most importantly of all, shop on Amazon.com via Alexa. But if recent reports on the low user adoption of voice shopping on Alexa (only 2% of users have made a purchase via Alexa so far this year) serve as any indication, users may be taking advantage of a narrower range of voice functions than industry analysts anticipated. Then again, Amazon seems to be in no hurry to get voice shopping to take off, for Alexa (and by extension, the Echo devices) is but a part of Amazon’s long-term strategy of locking Prime households into its growing ecosystem for maximum lifetime value.

Hey Google, what’s with your Duplex Complex?

Interestingly, the most consequential new feature that Google has rolled out for its voice assistant so far this year focuses on pushing voice shopping as well. The new cost-per-sale Shopping Actions program that it launched in March aims to create a unified shopping program across organic search, Google Assistant, and its shopping service Google Express. It encompasses a wide range of domains, including product search, mobile shopping, and voice search. The program builds on Google’s existing partnerships with Target and Walmart to support shopping via Google Assistant and Google Express and extends them to other retailers, including the 40 retailers already enrolled in Google Express.

The endgame for Google here seems to be forming an anti-Amazon alliance with retailers by providing a strong cross-platform shopping program to aggregate retailers. While that may be a sound strategy for Google Assistant in the short run to match what Alexa can do, in the long run it doesn’t really make that much sense for Google, a search company that makes most of its revenue through ads. It reveals a rather misplaced priority for Google to be rallying retailers and attempting to fight Alexa on Amazon’s home turf of ecommerce, instead of, say, coming up with a viable and responsible voice ad network.

Compared to Amazon, Google lacks a Prime-like membership and loyalty program with the extra perks that can get shoppers to make Google their destination for online shopping. While Google search makes for a great discovery and price comparison tool for shoppers, it is at a distinct disadvantage compared to Amazon in every other step of the online shopping consumer journey. Even Google Lens, which started rolling out to some Android handsets in February and promised to transform product discovery in the physical world via visual search, has largely fallen short of expectations due to low accuracy and a lack of integration with retailer inventories. And if smart speaker users are not exactly asking Alexa to buy things for them, they certainly are not asking Google Assistant to shop for them either.

This kind of misplaced priority is also deeply reflected in the way Google rolled out the other major update for Google Assistant. With a jaw-dropping demo during this year’s Google I/O developer event, the search giant unveiled its Google Duplex program to show off just how uncannily good the Assistant has gotten at mimicking human speech. Google positioned Duplex as a feature that can help users by calling businesses on their behalf and letting Google Assistant conduct incredibly natural-sounding conversations to book reservations and find out about opening hours, all without ever disclosing to the humans on the other end of the line that they are speaking to Google Assistant. As you can imagine, the internet was quick to call out the potential ethical issues that would entail.

The fact that the Duplex demo went the way it did, along with Google’s spotty history in making hardware products, shows that the company has a tendency to do things not necessarily because it follows a well-formulated strategy, but rather because it can. And because Google can do a lot of things, it tends to end up making moves that are purely reactive to its competitors. Such is the peril of Google’s Duplex Complex. The flip-flop on spinning off and reabsorbing Nest is another example of Google’s general lack of an anchoring strategy for its voice products.

At the end of the day, Google’s biggest strength lies in its ability to leverage its free data-capturing services to improve its AI capabilities. How to feed those advanced AI capabilities back into the user experience it offers is an issue that Google is unlikely to settle until it resolves its Duplex Complex. It may be shipping more smart speakers than Amazon in recent quarters, but it is falling short of delivering the best voice experience it can, not to mention losing the cultural war to Alexa. And that does not bode well for Google in the long run.

Siri, what’s next for you?

Beyond the Amazon-Google duopoly, this year also saw Apple entering the smart speaker space with HomePod, which was officially made available to consumers in February. Since then, HomePod has been off to a slow but steady start, taking about 6% of the smart speaker market as of the latest quarter, according to data from Strategy Analytics. Given the way that HomePod is being positioned as a high-end speaker that happens to come with Siri for simple voice commands, rather than the kind of all-encompassing smart speakers that Amazon Echo and Google Home devices promise to be, it hardly seems fair to compare HomePod in the same volume game.

However, if you look closely enough, there are some early signs that Apple is gearing up to make Siri more open to third-party developers so as to stay competitive against Alexa and Google Assistant and make HomePod a more useful speaker. One of the biggest new iOS 12 features unveiled at this year’s WWDC event was Shortcuts, which enables Siri users to set their own customized phrases for their most frequently used apps or even set up sequenced multi-app actions for Siri to run with a single voice command.

Once a shortcut is set up, it will work on HomePod and Apple Watch as well. Whereas Alexa and Google Assistant require users to learn specific phrases to activate certain voice experiences, Apple is giving users more flexibility and offering developers a way to integrate their apps into Siri, while also indirectly giving HomePod a strong functionality boost. In addition, some recent reports suggest that Siri may be getting multi-user support soon, which would surely further boost the usefulness of HomePod. Moreover, online scuttlebutt suggests Apple is considering launching a $200 HomePod under the Beats brand, although we will have to wait until Apple’s hardware event in September to find out.

The current consensus surrounding the battle of voice assistants is that Siri is lagging behind Amazon’s Alexa and Google Assistant in terms of functionality, despite being first to the game on mobile, largely due to Apple’s decision to not open up Siri as a full-fledged platform for developers to create voice applications. However, it is important to remember that one, Siri is still the most used voice assistant with 10 billion requests processed every month and a robust high-value user base; and two, Apple is already laying some solid groundwork to bring voice computing out of smart speakers with Siri-enabled Apple Watch and AirPods, which none of its competitors are doing yet. Now that it is taking some major steps to open up Siri and boost its functionality via Shortcuts, Apple may very well end up being a late-blooming dark horse in this race of voice computing.

Facebook & Other Players

On Tuesday, TechCrunch published a fascinating in-depth look at Facebook’s grand ambition in voice computing, thwarted by the public trust crisis triggered by the Cambridge Analytica scandal. The article confirms what a lot of industry insiders have been speculating for a while. The social network has been developing a speech recognition feature called Aloha for its flagship app and Messenger apps, while also filing patents for a speaker device that will focus on video chats and communications. It even planned to introduce a voice interface to Instagram for easier direct messaging. There is no word yet on when those features will be put back on the release schedule, although we will probably hear about them from Facebook before the year is over.

Nevertheless, Facebook pays a severe opportunity cost by sitting out the voice computing race. We observed at the beginning of the year that Facebook was the only major U.S. technology company that was missing the boat on voice and thus the smart home space. At the time, there was a distinct possibility for it to join forces with Roku and leverage the latter’s presence in millions of households as an entry point for its voice assistant. Fast forward to now, and Roku has entered the speaker market solo with a device that runs on its own Roku Connect Platform for remote voice control. All things considered, it is looking far less likely that a partnership is in the cards for these two, which will only make it harder for Facebook to successfully break into the voice-driven smart home market later.

Looking beyond the U.S. market for a minute, however, it is also important to note that Facebook has a robust presence in many global markets where its trust crisis over data practices has done far less severe damage to its reputation. This means that Facebook could still launch its voice products in some overseas markets first and gauge their performance before introducing them in the U.S. market to take on its competitors head on.

While we are looking at international markets, it is also important to acknowledge the rise of Chinese brands in the global smart speaker market. While Amazon and Google still accounted for a dominant 70% share of global smart speaker shipments in Q1 2018, their combined share is down from 94% in the year-ago quarter, according to data from Strategy Analytics. This is partly a result of strong growth in the Chinese market for smart speakers, where both Amazon and Google are absent. Although Siri is available in China, HomePod is not. Local giants Alibaba and Xiaomi are leading the smart speaker market in China, and their strength in the domestic market alone is enough to propel them into the global top five. Alibaba’s smart assistant, Tmall Genie, is even being directly integrated into the OEM dashboard systems of Mercedes, Audi, and Volvo vehicles sold in China, a cross-platform reach far ahead of most Western voice assistants.

Hey guys, can’t we all just be friends?

Last week, Amazon and Microsoft began rolling out the Alexa and Cortana integration that was first showcased on stage at the Microsoft Build event back in May. This integration will allow Cortana and Alexa to work together to share data access and provide personalized responses regardless of which virtual assistant the user is asking. Such a collaboration opens some interesting questions about the future of voice assistants and how the voice computing ecosystem will continue to evolve.

As the default on Windows laptops, Cortana has access to a lot of people’s professional contacts and work calendars, whereas Alexa is typically used in a personal home environment, so together this partnership should provide a comprehensive user experience for customers across their professional and personal lives. Microsoft shared that it believes that in the future, co-existing virtual assistants will work together to serve users, each specializing in a separate domain. That vision does seem like a likely outcome, at least in the short term, as it would likely take years for a single virtual assistant to emerge that can cater to every aspect of our lives. Eventually, the digital assistants may become more cloud-based and therefore hardware agnostic. But for now, each major tech player has its own voice assistant to push, and the market doesn’t seem to be, nor does it need to be, a winner-takes-all game.

Two Sides Of The Same Conversational Coin

The same principle of symbiotic coexistence should also apply to the distant mute cousins of digital assistants, aka chatbots. After enjoying a breakthrough year in 2015 when they exploded on all major messaging platforms, the buzz surrounding chatbots died down significantly during the past year or so as consumers discovered that they are not that smart or easy to chat with after all. Although chatbots did successfully replace some basic customer service functions for brands, they largely failed to deliver on the promise of ushering in a bot-led conversational computing era.

Nevertheless, it would be premature to disregard chatbots as merely a bygone fad. In 2018, we are seeing some good examples of brands and platforms finally figuring out the right use cases for text-based conversational interfaces in the customer experience. Walmart’s new concierge shopping service Jetblack uses texting as the interface for shopping, adding to the intimacy and convenience of its high-end personal shopping experience. Apple has started rolling out its Business Chat after nearly a year of preparation, which could shift a significant share of brand-customer interactions from phone calls to texting.

Are chatbots still a thing in 2018? Not really, but text-driven conversational interfaces are not going anywhere. Whenever voice interfaces need to venture beyond the home environment and other personal spaces, they will always need a silent conversation mode to switch to. And as the AI powering the voice assistants gets better, so will our chatbots, ready to slip back into the conversational interfaces again. They are, after all, two sides of the same coin of conversational interfaces.

Looking Ahead

As we near the end of summer 2018, the enormous hype around voice-led conversational interfaces has understandably subsided a bit since the last holiday shopping season when discounted smart speakers were flying off the shelves. The new additions to the Echo and Home lineups this year have been mostly met with a shrug, and there still has not been a breakout third-party voice application (Alexa Skill or Google Action) that can prove the validity of the voice app ecosystem.

Going by Gartner’s latest hype cycle for emerging technologies, one could see that consumer-facing voice computing (virtual assistants) is already past the peak of inflated expectations, with a long slide into the trough of disillusionment ahead. It’s hard to gauge how much longer voice assistants and smart speakers will manage to stay on top before starting the descent and joining the chatbots in that trough, but the upcoming holiday shopping season should provide some indications.

If anything, conversational interfaces will need to establish more robust use cases, while platform owners look into building a profitable model for third-party voice skills à la the App Store, in order for the voice ecosystem to truly flourish.