The Future of Voice is Visual

Carat Global
Feb 6 · 5 min read

Elio La Grua, Strategy Director, Carat Global

Rising sales of smart speakers and visions of screenless web browsing have led many to anticipate a new golden age of audio and voice-activated marketing. But as the number of smart speakers sold has grown, so has scepticism about the medium. In the WARC 2020 Marketers Toolkit, only 22% of marketers considered Voice to be ‘very important’ to their marketing strategy, and a third of those surveyed viewed Voice as ‘not important at all’. People might be shouting about voice search, but the golden age they are calling for appears to be further away than once thought.

There’s a reason why Times Square exists as it does: effective communications rely heavily on what we can see. As a result, many brands find it hard to connect with consumers in the Voice space. Without the array of visual options for their communications, brands are restricted in how they can engage consumers. Experience is the priority: no one wants to ask “what’s the weather like today?” and hear “your weather forecast is sponsored by…” in response. Voice platforms therefore keep responses ad-free, which denies advertisers basic options they are afforded by visual channels. How do you deliver ads via Voice that don’t ruin the experience?

Brands that develop voice-apps for consumers also face a discoverability challenge. A recent report by Econsultancy highlights that a majority of users (51%) have never used a third-party voice-app. People’s behaviour on smart speakers currently revolves around only a few functional commands, so many skills go undiscovered or are abandoned, consigned forever to the voice graveyard.

Alexa, are Voice assistants really worth the hype?

Our answer is yes, but not necessarily in the way we imagined it.

China leads the way in voice-display

When predicting the future of technology, it’s worth looking to the East. Three of the five biggest smart speaker manufacturers come from China. While Amazon is still the global market leader, in 2019 China’s Baidu overtook Google in smart speaker sales. What’s even more remarkable is that over 60% of the speakers Baidu sold in Q3 last year came with a display. If the future of technology is more and more Chinese, it’s safe to say that the future of Voice is more and more visual.

With on-screen information providing additional context, consumers will be able to engage with their devices more easily, and brands stand to benefit. Because their voice cues are presented on a screen at the same time, their content becomes easier to discover, and an image of the product helps to prompt voice-activated sales.

In the West, Voice-commerce is very much in its early stages, and consumers are hesitant to buy a product they cannot see. They are literally being asked to “buy blind”. But in China, these hybrid interfaces show people relevant information, which gives them the confidence to purchase with their voice. The visual enrichment may go some way to explaining the one million voice-activated purchases on Singles Day 2019, China’s equivalent of Black Friday.

Smart homes unlock the potential of smart speakers

Brands that have adapted their owned properties to handle conversational engagements, for example with a chatbot mechanic, will be well placed to benefit from the emerging opportunities. It is also easier to imagine paid-for advertising opportunities on smart displays. On-screen ads would naturally complement the user’s experience without demanding their complete attention.

New smart displays such as the Echo Show, Google Home Hub and Facebook Portal all include an in-built camera. This means that brands may soon be able to video chat directly with their consumers and even personalise on-screen messages to whoever is standing in front of the screen.

In the US, sales of voice-interactive smart displays rose 558% in 2018, and the Echo Show already accounts for one-fifth of Amazon’s smart speaker sales. Even if this rise tapers off over time, it will be because these devices are displaced by more advanced and more broadly connected tech, such as smart TVs with Google Assistant integrated.

As James Moar, a Juniper analyst, says, “This (future) will be fully realised in the ability to transition between platforms: when you can ask your smart speaker about booking hotels, and it hands off the response request to a smart TV to display a variety of options.”

Brands should ‘show and tell’

Voice-enabled screens will allow brands to bring their visual assets to life via a new medium. But it can’t be a cookie-cutter approach. Advertisers should resist the temptation to insert their TV or YouTube assets and instead make use of the conversational features on offer. This means advertisers should develop new use-cases for smart displays, by creating Voice-first applications that are visual by design. To do this, we need to gain insight into the voice-audience by understanding their activity and usage.

In contrast to the functional emphasis of voice behaviours on mobile, in-home Voice activity focuses on content and entertainment. The interactions lend themselves to fun applications and tend to happen in the living room, where the whole family can play along together. Advertisers could, for example, develop fun communal experiences, such as image-based quizzes or interactive ‘Bandersnatch’-style stories that families navigate with their voice. Engagement doesn’t have to be sales-orientated.

Google Home Hub: 15 Mornings Sephora Spot

With video immensely popular and ‘how’ the most popular voice trigger word, it is no wonder that video is already being used to answer voice users’ ‘how to…’ questions. Sephora used this approach in its library of beauty tutorials, so people whose hands were busy applying make-up could ask for the most relevant make-up tutorial, hands-free.

Diageo, too, tapped into the visual functionality of the Echo Show with its mixology-focused skill, ‘The Bar’. The skill answers ‘how to make…’ questions by both showing and telling people how to make an array of cocktails. As this all happens on an Amazon device, the ingredients could then be bought immediately with a simple voice command.

Although the opportunities are not yet fully clear, voice is set to become a key way we activate and engage with images, videos and text. By the mid-2020s, the number of Voice assistants is expected to surpass the number of people on the planet. Voice to visual is still in its infancy, but forward-thinking brands will do well to watch this space.
