Image credit: The Next Web

Multimodal: Voice First Buzzwords Explained

Vixen Labs Team
Oct 16 · 3 min read

Welcome to the first instalment of the series Voice First Buzzwords Explained. First up: multimodal.

If you’re reading this soon after we hit publish, you’re right on time. At its recent event, Amazon unveiled new and upgraded multimodal devices. But… so what?

What does multimodal mean?

A multimodal device is one that can be operated through two or more input methods, such as voice and touch.

The best-known example of a multimodal device is the Amazon Echo Show. Others include the Google Nest Hub and the Lenovo Smart Display.

Most people think multimodal means a screen-based device, like the examples above. However, multimodal also includes our mobile-hosted assistants. As Vixen Labs co-founder JP tweeted:

…we often skip over voice on mobile which has been around for way longer. Folks forget how GOOD (yes good!) Siri and Google Assistant have gotten

A related term, often used alongside multimodal, is voice-first. Voice-first devices, such as smart speakers, are built with voice as their primary input method; they become multimodal when they add a second one, such as a touchscreen.

What does multimodal look like in practice?

A user interacts with a multimodal device by touching the screen, speaking to the assistant behind it, or both.

Feedback is then given visually or audibly, depending on the user input.

Certain data suits particular interactions and responses better. For example, it’s easier for a user to understand what a product looks like by seeing it than by hearing Alexa describe it. On the other hand, it’s much simpler for a user to issue a voice command than to touch-scroll through long menus of options or type answers into a long form.
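For the more technically minded, here is a rough sketch of how that choice between spoken and visual feedback might be made. It is an illustration only, not how Alexa or Google Assistant actually work: the function `choose_response`, the `Response` type, and the content kinds in `VISUAL_CONTENT` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical content kinds that are assumed to be easier to see than to hear.
VISUAL_CONTENT = {"product_photo", "long_list"}


@dataclass
class Response:
    speech: Optional[str] = None   # what the assistant says aloud, if anything
    display: Optional[str] = None  # what appears on the screen, if anything


def choose_response(input_mode: str, content_kind: str,
                    summary: str, detail: str, has_screen: bool) -> Response:
    """Pick output channels from the input mode, the content, and the device."""
    # Show something when a screen exists and either the user touched it
    # or the content is better seen than described.
    show = has_screen and (input_mode == "touch" or content_kind in VISUAL_CONTENT)
    # Speak when the user spoke, or when there is nothing to show.
    speak = input_mode == "voice" or not show
    return Response(
        speech=summary if speak else None,
        display=detail if show else None,
    )
```

Under these assumptions, a spoken request for a product photo on a screen device gets both a spoken summary and a picture, a tap on the screen gets a visual reply only, and a device without a screen always falls back to speech.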

What are the opportunities presented by multimodal?

1. Usability

We learned to talk before we learned to type, but we learned to interact with technology by touching. Devices able to combine these inputs provide us with a much more holistic experience, which utilises more than one of our senses — and, crucially, connects them together.

2. Brand identity

So much of a brand is its visual identity. A device able to show the Instagram-driven consumers of today exactly what content they’re consuming (or which products they can buy) helps to elevate brand awareness.

How will your organisation make the most of multimodal?

The power of the multimodal experience hasn’t been missed by Amazon. As Bret Kinsella wrote for voicebot.ai, “Amazon’s biggest weakness today is its mobile strategy… there are few people using the Alexa app while on-the-go”.

This is precisely why the updated smart speaker and new wearable announcements were so exciting.

… because when Voice is accessible in different ways, “we can begin to do much more than help with hands-free tasks and accessibility. We can go beyond novelty into true utility.” (JP again.)


Thinking of bringing your business into the Voice First landscape? Talk to us about our design services and workshops.

Vixen Labs

News and perspectives from the Vixen Labs team. Voice First Strategy, Experience and Marketing specialists.

Thanks to Jen Heape

Written by Vixen Labs Team

Europe’s thought leaders on Voice First technology. Content from James Poulter, Jen Heape, & Romina Pankoke.
