Published in Voice Tech Global

From Oprah to Michael Jordan: give your brand a unique voice on smart speakers

Four ways you can make your voice app sound like a real person

Photo source: Getty Images

2019 kicked off with some mind-blowing voice stats:

  • Google Assistant, Cortana, Siri and Amazon Alexa combined will be available on 2 billion devices by the end of the year. Note that these devices are not limited to smart speakers: voice assistants are now available in cars, headphones, thermostats and even microwaves.
  • Google Assistant will support 30 languages.
  • 41% of adults and 55% of teens are using voice search daily.

This leaves brands no choice but to embrace voice and use this new technology to interact with their consumers.

Now that customers can actually hear and talk to your brand, having a distinct brand voice (literally) is more important than ever. In this article we look at several ways to personalize how your brand sounds on smart speakers and other voice-first devices, or even emulate the voice of a real person.

#1 Use voice recordings

If your app is comparatively simple and the number of interactions between the app and the user is limited and finite, you can record all the utterances with a voice actor. An example of this approach is the iHeartChristmas Alexa skill. As soon as you open the skill you won’t hear Alexa anymore — instead Santa Claus himself will help you pick your Christmas tunes.

On the plus side, this sounds great. However, sound production can be quite costly and does not scale easily, as any new feature requires a new production effort. The number of possible interactions with the voice app is limited to what has been recorded, so you need to be extra careful when designing the app to cover all the edge cases.
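To make the trade-off concrete, here is a minimal sketch of how a fully recorded app could be wired up. It assumes the clips are played through the SSML `<audio>` tag; the intent names, the clip URLs and the helper functions are all hypothetical, not taken from the iHeartChristmas skill.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical clip library: intent name -> URL of a pre-recorded voice-actor clip.
RECORDED_CLIPS = {
    "welcome": "https://example.com/audio/santa_welcome.mp3",
    "pick_genre": "https://example.com/audio/santa_pick_genre.mp3",
    "goodbye": "https://example.com/audio/santa_goodbye.mp3",
}

# Clip played when the user reaches a state that was never recorded.
FALLBACK_INTENT = "welcome"


def build_ssml(intent):
    """Return an SSML document that plays the recorded clip for an intent."""
    url = RECORDED_CLIPS.get(intent, RECORDED_CLIPS[FALLBACK_INTENT])
    return f"<speak><audio src={quoteattr(url)}/></speak>"


def missing_recordings(designed_intents):
    """List the designed intents that have no recording yet (sorted by name)."""
    return sorted(set(designed_intents) - set(RECORDED_CLIPS))
```

Running `missing_recordings` over every intent in your conversation design before launch is one cheap way to catch the edge cases mentioned above, since anything it returns would otherwise fall back to the wrong clip.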

#2 “Assembled” voices

StatMuse

Another way to create a personalized voice that resembles a real person’s is to use a vast library of recordings by that person and then apply AI to “assemble” phrases out of those recordings. One product that appears to use this approach is StatMuse.

StatMuse is an app that lets users ask for the latest sports stats and receive a response in the voice of one of the star players.

To recreate the famous voices, StatMuse first worked with the sports stars to record a vast library of sample phrases. Then, when a user asks a question, e.g. “Which team won the last World Series?”, the StatMuse AI uses this library to create a matching response built from words and collocations fetched from the library.

User question: “Which team won the last World Series?”

Pre-recorded voice dataset by a sports star:
“The New York Yankees”, “The Boston Red Sox”, “The Houston Astros”, “The Cleveland Indians”, “The Los Angeles Dodgers”, “won”, “lost”, “to”, “in”, “the”, “a”, “on”, “at”, “one”, “three”, “four”, “ten”, “twenty”, “thirty”, “sixteen”, “seventeen”, “eighteen”, “nineteen”, “series”, “world”, “national”, “games”, “against” (…)

System response: “The Boston Red Sox won in the 2018 World Series against the Los Angeles Dodgers, 4 games to 1.”

With this approach you have some flexibility, as you don’t need to pre-record every single app response. However, the system can only generate answers to certain questions known in advance, e.g. you can’t ask a sports app “what’s the weather?”. As for the resulting voice quality, you can definitely recognise the person who is speaking, but you’ll also hear distinct breaks between the pre-recorded audio fragments, similar to the choppy sound of a bank IVR. Check it out for yourself here.
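The assembling step can be sketched roughly as follows. This is a simplified illustration using greedy longest-match over a small fragment library, not StatMuse’s actual method; the `LIBRARY` contents and the `assemble` function are hypothetical, and numbers are assumed to be spelled out so that they match recorded words.

```python
import re

# Hypothetical fragment library: each entry stands for one recorded audio clip.
LIBRARY = {
    "the boston red sox", "the los angeles dodgers", "won", "lost",
    "in", "the", "to", "against", "world", "series", "games",
    "one", "four", "twenty", "eighteen",
}
MAX_FRAGMENT_WORDS = max(len(fragment.split()) for fragment in LIBRARY)


def assemble(sentence):
    """Split a sentence into recorded fragments, preferring the longest match.

    Raises KeyError when a word has no recording, i.e. the sentence is
    outside the domain the library was recorded for.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    fragments = []
    i = 0
    while i < len(words):
        # Try the longest possible fragment first, then shrink.
        for size in range(min(MAX_FRAGMENT_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate in LIBRARY:
                fragments.append(candidate)
                i += size
                break
        else:
            raise KeyError(f"no recording for {words[i]!r}")
    return fragments
```

Preferring longer fragments means fewer joins between clips, which is where the audible breaks come from. A question such as “What’s the weather?” raises `KeyError` immediately, which mirrors the domain limitation described above.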

#3 Human-like synthetic voices

WaveNet

There are a few providers of human-like generated voices. You won’t be able to make these sound like a specific real person such as Oprah, but you can pick one distinct voice for your brand that differs from the default smart speaker voice. Amazon Polly, Nuance and WaveNet are good examples.

To use a WaveNet voice you need to pay around $16.00 USD per 1 million characters. In return you get complete flexibility in features, scalability (you can use this voice for all your voice apps) and almost human sound quality (you can also pick a male or female voice and an accent, e.g. British or Australian). Check out some voice samples here.
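For a rough sense of what that price means in practice, here is a back-of-the-envelope estimate. It uses the $16.00 per million characters figure quoted above, which may have changed, and the traffic numbers are made up for illustration. It also ignores any free tier and any caching of repeated responses.

```python
# Price quoted in this article; check the provider's current pricing page.
PRICE_PER_MILLION_CHARS_USD = 16.00


def monthly_tts_cost(avg_chars_per_response, responses_per_month,
                     price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Estimate the monthly synthesis bill in USD for a voice app."""
    total_chars = avg_chars_per_response * responses_per_month
    return total_chars / 1_000_000 * price_per_million


# Hypothetical traffic: 50,000 responses a month, about 120 characters each.
print(f"${monthly_tts_cost(120, 50_000):.2f}")  # prints "$96.00"
```

Under these assumptions, 6 million characters a month costs about $96, which is small next to the cost of a recording session for every new feature.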

#4 On-demand synthetic voices

Lyrebird

Finally, you can “clone” or develop a custom-made generated voice. This will be a voice like the one you get from WaveNet, but the model will be trained to sound like a certain real person of your choice (with their consent of course). This process isn’t automated yet, so building a voice like this will require a lot of engineering effort/investment for each new voice.

One startup that can help you design a voice for your company is Lyrebird. Although this approach can be pricey, it gives you a unique brand voice, modelled on any real person or celebrity, that is both scalable and natural-sounding. Check out the resulting voices here.

If you liked this article, please support us with some claps and feel free to attend our Voice Tech TO Meetup events or join the conversation on Twitter, Slack or LinkedIn.

Polina Cherkashyna

Product strategist and thought leader focused on voice-first products, Machine Learning and Product Delivery craftsmanship. Organizer of Voice Tech Global.
