From Oprah to Michael Jordan: give your brand a unique voice on smart speakers
Four ways you can make your voice app sound like a real person
2019 kicked off with some mind-blowing voice stats:
- Google Assistant, Cortana, Siri and Amazon Alexa combined will be available on 2 billion devices by the end of the year. Note that these devices are not limited to smart speakers: voice assistants are now available in cars, headphones, thermostats and even microwaves.
- Google Assistant will support 30 languages.
- 41% of adults and 55% of teens are using voice search daily.
This leaves brands little choice but to embrace voice and use this new technology to interact with their consumers.
Given that customers can now actually hear and talk to your brand, having a distinct brand voice (literally) is more important than ever. In this article we look at several ways to personalize how your brand sounds on smart speakers and other voice-first devices, or even emulate the voice of a real person.
#1 Use voice recordings
If your app is comparatively simple and the number of interactions between the app and the user is limited and finite, you can record all the utterances with a voice actor. An example of this approach is the iHeartChristmas Alexa skill. As soon as you open the skill you won’t hear Alexa anymore — instead Santa Claus himself will help you pick your Christmas tunes.
On the plus side, this sounds great. However, sound production can be quite costly and does not scale easily, as any new feature requires more sound production work. The number of possible interactions with the voice app is limited to what has been recorded, so you need to be extra careful when designing the app to cover all the edge cases.
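One way to keep a recording-only app manageable is an explicit lookup table from each interaction to its audio file, plus a fallback clip for anything that was never recorded. The sketch below is only an illustration, not how iHeartChristmas is built: the intent names and example.com URLs are hypothetical, and it assumes the platform accepts SSML with an `audio` tag, as Alexa does.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical mapping from intent names to hosted recordings (placeholder URLs).
RECORDINGS = {
    "welcome": "https://example.com/audio/santa_welcome.mp3",
    "play_music": "https://example.com/audio/santa_play_music.mp3",
    "goodbye": "https://example.com/audio/santa_goodbye.mp3",
}

# Played whenever an intent has no dedicated recording, so the character
# never hands over to the default assistant voice.
FALLBACK = "https://example.com/audio/santa_fallback.mp3"


def build_ssml(intent: str) -> str:
    """Return an SSML response that plays the recording for an intent."""
    url = RECORDINGS.get(intent, FALLBACK)
    # quoteattr escapes the URL and wraps it in quotes for use as an attribute.
    return f"<speak><audio src={quoteattr(url)}/></speak>"


print(build_ssml("welcome"))
print(build_ssml("tell_joke"))  # not recorded, so the fallback clip plays
```

Keeping the table in one place also makes the edge-case review easier: every intent the app can receive either has an entry or deliberately falls through to the fallback.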
#2 “Assembled” voices
Another way to create a personalized voice that resembles a real person's is to use a vast library of recordings of that person and then apply AI to “assemble” phrases out of those recordings. One product that appears to use this approach is StatMuse.
StatMuse is an app that lets users ask for the latest sports stats and receive a response in the voice of one of the star players.
To recreate the famous voices, StatMuse first worked with the sports stars to record a vast library of sample phrases. Then, when a user asks a question, e.g. “Which team won the last World Series?”, the StatMuse AI uses this library to create a matching response, which consists of words and collocations fetched from the library.
User question: “Which team won the last World Series?”
Pre-recorded voice dataset by a sports star:
“The New York Yankees”, “The Boston Red Sox”, “The Houston Astros”, “The Cleveland Indians”, “The Los Angeles Dodgers”, “won”, “lost”, “the”, “to”, “in”, “a”, “on”, “at”, “one”, “three”, “four”, “ten”, “twenty”, “thirty”, “sixteen”, “seventeen”, “eighteen”, “nineteen”, “series”, “world”, “national”, “games”, “against” (…)
System response: “The Boston Red Sox won in the 2018 World Series against the Los Angeles Dodgers, 4 games to 1.”
With this approach you gain some flexibility, as you won’t need to pre-record every single app response. However, the system will only be able to generate answers to certain questions known in advance, e.g. if it’s a sports app you won’t be able to ask it “what’s the weather?”. As for the resulting voice quality, you can definitely recognise the person who is speaking, but you’ll also hear distinct breaks between the pre-recorded sound bits, similar to the choppy sound of a bank IVR. Check it out for yourself here.
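The assembly step can be pictured as covering the response text with the longest recorded clips available. The sketch below is a simplified guess at the general technique, not StatMuse's actual system: the clip library is a hypothetical set of phrases, and a real product would also have to smooth the joins between clips.

```python
# Hypothetical clip library: each entry is a phrase the voice talent recorded.
CLIPS = {
    "the boston red sox", "the los angeles dodgers", "won", "lost",
    "the", "in", "to", "against", "world", "series", "games", "four", "one",
}
MAX_WORDS = max(len(clip.split()) for clip in CLIPS)


def assemble(sentence: str) -> list[str]:
    """Split a sentence into recorded clips, preferring the longest match."""
    words = sentence.lower().split()
    plan, i = [], 0
    while i < len(words):
        # Try the longest possible phrase first, then shrink word by word.
        for size in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate in CLIPS:
                plan.append(candidate)
                i += size
                break
        else:
            # No clip starts at this word: the sentence cannot be spoken.
            raise KeyError(f"no recording covers {words[i]!r}")
    return plan


print(assemble("The Boston Red Sox won against the Los Angeles Dodgers four games to one"))
```

Longer clips mean fewer joins and therefore fewer audible breaks, which is why team names are recorded as whole phrases. A question outside the recorded vocabulary, such as one about the weather, simply raises an error here.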
#3 Human-like synthetic voices
There are a few providers of human-like generated voices. You won’t be able to make these sound like a specific real person such as Oprah, but you can pick one distinct voice for your brand that is different from the default smart speaker voice. Amazon Polly, Nuance and WaveNet are good examples.
To use a WaveNet voice you would need to pay around $16.00 USD per 1 million characters. What you get in return, however, is complete flexibility in features, scalability (you can use this voice for all your voice apps) and almost human sound quality (you can also pick a male or female voice and an accent, e.g. British or Australian). Check out some voice samples here.
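To get a feel for what per-character pricing means in practice, here is a rough back-of-the-envelope estimate. It uses the $16.00 per 1 million characters figure quoted above; the traffic numbers are made up for illustration, and it ignores any free tier or volume discount a provider may offer.

```python
# Price quoted in this article; check the provider's current price list.
PRICE_PER_MILLION_CHARS_USD = 16.00


def monthly_cost(avg_chars_per_response: int, responses_per_month: int) -> float:
    """Estimate the monthly synthesis bill in USD."""
    total_chars = avg_chars_per_response * responses_per_month
    return total_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


# Hypothetical traffic: 50,000 responses a month, about 120 characters each.
print(f"${monthly_cost(120, 50_000):.2f}")  # prints $96.00
```

Caching the audio for frequently repeated responses would lower the bill further, since those phrases only need to be synthesized once.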
#4 On-demand synthetic voices
Finally, you can “clone” or develop a custom-made generated voice. This will be a voice like the one you get from WaveNet, but the model will be trained to sound like a real person of your choice (with their consent, of course). This process isn’t automated yet, so each new voice requires a lot of engineering effort and investment.
One startup that can help you design a voice for your company is Lyrebird. Although this approach can be pricey, it gives you a unique brand voice, modelled on any real person or celebrity, that is both scalable and natural-sounding. Check out the resulting voices here.