The Future Of Branding? Synthetic Voices That Sound Just Like Our Own
The Y Combinator-backed startup Voicery uses AI to develop bespoke, synthetic voices for brands.
--
The San Francisco-based startup Voicery is only a few months old, but CEO and cofounder Bobby Ullman says he’s already had hundreds of requests from companies that are interested in developing their own branded voices. That’s because Voicery offers something most companies probably didn’t know they needed even just five years ago: a customized digital voice that sounds like an actual human, not a computer.
Ullman is a computer scientist who formerly worked at Palantir, and his cofounder, CTO Andrew Gibiansky, has experience in machine learning and worked on speech recognition at the Chinese company Baidu. The duo, who are childhood friends, applied to Y Combinator with a similar idea and honed it into Voicery in the Silicon Valley accelerator program.
Unlike the canned voices you’re likely to hear on customer service calls today, Voicery’s AI-synthesized voices sound human enough to convey carefully designed emotions that can act as an extension of a company’s brand. As more of our interactions with companies shift away from the visual and toward the verbal–whether thanks to Echo and Google Home or automated customer service systems–the tone, quality, and cadence of a company’s voice are becoming the new face of the brand.
Speech’s Uncanny Valley
Voice can be a powerful branding device–think of the familiar Jack in the Box voice or the rumbling voice of Allstate’s Dennis Haysbert. Yet you’ve probably cringed at how awkward Alexa sounds when she tells a joke. That’s because it’s incredibly hard for synthetic voices–which mimic human speech–to convey believable emotion with their halting, robotic cadence. Most of these computerized voices use an older method of speech synthesis called the concatenative model: a voice actor records up to 200 hours of speech, that speech is digitally chopped up into small bits of sound, and the bits are then reassembled into whatever the voice needs to say.
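For readers who want to see the idea in miniature, here is a rough Python sketch of the concatenative approach described above. It is a toy under stated assumptions, not how any commercial system is built: the "recordings" are short sine tones standing in for a voice actor's audio, the units are whole words rather than the much smaller sound fragments real systems use, and every name in it (`UNIT_INVENTORY`, `fake_recording`, `crossfade_join`, `synthesize`) is hypothetical.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate for the toy recordings


def fake_recording(freq_hz, duration_s=0.1):
    """Stand-in for a recorded snippet: a short sine tone."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory. In a real system these would be small pieces
# cut from many hours of a voice actor's recordings.
UNIT_INVENTORY = {
    "hello": fake_recording(220),
    "world": fake_recording(330),
    "thanks": fake_recording(440),
}


def crossfade_join(left, right, overlap):
    """Join two snippets, linearly blending `overlap` samples at the seam."""
    overlap = min(overlap, len(left), len(right))
    if overlap == 0:
        return left + right
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight shifts from left to right
        blended.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return left[:len(left) - overlap] + blended + right[overlap:]


def synthesize(text, overlap=40):
    """Look up a stored unit for each word and stitch the units together."""
    units = text.lower().split()
    missing = [u for u in units if u not in UNIT_INVENTORY]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    audio = []
    for u in units:
        audio = crossfade_join(audio, UNIT_INVENTORY[u], overlap)
    return audio


audio = synthesize("hello world")
print(len(audio), "samples")
```

Even in this toy, the weakness is visible: the system can only say what is already in its inventory, and each unit sounds exactly as it was recorded, with a seam wherever two units meet. That helps explain why stitched-together speech tends to come across as halting.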