I Don’t Speak Chinese, But My ObEN Digital Doppelganger Does

PCMag

--

ObEN wants to transform voice personalization using AI and speech synthesis. I was blown away by my demo.

By Sophia Stuart

I just received an audio clip, pressed play, and recognized my voice — but I was speaking Chinese. In reality, I don’t know more than a cheerful “nihao” and a suitably grateful “xie xie.” But it was definitely my voice, and the company responsible is ObEN.

ObEN, part of long-running technology incubator Idealab, wants to transform voice personalization using AI and speech synthesis. The company, founded in 2014 by Nikhil Jain and Yi (Adam) Zheng, holds several patents with 10 more pending, and counts renowned experts like USC’s Dr. Kevin Knight and UCLA’s Dr. Abeer Alwan as advisors.

It just nabbed funding from HTC’s Vive X, a global accelerator aimed at funding promising VR start-ups, with an eye on allowing game developers to have their virtual characters sing or talk in a voice of their choice.

ObEN co-founders Yi (Adam) Zheng and Nikhil Jain

“I had spent five years at health care provider Kaiser Permanente and joined Idealab to devise new consumer-facing disruptive health tech,” Jain told PCMag. “Adam and I started working together when he came to Idealab from China, where he’d been an advisor to China Telecom’s video operations center. We found we had very complementary skillsets, both steeped in engineering, but I’m more product-focused and he has a strong financial background. So we started brainstorming concepts.”

Ultimately, the duo landed on a very personal pursuit. Jain read bedtime stories to his children but his hectic work and travel schedule meant he couldn’t continue, even via Skype. He wished he could “leave his voice behind” to read to his children. In turn, Zheng longed for a way to hear his young daughter’s voice whenever he wanted. The concept for ObEN was born.

To be clear, it’s not speech synthesis as we’ve known it. ObEN captures the full phoneme spectrum rather than recording individual syllables, which are language-specific.

For example, when I grew up in the South of England, British Rail had a very plummy-voiced lady announcer. It was clear the train network had used what’s known as the “unit selection method” to record the name of each individual train station, the hour, and other words like “arrival” and “departure.” Then, through concatenation (stringing the words into a chain), it had her say things like, “The train on Platform 1 is for London Victoria and will depart at three fifteen precisely.”

It was very stilted because recorded words were just popped together.
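For the curious, that word-chaining approach can be sketched in a few lines of Python. This is only an illustration of concatenation in general, not British Rail’s system or anything ObEN uses. The clip library, the sample values, and the `announce` function are all hypothetical, and short lists of numbers stand in for real recordings.

```python
# Hypothetical clip library: each recorded word maps to a list of audio samples.
# Real systems would store waveforms; short integer lists stand in for them here.
CLIPS = {
    "platform": [1, 2, 3],
    "one": [4, 5],
    "london": [6, 7, 8],
    "victoria": [9, 10],
}

# Fixed stretch of silence placed between clips (an assumption for this sketch).
GAP = [0, 0]


def announce(words, clips=CLIPS, gap=GAP):
    """Chain pre-recorded clips in order, separated by a fixed silent gap."""
    samples = []
    for i, word in enumerate(words):
        if word not in clips:
            # A word that was never recorded simply cannot be spoken.
            raise KeyError(f"no recording for {word!r}")
        if i > 0:
            samples.extend(gap)
        samples.extend(clips[word])
    return samples


announcement = announce(["platform", "one", "london", "victoria"])
```

Nothing blends one clip into the next, and the gap is the same every time, which is roughly why such announcements sound stilted. A word missing from the library cannot be said at all.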

“What we do is voiceprint not just the tone, but the performance style of the speaker,” Jain explained. “Then we have an output of particles, apply deep learning to speech synthesis, and can create a digital adaptation of that voiceprint — in any language — and not just speech, but singing, too. You don’t need a special recording studio to use ObEN’s technology, just a quiet place with not much background noise.”

ObEN robot

One recent successful application of ObEN’s tech was Ben, a robot concierge in Las Vegas. Through Zheng’s connections in China, Tencent, which owns massive social network WeChat, introduced ObEN to Sixiao Guo in Las Vegas, who headed up international marketing at Caesars Entertainment.

“The Chinese market is very important for Las Vegas,” confirmed Guo. “So, during planning for CES, we wanted an innovative way to attract travelers and provide great customer service at the LINQ hotel, part of Caesars Entertainment. In partnership with WeChat and ObEN we created Ben, a robot concierge, using a voice actor who gave us the voiceprint and hardware from HRG [Hit Robot Group]. During CES, Ben spoke mostly English but in February, for Chinese New Year, he delighted hotel visitors by speaking Mandarin, making recommendations for nightlife, entertainment, and dining options.”

“What we were trying to achieve was to prove people really care for, and respond to, human voices, as opposed to machine voiced chat bots, for example,” Jain said. “All the feedback we got in Las Vegas confirmed that. They really connected with Ben.”

Guo, meanwhile, liked the ObEN tech so much that she recently jumped ship and moved to Los Angeles to join the company.

Pleasantly spoken tourism trade robots aside, a growing part of ObEN’s future business will be in personable devices for seniors and children in the healthcare industry, according to the co-founders. And maybe a few celebrities?

“We get contacted by a lot of celebrities’ estates,” said Jain. “Because, apparently, they can make more money out of a celebrity post-death. So they want to create a digital identity, right down to the voiceprint, for posterity. The celebrity asks us, ‘Which recording studio do you want to book me into?’ and they’re often disappointed when we say, ‘Oh, you can just record our two-minute, 36-sentence input test from somewhere quiet like your bathroom.’”

So that’s what I did. Soon, I received an audio file of me speaking Chinese, which blew my mind.

Back when I had a fancy executive job, I used to fly to Beijing and Hong Kong on business. Despite a willingness to pick up phrases to smooth interpersonal relations with colleagues upon arrival, it was often an unsatisfying mix of interpreters and confused signals, coupled with awful jetlag and misunderstandings.

In the future, business will be rather different. Because I might not speak Chinese (or Russian, Japanese, Dothraki, and so on), but my digital doppelganger does, thanks to ObEN.

And, not to be morbid, but she will continue to speak in many languages long after my corporeal presence on this planet has expired. So if anyone would like my digital doppelganger to audition for the multilingual announcer gig on an interplanetary spacecraft, when that day comes, I would be delighted to do so.

Sign up on ObEN’s website to be notified when it’s more widely available.

--
