The risks and opportunities of speech generation after WaveNet

Andreas Kirsch
BlackHC
10 min read · Feb 13, 2018

Before WaveNet, speech generation had long been stuck in the uncanny valley of “good enough to be understood but not good enough to sound pleasant”. Everyone is familiar with the robotic voices of older parametric speech generation. WaveNet changed all this. First published in a research paper by DeepMind in 2016, it launched in Google Assistant in September 2017.

When Google Assistant replies to you, it uses a voice generated by WaveNet. The generated speech now sounds much more pleasant, and it has become harder to distinguish from a real human voice. As resource costs decrease and the technology behind WaveNet advances, it will bring both risks and opportunities for businesses and society alike.

This article covers predictions for the future, opportunities, and risks (in this order).

What will the future bring?

Given the current state of the art, we can think about the advances of the next 1–2 years. (Since I began working on this article, the future has already arrived in some ways; see further remarks below.)

More than just words

First, we can expect research in speech generation to take into account more than just the words…

Andreas Kirsch
BlackHC

DPhil student at AIMS in Oxford; former RE at DeepMind, former SWE at Google; fellow at Newspeak House.