RealTalk: This Speech Synthesis Model Our Engineers Built Recreates a Human Voice Perfectly

May 15, 2019 · 4 min read
…and it’s the voice of Joe Rogan. Disclaimer: Joe didn’t actually endorse our work like this. (Source: a screenshot from the YouTube video the RealTalk team created for the project.)

Today we’re excited to announce that three Machine Learning Engineers at Dessa (Hashiam Kadhim, Rayhane Mama, Joseph Palermo) have produced the most realistic AI simulation of a voice we’ve heard to date.

It’s the voice of someone you’ve probably heard of before: Joe Rogan. (For those who haven’t: Joe Rogan is the creator and host of one of the world’s most popular podcasts, which to date has nearly 1300 episodes and counting.)

Obviously, something like this has to be heard to be believed. So without further ado, check it out for yourself:

Remember: 100% of the following audio was generated from the machine learning model using only text input. This includes the breaths, ‘um’s and ‘ah’s, and all other noises.

The replica of Rogan’s voice the team created was produced using a text-to-speech deep learning system they developed called RealTalk, which generates life-like speech using only text inputs.

Crazy, right? If you’re like us, and specifically, like our Principal ML Architect, Alex Krizhevsky, you’re probably thinking that it’s “one of the most impressive things I’ve seen yet in artificial intelligence.” Alex also noted that the work suggests that “Human-like speech synthesis is soon going to be a reality everywhere.”

What Does This Mean? Considering Societal Impact

As AI practitioners building real-world applications, we’re especially cognizant of the fact that we need to be talking about the implications of this.

Because clearly, the societal implications for technologies like speech synthesis are massive. And the implications will affect everyone. Poor consumers and rich consumers. Enterprises and governments.

Right now, technical expertise, ingenuity, computing power and data are required to make models like RealTalk perform well. So not just anyone can go out and do it. But in the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet.

It’s pretty f*cking scary.

Here are some examples of what might happen if the technology got into the wrong hands:

  • Spam callers impersonating your mother or spouse to obtain personal information
  • Impersonating someone for the purposes of bullying or harassment
  • Gaining entrance to high security clearance areas by impersonating a government official
  • An ‘audio deepfake’ of a politician being used to manipulate election results or cause a social uprising

Obviously, though, not everything is doom and gloom. There are also some really good things that could come out of speech synthesis models. Here are some examples:

  • Talking to a voice assistant in a way that feels as natural as talking to a friend
  • Customized voice applications — for instance, a workout app that contains a personalized pre-workout pep talk from Arnold Schwarzenegger
  • Improved accessibility options for people who communicate through text-to-speech devices, for example, people with Lou Gehrig’s disease
  • Automating voice dubbing for any media and in any language

As the recent report “The Malicious Use of Artificial Intelligence” by Oxford’s Future of Humanity Institute notes, new advancements in artificial intelligence not only expand existing threats, but also create new ones. (We highly recommend checking out the report, which is freely available to download here.)

We won’t pretend to have all the answers about how to build this technology ethically. That said, we think it will inevitably be built and increasingly integrated into our world over the coming years. So in addition to raising awareness and acknowledging these issues, we also want to show this work as a way of starting a conversation on speech synthesis that must be had.

Everyone should know what kinds of things are possible with the development of speech synthesis technologies. As we’ve seen with deepfakes, public awareness and dialogue also push governments, policymakers and lawmakers to take action and develop countermeasures swiftly.

A crucial advantage and responsibility we have as an applied AI company is knowing that there’s a huge difference between exploring AI in research and implementing it in the real world. To work on things like this responsibly, we think the public should first be made aware of the implications that speech synthesis models present before we release anything open source.

Because of this, at this time we will not be releasing our research, model or datasets publicly.

Update: When we first published this article in May we promised a technical overview of the model and data by way of another blog post on RealTalk. That post is now available here.

Next steps

So pay attention! Join the conversation! Write to some relevant government officials! Knowledge is power, and we encourage individuals, companies and governments to think about how we can responsibly implement these technologies into our society.

Learn more about RealTalk: For anyone who has questions, feedback or inquiries about the project, connect with us by email at

Curious about how RealTalk was built? Check out Pt. II of the blog post here for a technical overview of the text-to-speech synthesis model, data, and more.

We also encourage you to check out a Turing Test-style game the RealTalk team built to showcase the naturalness and intelligibility of this model, which can be found at

Please note that this project does not suggest that we endorse the views and opinions of Joe Rogan. Joe was selected as a demonstrative model for the purposes of displaying the capability of this technology.

Dessa News

The latest from a (not-so) secret AI lab at Square.
