Deep Fakes For All: The Proliferation of AI Voice Cloning

Mike Todasco
Aug 18, 2023


Generated on Midjourney on August 17, 2023 by Michael Todasco from the prompt “black and white photograph of a large group of white male Charlie Chaplin impersonators --ar 3:2 --s 250”

Last week, we hit a major milestone in AI voice cloning. Using Play.HT’s new 2.0 model, anyone can create a voice clone with just 30 seconds of training data. Rewind five months, and I was marveling at the realism they achieved with a 30-minute sample. At this pace, by the end of the year, we could have your cloned voice perform The Canterbury Tales, trained on a single cough.

Voice cloning uses machine learning algorithms to analyze the patterns in a person’s speech and ultimately replicate their voice. By training on a sample of an individual’s speech, these algorithms can generate a synthetic voice that closely mimics the original speaker’s tone, pitch, accent, and speaking style.

While Cough.AI may not yet be a reality, we can use this new Play.HT model now. (It is free to sign up and try it.) So how good is it? I trained the model on Charlie Chaplin’s final speech from The Great Dictator. (I felt this fitting as Chaplin was a man quite famous for not talking.)

And from that, I asked ChatGPT what tongue twister I should have Charlie Chaplin say. It returned:

Charlie Chaplin chatted cheerfully while chewing chunky chocolate in a chintzy chair, choosing charming Chaplin-esque chortles to charm chirping children in the chilly chapel.

So I asked the model to mimic Charlie Chaplin saying that phrase, and this was the output:

This technology is getting scary. With such a small voice sample, it did a truly admirable job of mimicking Chaplin’s voice. What does that mean for the rest of us, given that recording devices are ubiquitous? No matter how private you think you are, if you spend time online, your voice imprints are everywhere.

To be clear, I wasn’t supposed to do what I did. I broke Play.HT’s user agreement when I uploaded an audio clip that I didn’t have the rights to. (I was doing it for educational purposes and subsequently deleted the Chaplin model.) But soon, models of this caliber will be open-sourced and readily available for anyone to use for any purpose. That is the reality of the technological progression we are following. Cloning won’t just be used for making samples of dead actors. It could be used to make a voice clone of your boss, your grandkids, and of course, you.

If anyone can get a sample of your voice, they will soon be able to have you say whatever they please. We used to need a lot of training data to make a deep fake. Back in 2017, state-of-the-art deep fake technology only worked on people like world leaders, for whom huge training data sets existed.

In 2023, we can make deep fake audio of anyone. This is the beginning of the long tail of deep fakes.


Mike Todasco

Visiting Fellow at the James Silberrad Brown Center for Artificial Intelligence at SDSU, AI Writer/Advisor