ElevenLabs AI Voice Cloning is the Future

11Labs — Indistinguishable from Magic ✨

Ivan Campos
Published in Sopmac Labs
5 min read · Mar 7, 2023

Synthetic voices have come a long way since their inception, evolving from basic robotic tones to incredibly realistic and personalized voices. With the integration of artificial intelligence and machine learning techniques, the latest synthetic voices are becoming almost indistinguishable from natural speech.

As the technology continues to evolve, we can expect even more human-like synthetic voices in the future. Well, the future has arrived.

Voice Cloning

Incredibly, with just a 60-second sample of my voice, ElevenLabs has managed to produce audio that has left me completely…speechless.

I’m amazed by the level of accuracy and realism in the replicated voice. It’s nearly identical to my own, with natural-sounding pauses and inflections that are truly remarkable.

ElevenLabs generated the following audio based on some famous text examples using my cloned voice.

How Sway? (Kanye Rant)

They Know Nothing! (Jim Cramer Rant)

Philip K. Dick Speech (1977 — Metz, France)

Voice Design (with the Boston Accent)

I have been on a 5-year side quest to generate a realistic Boston Accent with speech synthesis from Amazon Polly, Alexa, and Siri. Compared to these, ElevenLabs represents an exponential leap forward.

Instead of taking my word for it, let’s just listen to the synthetic Boston Accent that ElevenLabs generates:

Lobstahs

Shellfish

Oystah

Not only do you have the option to create audio using the VoiceLab through the UI, but you can also code against the ElevenLabs API.

ElevenLabs API

The ElevenLabs API has over 20 endpoints that let you programmatically retrieve past speech synthesis audio or create new text-to-speech audio using your custom voices from the VoiceLab.

GET /v1/voices

curl -X 'GET' \
'https://api.elevenlabs.io/v1/voices' \
-H 'accept: application/json' \
-H 'xi-api-key: YOUR_ELEVENLABS_API_KEY'

Returns an array of voices; each array item contains a voice_id.
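To pick the right voice_id out of that response, a few lines of Python are enough. This is a minimal sketch, not the official client: the sample payload is made up, and the `voices` wrapper key and the `name` field are assumptions about the response shape (only voice_id is confirmed above).

```python
import json

# Hypothetical sample payload. The "voices" wrapper key and the "name" field
# are assumptions; the voice IDs are placeholders.
SAMPLE_RESPONSE = """
{"voices": [
  {"voice_id": "v01c31D", "name": "My Clone"},
  {"voice_id": "b05t0n", "name": "Boston Accent"}
]}
"""


def voice_ids_by_name(response_text: str) -> dict:
    """Map each voice name to its voice_id."""
    payload = json.loads(response_text)
    return {voice["name"]: voice["voice_id"] for voice in payload.get("voices", [])}


print(voice_ids_by_name(SAMPLE_RESPONSE))
```

In practice you would pass in the body returned by the curl call above instead of the sample string.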

POST /v1/text-to-speech/{voice_id}

curl -X 'POST' \
'https://api.elevenlabs.io/v1/text-to-speech/v01c31D' \
-H 'accept: audio/mpeg' \
-H 'xi-api-key: YOUR_ELEVENLABS_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "text": "testing",
  "voice_settings": {
    "stability": 0,
    "similarity_boost": 0
  }
}'

Converts text into speech for a given voice_id and returns audio.
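The same call can be put together in Python using only the standard library. This is a rough sketch that mirrors the curl command above; the helper name `build_tts_request` is hypothetical, and the voice ID and API key are placeholders. It only builds the request, so nothing is sent until you uncomment the last lines.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, voice_id, text, stability=0, similarity_boost=0):
    """Build (but do not send) a text-to-speech request for one voice."""
    body = {
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "accept": "audio/mpeg",
            "xi-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


request = build_tts_request("YOUR_ELEVENLABS_API_KEY", "v01c31D", "testing")
print(request.get_method(), request.full_url)

# To actually send it (needs a real key, a real voice_id, and network access):
# with urllib.request.urlopen(request) as response:
#     with open("speech.mp3", "wb") as audio_file:
#         audio_file.write(response.read())
```

Keeping the request construction separate from the network call makes it easy to inspect the URL, headers, and body before spending any of your character quota.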

GET /v1/history

curl -X 'GET' \
'https://api.elevenlabs.io/v1/history' \
-H 'accept: application/json' \
-H 'xi-api-key: YOUR_ELEVENLABS_API_KEY'

Returns a history array; each array item contains a history_item_id.

GET /v1/history/{history_item_id}/audio

curl -X 'GET' \
'https://api.elevenlabs.io/v1/history/h15t0ry1t3m1D/audio' \
-H 'accept: audio/mpeg' \
-H 'xi-api-key: YOUR_ELEVENLABS_API_KEY'

Returns the audio for a given history_item_id.
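The two history endpoints chain together naturally: list the history, then fetch the audio for each item. Here is a small sketch of that flow, assuming the listing comes back wrapped in a `history` key (an assumption; only history_item_id is confirmed above). The helper name is hypothetical, and the requests are built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

# Hypothetical sample of a GET /v1/history response body.
SAMPLE_HISTORY = '{"history": [{"history_item_id": "h15t0ry1t3m1D"}]}'


def history_audio_requests(history_response_text, api_key):
    """Build one audio download request per item in a history listing."""
    payload = json.loads(history_response_text)
    requests = []
    for item in payload.get("history", []):
        requests.append(
            urllib.request.Request(
                url=f"{API_BASE}/history/{item['history_item_id']}/audio",
                method="GET",
                headers={"accept": "audio/mpeg", "xi-api-key": api_key},
            )
        )
    return requests


for request in history_audio_requests(SAMPLE_HISTORY, "YOUR_ELEVENLABS_API_KEY"):
    print(request.get_method(), request.full_url)
```

Each request can then be passed to `urllib.request.urlopen` to download the corresponding audio.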

Pricing Plans

All plans include API access, with pricing primarily based on character quota for speech synthesis.

  • Free: 10k characters per month*
  • Starter: 30k characters per month
  • Creator: 100k characters per month

*Instant Voice Cloning is NOT available on the Free Plan due to abuse concerns.
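Since the quota is counted in characters, you can estimate which plan a batch of scripts needs before synthesizing anything. The sketch below uses the three quotas listed above and assumes that every character of input text counts once against the quota, measured with Python's `len()`; how ElevenLabs actually counts characters may differ. The function name is hypothetical.

```python
# Monthly character quotas as listed in this article (March 2023).
PLAN_QUOTAS = {"Free": 10_000, "Starter": 30_000, "Creator": 100_000}


def smallest_plan_for(texts):
    """Return (plan name, total characters) for the smallest plan that fits.

    Assumes one input character uses one character of quota. The plan name
    is None when the total exceeds every quota listed here.
    """
    total = sum(len(text) for text in texts)
    for plan, quota in sorted(PLAN_QUOTAS.items(), key=lambda item: item[1]):
        if total <= quota:
            return plan, total
    return None, total


scripts = ["Lobstahs", "Shellfish", "Oystah"]
print(smallest_plan_for(scripts))
```

Keep in mind that fitting within the Free quota is not the whole story: a cloned voice needs a paid plan, as noted above.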

There are also pricier plans available for the Independent Publisher, Growing Business, and Enterprise.

Conclusion

AI is fire (🤖 is 🔥).

As in caveman times, fire can be used to cook your food and help you multiply, or it can burn you.

Cook

Voice cloning has the potential to revolutionize the way we communicate and interact with technology. With the ability to create realistic and personalized voices, it can enhance industries such as entertainment, customer service, healthcare, and podcasting, and it can even help preserve the memory of loved ones. It allows for a more efficient and effective exchange of information while also providing a more engaging and human-like experience for users. As the technology advances and becomes more accessible, we can expect to see more innovative applications of voice cloning that will further transform the way we interact with our devices and each other.

…or be Cooked

Voice cloning can be scary because it has the potential to be used for malicious purposes such as fraud, identity theft, and misinformation. With the ability to create a voice that sounds just like someone else, it could be used to impersonate individuals, manipulate audio recordings, or spread fake news. It could also be used to create deepfakes, which are highly realistic videos or audio recordings that show someone saying or doing something that they didn’t actually say or do. This could lead to serious consequences such as reputational damage, loss of privacy, or even harm to individuals or society as a whole.

As with any emerging technology, there is always a risk of misuse, and it is important to develop ethical guidelines and regulations to ensure that voice cloning is used responsibly and for the greater good.

Parting thought: where are we headed with iPhones, AirPods, ChatGPT, and ElevenLabs?
