ElevenLabs AI Voice Cloning is the Future

11Labs — Indistinguishable from Magic ✨

Published in

Sopmac Labs

5 min read · Mar 7, 2023

Synthetic voices have come a long way since their inception, evolving from basic robotic tones to incredibly realistic and personalized voices. With the integration of artificial intelligence and machine learning techniques, the latest synthetic voices are becoming almost indistinguishable from natural speech.

As the technology continues to evolve, we can expect even more human-like synthetic voices. Well, the future has arrived.

Voice Cloning

Incredibly, with just a 60-second sample of my voice, ElevenLabs has managed to produce audio that has left me completely…speechless.

I’m amazed by the level of accuracy and realism in the replicated voice. It’s nearly identical to my own, with natural-sounding pauses and inflections that are truly remarkable.

ElevenLabs generated the following audio based on some famous text examples using my cloned voice.

How Sway? (Kanye Rant)

They Know Nothing! (Jim Cramer Rant)

Philip K. Dick Speech (1977 — Metz, France)

Voice Design (with the Boston Accent)

I have been on a 5-year side quest to generate a realistic Boston Accent with speech synthesis from Amazon Polly, Alexa, and Siri. Compared to these, ElevenLabs represents an exponential leap forward.

Instead of taking my word for it, let’s just listen to the synthetic Boston Accent that ElevenLabs generates:

Lobstahs

Shellfish

Oystah

Not only do you have the option to create audio using the VoiceLab through the UI, but you can also code against the ElevenLabs API.

ElevenLabs API

The ElevenLabs API has over 20 endpoints that let you programmatically retrieve previously synthesized audio or create new text-to-speech audio using your custom voices from the VoiceLab.

GET /v1/voices

curl -X 'GET' \
  'https://api.elevenlabs.io/v1/voices' \
  -H 'accept: application/json' \
  -H 'xi-api-key: YOUR_ELEVENLABS_API_KEY'

Returns an array of voices; each item contains a voice_id.
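As a rough sketch of what to do with that response, the Python below pulls each voice_id out of a payload. The sample payload is made up for illustration, and its shape (a top-level "voices" key whose items carry "voice_id" and "name") is an assumption, so check it against a real response before relying on it.

```python
import json

# Hypothetical sample payload; a real response carries more fields per voice.
SAMPLE_VOICES_JSON = """
{
  "voices": [
    {"voice_id": "v01c31D", "name": "My Cloned Voice"},
    {"voice_id": "b05t0n1D", "name": "Boston Accent"}
  ]
}
"""


def voice_ids_by_name(payload: dict) -> dict:
    """Map each voice name to its voice_id (assumed keys: voices, name, voice_id)."""
    return {voice["name"]: voice["voice_id"] for voice in payload.get("voices", [])}


voices = voice_ids_by_name(json.loads(SAMPLE_VOICES_JSON))
print(voices)
```

Looking voices up by name keeps the opaque IDs out of the rest of your code.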

POST /v1/text-to-speech/{voice_id}

curl -X 'POST' \
  'https://api.elevenlabs.io/v1/text-to-speech/v01c31D' \
  -H 'accept: audio/mpeg' \
  -H 'xi-api-key: YOUR_ELEVENLABS_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "testing",
  "voice_settings": {
    "stability": 0,
    "similarity_boost": 0
  }
}'

Converts text into speech for a given voice_id and returns audio.
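The same call can be sketched in Python using only the standard library. The URL, headers, and JSON body mirror the curl example; the function names build_tts_request and save_speech are my own, and the output filename is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id: str, text: str, api_key: str,
                      stability: float = 0.0,
                      similarity_boost: float = 0.0) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = {
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "audio/mpeg",
            "xi-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> int:
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return out.write(response.read())


request = build_tts_request("v01c31D", "testing", "YOUR_ELEVENLABS_API_KEY")
# Uncomment once a real API key and voice_id are filled in:
# save_speech(request, "testing.mp3")
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before spending any character quota.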

GET /v1/history

curl -X 'GET' \
  'https://api.elevenlabs.io/v1/history' \
  -H 'accept: application/json' \
  -H 'xi-api-key: YOUR_ELEVENLABS_API_KEY'

Returns an array of history items; each item contains a history_item_id.

GET /v1/history/{history_item_id}/audio

curl -X 'GET' \
  'https://api.elevenlabs.io/v1/history/h15t0ry1t3m1D/audio' \
  -H 'accept: audio/mpeg' \
  -H 'xi-api-key: YOUR_ELEVENLABS_API_KEY'

Returns the audio for a given history_item_id.
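The two history endpoints chain together naturally: list the items, then fetch the audio for each one. Here is a minimal Python sketch of that flow. The helper names are hypothetical, and the assumption that the list sits under a top-level "history" key should be verified against a real response.

```python
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def history_item_ids(payload: dict) -> list:
    """Collect history_item_id values (assumed to sit under a 'history' key)."""
    return [item["history_item_id"] for item in payload.get("history", [])]


def build_history_audio_request(history_item_id: str,
                                api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one history item's audio."""
    return urllib.request.Request(
        url=f"{API_BASE}/history/{history_item_id}/audio",
        headers={"accept": "audio/mpeg", "xi-api-key": api_key},
        method="GET",
    )


# Hypothetical payload standing in for the GET /v1/history response.
sample_history = {"history": [{"history_item_id": "h15t0ry1t3m1D"}]}
audio_requests = [
    build_history_audio_request(item_id, "YOUR_ELEVENLABS_API_KEY")
    for item_id in history_item_ids(sample_history)
]
print([r.full_url for r in audio_requests])
```

Each request can then be passed to urllib.request.urlopen and the response bytes written to an .mp3 file.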

Pricing Plans

All plans include API access, with pricing primarily based on character quota for speech synthesis.

Free: 10k characters per month*
Starter: 30k characters per month
Creator: 100k characters per month

*Instant Voice Cloning is NOT available on the Free Plan due to abuse concerns.

There are also pricier plans available for the Independent Publisher, Growing Business, and Enterprise.
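To get a feel for the quotas, here is a small Python sketch that picks the smallest of the three listed plans for a given monthly volume. It assumes every character of the input text, including spaces and punctuation, counts against the quota; the 500 renders per month is an arbitrary illustrative figure.

```python
from typing import Optional

# Monthly character quotas as listed above. The pricier plans are left out
# because their quotas are not given here.
PLAN_QUOTAS = {"Free": 10_000, "Starter": 30_000, "Creator": 100_000}


def smallest_plan_for(characters_per_month: int) -> Optional[str]:
    """Return the smallest listed plan whose quota covers the usage, else None."""
    for plan, quota in sorted(PLAN_QUOTAS.items(), key=lambda kv: kv[1]):
        if characters_per_month <= quota:
            return plan
    return None


script = "Lobstahs, shellfish, and oystahs."
# Assumption: each synthesis of the script consumes len(script) characters.
monthly_characters = len(script) * 500
print(monthly_characters, smallest_plan_for(monthly_characters))
```

A result of None means the usage exceeds the Creator quota and one of the pricier plans would be needed.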

Conclusion

AI is fire (🤖 is 🔥).

As in caveman times, fire can cook your food and help you multiply, or it can burn you.

Cook

Voice cloning has the potential to revolutionize the way we communicate and interact with technology. With the ability to create realistic and personalized voices, it can enhance industries such as entertainment, customer service, healthcare, and podcasting, and it can even help preserve the memory of loved ones. It allows for a more efficient and effective exchange of information while also providing a more engaging and human-like experience for users. As the technology advances and becomes more accessible, we can expect to see more innovative applications of voice cloning that will further transform the way we interact with our devices and each other.

…or be Cooked

Voice cloning can be scary because it has the potential to be used for malicious purposes such as fraud, identity theft, and misinformation. With the ability to create a voice that sounds just like someone else, it could be used to impersonate individuals, manipulate audio recordings, or spread fake news. It could also be used to create deepfakes, which are highly realistic videos or audio recordings that show someone saying or doing something that they didn’t actually say or do. This could lead to serious consequences such as reputational damage, loss of privacy, or even harm to individuals or society as a whole.

As with any emerging technology, there is always a risk of misuse, and it is important to develop ethical guidelines and regulations to ensure that voice cloning is used responsibly and for the greater good.

Parting thought: where are we headed with iPhones, AirPods, ChatGPT, and ElevenLabs?…

Resources

How to Turn Siri into a Boston Native with ChatGPT

A Step-by-Step Guide to Adding a Boston Accent to Siri Using Shortcuts & the OpenAI API

medium.com

Teaching Alexa to speak with a Boston accent

Amazon Polly Demo

medium.com

The Era of Voice: From Keyboards to Vocal Cords

By 2018, 30% of our interactions with technology will be through “conversations” with smart machines. Product leaders…

medium.com

ElevenLabs || Prime Voice AI

beta.elevenlabs.io

ElevenLabs API Documentation

api.elevenlabs.io

Written by Ivan Campos