How to run Mimic 3, an open-source text-to-speech AI model, on Windows 11
With a simple demo
Mimic 3 is a powerful multilingual text-to-speech AI model. You can generate speech in many different voices and languages. At the time of writing this article, Mimic 3 is not natively supported on Windows. Hence, we have to use Windows Subsystem for Linux (WSL) to run it.
You can refer to this article to understand how to set up WSL on your system. Although it is about running Apache Kafka, the WSL2 installation steps are exactly the same. Follow them up to the Java installation step, and don’t forget to restart the system once you are done.
Once you have installed WSL, open the Linux terminal from the Start menu search bar. Create a projects folder where you want to keep a copy of the Mimic 3 codebase, then go to that folder in the terminal using the cd command.
Run all of the commands below in sequence in the terminal.
1. Update the package information in Linux.
2. Install pip, the Python package manager.
3. Install venv, a library for creating the virtual environment in which you will run Mimic 3.
sudo apt-get update
sudo apt install python3-pip
sudo apt install python3-venv
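Before moving on, it can help to confirm that the tools the next steps rely on are actually available on the PATH. The snippet below is only a minimal sketch and is not part of the original instructions: check_tools is a hypothetical helper name, and the tool names suggested in the usage note are my assumption about what the packages above provide.

```shell
# Hypothetical helper (not from Mimic 3): report whether each named tool
# can be found on the PATH, and return non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_tools python3 pip3 git after the installs should print one line per tool. Any line starting with "missing" points at a package that still needs to be installed.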
4. Clone the Mimic 3 repository from GitHub. (Install git if it’s not installed already.)
git clone https://github.com/MycroftAI/mimic3
5. Go into the repository folder in the terminal.
cd mimic3
6. Run the below command to install Mimic 3.
./install.sh
7. A new virtual environment called .venv will be created automatically, and Mimic 3 will be installed in it, along with all of its dependencies. Activate this virtual environment (if it is not activated automatically) using the below command. You will notice (.venv) appear to the left of your username in the terminal, indicating that it’s active.
source .venv/bin/activate
8. Run the below command to start the Mimic 3 web server. (Note that this is only one of the many ways you can use Mimic 3.)
mimic3-server
9. Visit http://localhost:59125/ to access and test the model.
Mimic 3 has many different options. One good voice model to test for English is hifi-tts_low with speaker 92. Under the advanced settings, you can adjust several synthesis parameters.
After some experimentation, I felt that using 1.0, 0.5, and 0.8, respectively, for the settings gives a well-paced result (just my take). Another nice-sounding voice model is “ljspeech_low,” which I used for the content I created below.
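The same voice and settings can also be tried from the terminal instead of the web page. The sketch below rests on several assumptions the article does not confirm: that the running Mimic 3 web server accepts POST requests at /api/tts, that the three settings map to the lengthScale, noiseScale, and noiseW query parameters in that order, and that the voice and speaker are written as en_US/hifi-tts_low#92. The helper names tts_url and speak_to_file are hypothetical. Treat it as a starting point and check it against the Mimic 3 documentation.

```shell
# Build the request URL for the (assumed) /api/tts endpoint.
# $1 = voice key, $2 = length scale, $3 = noise scale, $4 = noise W
tts_url() {
  # "#" separates voice and speaker, so it must be percent-encoded in a URL.
  encoded_voice=$(printf '%s' "$1" | sed 's/#/%23/g')
  printf '%s/api/tts?voice=%s&lengthScale=%s&noiseScale=%s&noiseW=%s\n' \
    "${MIMIC3_URL:-http://localhost:59125}" "$encoded_voice" "$2" "$3" "$4"
}

# POST the text to the server and save the returned WAV audio.
# $1 = text to speak, $2 = output file path
speak_to_file() {
  curl --silent --fail -X POST --data "$1" --output "$2" \
    "$(tts_url 'en_US/hifi-tts_low#92' 1.0 0.5 0.8)"
}
```

With the web server running, speak_to_file 'Hello from Mimic 3.' hello.wav should write the audio to hello.wav if the assumptions above hold. Set MIMIC3_URL if the server listens on a different address.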
Note that Mimic 3 is compatible with Speech Synthesis Markup Language (SSML). However, I do not recommend using it, as there seems to be some drop in quality and occasional inconsistencies. (Or maybe I haven’t yet figured out how to use it properly.)
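If you want to judge the SSML quality for yourself, here is a small sketch of what a request might look like. It is based on assumptions rather than on anything shown in this article: that the web server accepts SSML at /api/tts when the Content-Type is application/ssml+xml, and that the speak, s, and break elements are supported. The helper names build_ssml and speak_ssml are hypothetical.

```shell
# Print a tiny SSML document with two sentences and a pause between them.
# $1 = first sentence, $2 = second sentence
# Note: the text is inserted as-is, so it must not contain raw "<" or "&".
build_ssml() {
  printf '<speak>\n  <s>%s</s>\n  <break time="500ms"/>\n  <s>%s</s>\n</speak>\n' "$1" "$2"
}

# Send the SSML to the (assumed) /api/tts endpoint and save the WAV audio.
# $1 = first sentence, $2 = second sentence, $3 = output file path
speak_ssml() {
  build_ssml "$1" "$2" | curl --silent --fail -X POST \
    -H 'Content-Type: application/ssml+xml' --data-binary @- \
    --output "$3" "${MIMIC3_URL:-http://localhost:59125}/api/tts"
}
```

Comparing the output of a plain-text request with the output of speak_ssml for the same sentences is a quick way to hear whether the quality drop shows up for your voice.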
One of the motivations for exploring text-to-speech for me was to create an AI news anchor who could help me curate the most pressing news content out there.
I have created a script on the latest news in the field of AI and made Mimic 3 say it.
Check out the final result here.
XQ Builds on Instagram: "Conceptualizing a new series to keep you updated on the latest AI news and…
Also, follow me on Instagram (@xqbuilds) if you wish to see more such fun experiments while staying updated on the latest breakthroughs in tech.
Imagine writing an algorithm that fetches the top trending news on a topic from the internet, has ChatGPT (or any LLM) write a news script from it, and uses an AI anchor to record the audio.
Mimic 3 is multilingual, so you could also auto-translate the script with a translation service and generate the speech in multiple languages.
Within seconds. Unlimited efficiency in creating and curating content.
That’s what I want to explore.