Bringing Characters to Life: Combining the Power of Inworld AI and ElevenLabs
When I think about the future of creative expression, I see so much opportunity to combine cutting-edge AI technology with artistic imagination to create captivating and immersive experiences. Imagine if your characters could step out of the pages of a book and converse with you in real time, or if the everyday things around you engaged in thoughtful conversation.
In this article, I’ll guide you through my journey, showcasing how I fused custom characters with the power of AI to bring a new nature friend to life using the expressive, human-like AI technologies of Inworld and ElevenLabs.
Meet the Technologies: Inworld AI and ElevenLabs
Inworld AI: Inworld AI is an AI-powered platform that allows developers to create interactive and engaging conversations with AI models. It enables real-time, context-aware interactions that feel remarkably natural, pushing the boundaries of what’s possible in conversational AI.
ElevenLabs: ElevenLabs provides sophisticated speech synthesis capabilities and a voice lab environment that empowers developers to generate lifelike speech from text. This technology enables the transformation of AI-generated responses into expressive, human-like speech.
The Journey: Bridging AI and Creativity
As I was envisioning the type of project I wanted to work on, I found myself drawn to the idea of bringing the potential characters around me to life through interactive conversations. The challenge? To make these characters not just responsive, but imbued with distinct personalities matching what I imagined them to be if they were actually real. This is where the magic of Inworld and ElevenLabs came into play.
My approach involved a three-step process: capturing audio from my microphone, sending this audio to Inworld to prompt an AI response, and finally, transforming Inworld’s text-based response into spoken words using ElevenLabs.
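Here is a minimal sketch of that three-step loop, with each step passed in as a function. The names recordAudio, askCharacter, and speak are hypothetical placeholders for the microphone, Inworld, and ElevenLabs pieces; they are not functions from either SDK.

```javascript
// One conversational turn: microphone audio, then character reply text, then spoken audio.
// All three step functions are hypothetical placeholders supplied by the caller.
async function runTurn({ recordAudio, askCharacter, speak }) {
  const audio = await recordAudio();           // step 1: capture microphone audio
  const replyText = await askCharacter(audio); // step 2: the character replies as text
  const speech = await speak(replyText);       // step 3: the text becomes spoken audio
  return { replyText, speech };
}

// Usage with stand-in steps, so the flow can be tried without any API keys.
runTurn({
  recordAudio: async () => Buffer.from('fake microphone audio'),
  askCharacter: async (audio) => `I heard ${audio.length} bytes of audio.`,
  speak: async (text) => Buffer.from(text),
}).then(({ replyText }) => console.log(replyText));
```

Keeping the steps as injected functions means each one can be swapped for the real service call once it works on its own.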
Code Walkthrough: Unveiling the Magic
Let’s take a closer look at the code that brought my characters to life. It’s important to note that prior to the actual coding, I developed a character and a scene in Inworld, as well as the perfect voice to accompany it in ElevenLabs.
Back to the code. First, I created the Inworld client. The client needs everything Inworld will consume up front: the API key and secret, any configuration values, the scene the character is set in, and a handler that defines what to do whenever a response packet comes back. With those in place, I built the client and opened the connection so that I could have a back-and-forth conversation with my character.
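The setup described above might look roughly like this. It is a sketch under assumptions: the builder-style method names (setApiKey, setConfiguration, setScene, setOnError, setOnMessage, build) are what I would expect from the @inworld/nodejs-sdk package and should be checked against the current SDK documentation. The createConnection helper and the settings object are hypothetical, and the client class is passed in as an argument so the sketch stays self-contained.

```javascript
// Configures an Inworld client and returns the built connection.
// ASSUMPTION: the method names below match the Inworld Node.js SDK; verify before use.
// `settings` is a hypothetical object: { key, secret, scene }.
function createConnection(InworldClient, settings, onMessage) {
  const client = new InworldClient()
    .setApiKey({ key: settings.key, secret: settings.secret })
    .setConfiguration({ capabilities: { audio: true } })
    .setScene(settings.scene)
    .setOnError((err) => console.error('Inworld error:', err))
    .setOnMessage(onMessage); // called for every response packet
  return client.build();
}

// Hypothetical usage, left commented out because it needs the SDK and real credentials:
// const { InworldClient } = require('@inworld/nodejs-sdk');
// const connection = createConnection(
//   InworldClient,
//   {
//     key: process.env.INWORLD_KEY,
//     secret: process.env.INWORLD_SECRET,
//     scene: process.env.INWORLD_SCENE,
//   },
//   (packet) => console.log(packet),
// );
```

Reading the key, secret, and scene from environment variables keeps credentials out of the source file.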
Now, for the audio bits. I captured audio input from the microphone using the node-mic-record NPM package together with Node’s built-in fs module. Once the recording stopped, I sent the audio as a prompt to Inworld’s API and, ultimately, to my character. You’ll notice that I’m using two techniques recommended by Inworld here: chunking the audio, and appending a stream of silence to the end of the audio file. Both help Inworld ingest the audio data and return an accurate response. The response that comes back from the API call is a textual representation of my character’s dialogue, ready to be transformed with the unique voice magic of ElevenLabs.
With the AI-generated response in hand, I opened a text-to-speech stream with ElevenLabs via the elevenlabs-node package. Using my API key, my ElevenLabs Voice ID, and the text obtained from Inworld, I received an audio file of the response delivered in my character’s unique voice. With Node’s built-in fs module and the play-sound package, the audio was then played back to complete the lifelike conversation loop with my character.
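The chunking and trailing-silence idea can be sketched on its own, independent of any SDK. The toChunks helper is hypothetical, and the chunk size, silence length, and audio format (16 kHz, 16-bit mono PCM) are illustrative assumptions rather than values required by Inworld.

```javascript
// Splits a raw audio buffer into fixed-size chunks after appending trailing silence.
// ASSUMPTION: 16 kHz, 16-bit mono PCM, where zero-valued bytes represent silence.
const CHUNK_BYTES = 4096;    // illustrative chunk size
const SILENCE_BYTES = 16000; // 0.5 s at 16,000 samples/s * 2 bytes per sample

function toChunks(audio, chunkBytes = CHUNK_BYTES, silenceBytes = SILENCE_BYTES) {
  // Buffer.alloc zero-fills, which is silence in signed 16-bit PCM.
  const padded = Buffer.concat([audio, Buffer.alloc(silenceBytes)]);
  const chunks = [];
  for (let offset = 0; offset < padded.length; offset += chunkBytes) {
    // subarray clamps at the end, so the last chunk may be shorter.
    chunks.push(padded.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}
```

Each chunk would then be handed, in order, to whatever call sends audio to the character. The silence at the end gives the service a clear signal that the speaker has finished.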
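For readers who want to see what the text-to-speech step involves, here is a sketch that calls the ElevenLabs REST endpoint directly instead of going through the elevenlabs-node package, so it is a swapped-in technique rather than the code from this project. The helper names buildSpeechRequest and synthesizeToFile and the output path are hypothetical, and the global fetch assumes Node 18 or later.

```javascript
const fs = require('node:fs');

// Builds the HTTP request for the ElevenLabs text-to-speech endpoint.
function buildSpeechRequest(apiKey, voiceId, text) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    options: {
      method: 'POST',
      headers: { 'xi-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    },
  };
}

// Requests the audio and writes it to outPath (a hypothetical file such as 'reply.mp3').
// ASSUMPTION: Node 18+ so that fetch is available globally.
async function synthesizeToFile(apiKey, voiceId, text, outPath) {
  const { url, options } = buildSpeechRequest(apiKey, voiceId, text);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  fs.writeFileSync(outPath, Buffer.from(await res.arrayBuffer()));
  return outPath;
}
```

Separating the request builder from the network call makes it easy to inspect exactly what is sent before spending any API credits.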
Read also “Mastering Inworld.ai on the Web: Technical Tips for Seamless Integration” to learn more.
Learning and Iteration: Refining the Experience
Throughout the development process, I discovered that achieving a natural flow of conversation required continuous refinement of both the code and the characters. The key was to strike a balance between the AI’s responsiveness and the character’s personality. In the future, I’d like to add a queuing system to make the interactions and timing even more seamless.
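One possible shape for such a queue is sketched below. It is an illustration of the idea, not something already in the project: the TurnQueue class is hypothetical and simply runs asynchronous tasks (for example, playing one audio reply) one at a time, in the order they were added.

```javascript
// Runs async tasks strictly one after another, in insertion order.
class TurnQueue {
  constructor() {
    this.tail = Promise.resolve(); // resolves when the last queued task has settled
  }

  // Queues a function that returns a promise; resolves with that task's result.
  add(task) {
    const result = this.tail.then(() => task());
    // Swallow errors on the internal chain so one failed task does not block the rest.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Usage: the second reply waits for the first, even though it is queued immediately.
const queue = new TurnQueue();
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
queue.add(async () => { await pause(50); console.log('first reply played'); });
queue.add(async () => { console.log('second reply played'); });
```

Callers still see a failure through the promise returned by add, so errors can be handled per task without stalling the conversation.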
Conclusion: Breathing Life into Fiction
The journey of bringing characters to life through Inworld AI and ElevenLabs opened so many potential creative doors for me. I felt like I was able to transcend the limits of traditional storytelling, enabling a new generation of characters to communicate dynamically and authentically.
As you embark on your own AI explorations, consider the potential of AI-powered storytelling and immersive AI experiences. Inworld and ElevenLabs are just the beginning; together they open up new dimensions of artistic expression and make it possible to give voice to characters in ways previously thought impossible.
I hope you’ll use my code as a template and merge it with your own imagination, allowing your own household characters to come alive through the power of AI-generated words and voice.
Happy creating!
