Twitch plays Melodrive — Using comments to play AI-generated music

Paul Dachs
Published in The Sound of AI
7 min read · May 30, 2019

How we made an infinite stream of piano music that changes based on your comments — during a hackathon.

It was a cloudy Saturday during spring in Berlin. Two AI music startups, Melodrive and Dadabots, converged in the center of a yellow-walled co-working office — the base of the Twitch plays Melodrive hackathon for the next day. The goal was clear, but not straightforward: set up an endless Twitch stream of AI-generated music that changes according to the comments in the chat. It was a novel idea; nothing like it had ever been built before, let alone during a hackathon. Specifically, the comments would change the music’s composition, style and mood to match an emotion detected in the words. Even if you typed, “my cat is a naughty rascal”, the AI would interpret what emotion that is and play music to match. (Cat might just deserve a special kind of emotion all by itself.)

Our assembled cast was multi-skilled. With coding, sound design and writing backgrounds in the room (and our teammate in Ireland a Skype call away), everyone was assigned a specific task. What’s unique about hackathons is the pace at which you’re forced to work. It’s a scramble, and if you want something to show for it, you need to spread the load. This often means dipping your toes into fields you know nothing about, or learning new skills on the fly. CJ Carr, from Dadabots, the startup behind the AI that generates infinite Death Metal, built the adaptive visualiser, which is something he’d never done before. We’d also never used Twitch (maybe a little unusual for a company working in the video-games industry, admittedly), so this was our first time interacting with its user interface (UI). We gathered around the table to plot the course of action; the plan was scribbled across the whiteboard in glorious red and black marker.

We set to work. The usually-serene office was turned into an AI music workshop. Among a mesh of equipment, each of us toiled. The days in Berlin have begun to stretch as the warmer months unfold, but this one seemed to fly by. Gradually, each task was completed and struck off the list in a momentary celebration of victory.

Originating sound

Starting was the easy part. We already had the AI engine — Melodrive Indie. This system, built from the ground up over the last few years, composes an infinite stream of original, emotionally variable music in realtime. This makes it an adaptive AI music system; perfect for responding to thousands of comments on a Twitch stream. It’s able to play a range of styles and genres including house, ambient, rock and piano. We decided to go with piano this time, because our sound designer had some virtual instruments (VSTs) in his locker that he’d been waiting to whip out. He focused on building a range of presets that would enhance the production value of our stream. The task was challenging, and the laptop we nicknamed ‘The Beast’ (because of its incredible computing power) was on the receiving end of a continuous barrage of Danish curse-words. The initial issue was that the VSTs didn’t work properly, because of the difficulty of building the presets ‘outside’ the Python code (although the effects are built in C++, the presets for the mixer are coded in Python). This struggle was certainly worth pursuing, as we knew it would ramp up the quality of the music and create the wow-factor we couldn’t do without.

A look of satisfaction

We knew it would only make sense if the visuals were as ‘emotionally’ adaptive as the music. The generative visuals were built using Processing, an open-source programme for learning to code graphical content. To do this we integrated the Valence/Arousal Model of Emotion, which is used in many emotional music systems. Valence refers to the positivity or negativity of an emotion (whether that emotion is sadness, anger, joy or tenderness). Arousal refers to how intense the physical stimulation of the emotion is. Valence and arousal form the two axes of a graph, and each emotion is plotted as a point on it. ‘Happy’ is a high-valence, high-arousal affective state, and ‘Stressed’ is a low-valence, high-arousal state. The emotions are assigned values that are then fed to the algorithm, which changes the visuals accordingly. This is the same model we use for Melodrive, to instruct the AI what to respond to, or how to adapt.

The Valence/Arousal model, depicting varying degrees of emotion.
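To make the model concrete, here is a minimal Python sketch of how a valence/arousal pair could be mapped to a coarse mood label. It is an illustration only, not Melodrive’s code: the function name `classify_mood`, the [-1, 1] value range and the quadrant boundaries at zero are all assumptions. Only the placement of ‘happy’ and ‘stressed’ comes from the description above; putting ‘tender’ and ‘sad’ in the two low-arousal quadrants is a common convention that we assume here.

```python
def classify_mood(valence: float, arousal: float) -> str:
    """Return a coarse mood label for a point in valence/arousal space.

    Assumption: both values lie in [-1.0, 1.0], with 0.0 as the neutral
    midpoint of each axis. The labels are illustrative quadrant names.
    """
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must both be in [-1.0, 1.0]")

    if valence >= 0:
        # Positive emotions: energetic ones are 'happy', quiet ones 'tender'.
        return "happy" if arousal >= 0 else "tender"
    # Negative emotions: energetic ones are 'stressed', quiet ones 'sad'.
    return "stressed" if arousal >= 0 else "sad"


if __name__ == "__main__":
    print(classify_mood(0.8, 0.7))    # high valence, high arousal
    print(classify_mood(-0.6, 0.9))   # low valence, high arousal
```

A real system would more likely keep the two numbers continuous rather than collapse them into four labels, so that the music and visuals can shift gradually instead of jumping between moods.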

The AI at play

To generate the music in Melodrive Indie, we use a hybrid AI approach. It’s basically a loop where game information is used as the input. This game information is passed through an emotional API, meaning it is expressed as values based on the Valence/Arousal model. As mentioned, that information reflects a mood, such as ‘happy’, and is fed to the engine. The engine then produces symbolic information, similar to MIDI or XML. This is then given to the performance and production systems within the engine, until the music is ultimately rendered in the final form you hear on our stream. This same approach is applied in order to generate the adaptive visuals. It’s as simple as taking the comments and assigning a value in terms of valence and arousal, then outputting those to generate new music based on a mood. To translate the comments into an emotion the AI can identify, we use a technique called sentiment analysis. This technique falls into the category of NLP (Natural Language Processing), and is a process in which text is analysed to extract the opinions it expresses. It’s become popular as it has many practical applications, appearing in social media monitoring, brand monitoring and customer services. This technique is also applied to the visual generation, where the words entered influence what’s displayed on the screen.
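The comment-to-emotion step can be sketched in a few lines of Python. This is a deliberately simple, lexicon-based stand-in for sentiment analysis, not the method used on the stream: the word list `LEXICON`, its scores and the function name `comment_to_emotion` are all made up for illustration, and a production system would likely use a trained model instead.

```python
import re
from typing import Tuple

# Hypothetical mini-lexicon: word -> (valence, arousal), each in [-1, 1].
# The words and scores are invented for this example.
LEXICON = {
    "love": (0.9, 0.6),
    "happy": (0.8, 0.6),
    "calm": (0.5, -0.6),
    "sad": (-0.7, -0.5),
    "angry": (-0.8, 0.8),
    "naughty": (-0.3, 0.5),
}


def comment_to_emotion(comment: str) -> Tuple[float, float]:
    """Average the (valence, arousal) scores of the known words in a comment.

    Comments containing no known words map to the neutral point (0.0, 0.0).
    """
    words = re.findall(r"[a-z']+", comment.lower())
    scores = [LEXICON[word] for word in words if word in LEXICON]
    if not scores:
        return (0.0, 0.0)
    valence = sum(v for v, _ in scores) / len(scores)
    arousal = sum(a for _, a in scores) / len(scores)
    return (valence, arousal)


if __name__ == "__main__":
    print(comment_to_emotion("my cat is a naughty rascal"))
```

With this invented lexicon, only "naughty" is recognised in the cat comment, so it lands at slightly negative valence and moderately high arousal. The resulting pair is what would then be handed on to the music engine and the visualiser.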

Beginning, riddle and end

While the hackathon was fun for those involved, we wanted that same feeling to extend to the people ‘playing the music’. This is how we ended up with the article you’re hopefully reading right now. My task was to carefully retell what happened that day and produce an engaging story for Twitch and all of our platforms. The idea was to tie everything together with a central thread, which resulted in an ancient, generous wizarding composer who shall not be named. Part of this process was to craft a fun riddle to draw the audience in and reward them for taking the time to enjoy the experience we’d set up. If they cracked the riddle (we were looking for a single word), they’d receive a fun reward which meaningfully altered their experience of the stream. Here’s what I strung together by the end of the day (try to solve it if you can; I’ve been sworn to secrecy).

“Our unnamed wizard prides himself on mysterious and difficult riddles. Solve what’s below and type the word into the comments, and you’ll receive a gift more glorious than a wizard’s beard.”

Most call me a hero, even say it by name,

I need to save her, then save them, by the end of this game.

I’ve once been to space, but I felt right at home,

Sometimes I can warp to a place that’s unknown.

Who am I?

The reason for the riddle was to engage listeners, speak to their sense of fantasy and reward them for their enthusiasm and interaction. This is partly why we went with Twitch (its audience mostly comprises video-gamers, many of whom are fans of fantasy worlds). I chose to weave together a mini-story that would hopefully excite this audience, above and beyond the fact that the music could be changed by the power of their words.

“Are you fond of riddles?” Varys and Tyrion, from Game of Thrones.

If you’ve ever wanted to rapidly test and improve your skills by building something new, do a hackathon. You’ll have no choice but to think and act quickly, and what you collectively create will forever be bound to a memorable experience. It’s how we’ll happily spend a Saturday, because we’re constantly striving to further the field in the AI music space. People always want a say in what music they hear. With ‘Twitch plays Melodrive’, they’re the audience and the conductor. If you’d like to give it a whirl, visit the stream and type anything for our unnamed composer; even if it’s excessive cursing in Danish.
