“What do you (really) want from AI music generation?”

JP
2 min read · Feb 5, 2023


In a hundred years, children may listen to music generated entirely by AI, with no human involvement. Growing up with it, they may form strong connections to this music, and it may hold the same meaning for them as the music we listen to today holds for us. This concern may seem outdated to future generations, or even to my future self, but it worries me. One of the primary purposes of music is to connect with other humans. Artists share their emotions and experiences through music, creating a bridge between themselves and listeners. Art and music have also always been about pushing boundaries, challenging norms, and expressing creativity and individuality.

However, AI generators trained on normative metrics generally do not align with these values. The capitalist models that drive much of the advancement in AI music generation often prioritize financial metrics and market appeal, leading to a homogenization of music. This disregard for artistic expression and creativity threatens to turn AI into a standalone “music generator” that dictates our musical tastes, rather than a tool for enhancing human creativity. To prevent this dystopian outcome, I think it is crucial that we maintain a balance between using AI technology and preserving what makes music meaningful as an art form. That means putting artists at the center of AI music generation and ensuring they have full control over the music creation process, so their artistic vision and desires are not lost amid technological advancement.

The recent MusicLM model is an interesting case. MusicLM can generate music from a text description and a melody provided by the user (not necessarily as MIDI, but sung, whistled, or played). It quickly produces examples that authors can draw inspiration from. However, I still believe it lacks control over the generation process. Prompt-based models like MusicLM, DALL-E, or ChatGPT can give the illusion of control through iteration, but they remain high-level black boxes that don’t provide low-level, actionable controls. Currently, it is almost impossible to modify a specific part of an output, such as replacing one instrument with another in a generated symphony or changing a few notes of a generated melody. With such models, authors express their ideas by repeatedly searching for the right prompt. I think this is a terrible way to create. Besides being extremely frustrating, learning to prompt-engineer each model is neither a viable technical means nor an expressive one for an author. Conditioning on other melodies, as proposed by MusicLM, and contextual conversational agents like ChatGPT can help increase control, but they remain rather high-level. There is still much work to be done. I don’t know what kind of output I want from AI music generation, whether an audio file, a score, or separate tracks. I do know, however, that I want human creators to have full control over AI music generation, so that the outputs align with their intent, artistic vision, and soul.

— Jean-Peïc Chou
