MusicLM and the Future of AI Music

Western AI
WAI’s Wavelength Newsletter
3 min read · Feb 6, 2023

by Alex Hu

Forget about waiting years for a new music album release! Google has unveiled new AI technology that can generate music from text prompts. Much as ChatGPT does for text, MusicLM can produce a range of instrumental sounds and human voices across genres, with compositions growing more complex as the prompt becomes more detailed. The program can also be guided by text descriptions of visual art, and this feature has been used to musically interpret the atmosphere of masterpieces such as Edvard Munch’s “The Scream” or Salvador Dali’s “The Persistence of Memory.” A full list of MusicLM’s samples can be found here.

However, Google has not yet made MusicLM available to the public, and only samples can be explored on its website. This is because AI-generated art can run into copyright issues as well as ethical concerns. For example, MusicLM tends to incorporate copyrighted material into the music it generates, with around 1% of produced material being directly copied from the 280,000 hours of music used to train the program. Many legal experts have argued that the very premise of programs like MusicLM violates the reproduction right under the United States Copyright Act, which raises questions about ownership and rights of publicity. Going forward, Google needs to reduce the frequency of direct imitation and work through the legal implications of AI music before the program can be released publicly.

MusicLM casts music generation as a hierarchical sequence-to-sequence modeling task. In practice, this means the program does not rely on a single model: it hands certain tasks to other models, creating a chain of complex, interdependent stages nested like a Matryoshka doll.

This process essentially digests the text prompt, interprets it, and generates music from it (similar to how text-to-image programs such as DALL-E 2 function). MusicLM generates three types of tokens using pre-trained models: MuLan, w2v-BERT, and SoundStream. Its training data also draws on a variety of samples, including both unlabelled music and thousands of clips labeled by professional musicians. As a result, MusicLM has surpassed previous programs such as Mubert or Riffusion in both audio quality and how accurately the output matches the prompt.
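
To make the "chain of models" idea concrete, here is a toy sketch of how the three token types could feed into one another. It is not Google's code and does not call MuLan, w2v-BERT, or SoundStream: every function name, token count, and vocabulary size below is a hypothetical placeholder, and the "models" are simple hash functions standing in for neural networks. The only thing it aims to illustrate is the ordering, where each stage is conditioned on the output of the stages before it.

```python
import hashlib
from typing import Dict, List


def _pseudo_tokens(seed: str, count: int, vocab_size: int) -> List[int]:
    """Deterministically derive `count` integers in [0, vocab_size) from a seed.

    Placeholder for a trained model: it only mimics the shape of the output.
    """
    tokens = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode("utf-8")).digest()
        tokens.append(int.from_bytes(digest[:4], "big") % vocab_size)
    return tokens


def text_conditioning_tokens(prompt: str, count: int = 12) -> List[int]:
    """Stand-in for the MuLan stage: turn the text prompt into conditioning tokens."""
    return _pseudo_tokens("conditioning|" + prompt, count, vocab_size=1024)


def semantic_tokens(conditioning: List[int], count: int = 25) -> List[int]:
    """Stand-in for the w2v-BERT stage: coarse tokens conditioned on the prompt tokens."""
    return _pseudo_tokens("semantic|" + repr(conditioning), count, vocab_size=1024)


def acoustic_tokens(conditioning: List[int], semantic: List[int], count: int = 50) -> List[int]:
    """Stand-in for the SoundStream stage: fine tokens conditioned on both earlier stages."""
    seed = "acoustic|" + repr(conditioning) + repr(semantic)
    return _pseudo_tokens(seed, count, vocab_size=1024)


def generate(prompt: str) -> Dict[str, List[int]]:
    """Run the three hypothetical stages in order, each feeding the next."""
    conditioning = text_conditioning_tokens(prompt)
    semantic = semantic_tokens(conditioning)
    acoustic = acoustic_tokens(conditioning, semantic)
    return {"conditioning": conditioning, "semantic": semantic, "acoustic": acoustic}


if __name__ == "__main__":
    result = generate("a calming violin melody backed by a distorted guitar riff")
    for stage, tokens in result.items():
        print(stage, len(tokens), tokens[:5])
```

In a real system, the final acoustic tokens would be passed to an audio decoder to produce a waveform; that step is left out here because it cannot be imitated meaningfully without the actual models.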

So far, MusicLM’s capabilities have taken the music and AI communities by storm. Although AI-generated music is not new, MusicLM is able to create convincing interpretations that sound authentic. However, the program is not perfect and seems to have trouble generating vocals. The samples that feature voices sound nonsensical, raw, and at times even chilling. MusicLM is therefore far from being a replacement for musicians. Nonetheless, it is a huge step forward in music AI development. Further research and regulation are necessary before this technology can become widespread, but it holds great potential to innovate and transform the way we understand, consume, and create music in the future.
