MIDI Mashups: Using Machine Learning to Generate Unique Musical Scores

Matthew Ellis
Published in The Startup
Sep 14, 2020 · 9 min read


Image of the author’s work: Music visualization in virtual reality!

Motivation

I wanted to sample tracks of my choosing and layer them using a machine learning framework. In other words, I sought to use machine learning to ‘mashup’ songs. Moreover, I wanted to output sheet music that musicians could fool around with.

If you’re looking for instant gratification: proceed directly to Go, don’t collect $200, and check out the corresponding Python Notebook, where you can play around and bring your very own AI Composer to life.

However, if you’re intrigued by the notion of an AI composer and curious about the project’s inspiration and intent, indulge yourself and read on!

By the end of this article, you will be able to channel your inner (AI) composer.

The AI Composer?

Some humans are living, breathing groove machines: musicians. I am by no means of this sort; however, I am a devout supporter of live music. There is profound pleasure in observing groove machines in their element, becoming entranced by their beats and humbled by their harmonies. Music is a transcendent art. Not only does it move us, it moves with us.

Music has evolved alongside humanity; it is inextricably linked to our world view and to the technology we use to generate it. Advances in technology have dramatically restructured our musical landscape. Electronic gadgets have afforded musicians the ability to shape their sound with mind-boggling variety.

Musicians are sculptors of sound. Some sculpt spontaneously, like a jazz guitarist plucking an improvised solo with intoxicating brio. Others work more methodically, like a symphony orchestra whose performers unite in sublime synchronicity. Whether expressed as formalized sheet music or intricately ingrained in mental structure, all music is, in a sense, composed.

The electronic analogue of the performer has been around since the advent of the electromagnetic speaker. The electronic analogue of the composer, however, is less well defined. Such a notion has only recently been given credence, now that AI techniques have achieved nominal success: we are thus entering the age of the AI composer.

In the traditional sense, a composer stitches musical ideas together into song. Drawing upon their prior auditory and emotional experience, talented composers seize our senses with their moving passages. Can a machine be trained to compose songs, too? And if so, why would we want it to write music, anyhow?

Machines can enrich our auditory experience. AI in particular can be leveraged to detect musical nuances that have hitherto existed beyond the bounds of human creativity. What is more, AI can be employed such that these machine-detected musical idioms are uniquely combined. This is the essence of the AI Composer: Machine learning tools are, at present, capable of detecting musical idioms and stitching them together in neat ways.

Accordingly, I envision a symbiotic relationship between AI Composers and their human counterparts. AI Composers present ideas to their human partners. People can then take certain elements and work them into their own, uniquely human, compositions. These AI-assisted compositions are then used to further train AI Composers, creating a cycle that renews itself symbiotically.

The Experiment: Making an AI Composer

Forays into the intersection of AI and music are numerous and have, in a sense, reached mainstream status; Google’s Magenta project is one example.

Magenta is a fun little library that uses TensorFlow as its machine learning workhorse. Plenty of examples of using Magenta to toy around with music (and art) exist. However, I sought something more: a way to sample tracks of my own choosing, layer them with machine learning, and emit sheet music that musicians could fool around with.

Music and Machine Learning with Magenta

Magenta can be used to manipulate music in a multitude of ways. The inclined reader may find stimulation in the additional Python Notebooks supplied by the Magenta team. (All resources mentioned here are collated into a list at the conclusion of this article.)

Of the Colab notebooks made available by the Magenta team, the MusicVAE notebook fit the bill as something I could modify to serve as an AI composer in the capacity desired. The MusicVAE project is described in vivid technical detail in the corresponding research paper [1].

Upon opening the MusicVAE notebook, one is greeted with the following summary: “MusicVAE learns a latent space of musical scores. This Colab notebook provides functionality for you to randomly sample from the prior distribution and interpolate between existing sequences for several pre-trained MusicVAE models.”

Via Magenta’s MusicVAE notebook, it is straightforward to feed MIDI files into select pre-trained models. (So as not to lose you in a sea of acronyms: MIDI, the Musical Instrument Digital Interface, is a format that facilitates communication between electronic instruments and computers.) A boon for the music production industry, and likewise an asset to our AI Composer, MIDI files can house multi-instrument scores. Furthermore, owing to today’s open-source community, sizable collections of MIDI files are freely downloadable; for example, check out this resource [2].
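To give a taste of how this looks in code, here is a minimal sketch of reading a MIDI file into Magenta’s NoteSequence format via the note_seq library (the file name is a placeholder; any MIDI file will do):

```python
import note_seq

# Parse a MIDI file (placeholder path) into a NoteSequence, Magenta's
# protocol-buffer representation of a musical score.
sequence = note_seq.midi_file_to_note_sequence('my_song.mid')

# A NoteSequence records each note's pitch, timing, velocity, and instrument.
print(f'{len(sequence.notes)} notes across {sequence.total_time:.1f} seconds')
```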

With musical scores (via MIDI files) and an AI tool (Magenta) with which to sample existing works, I was able to create unique, ‘mashed up’ compositions. The resulting compositions can be downloaded (in MIDI format) for use in a music editor of your choosing (e.g. GarageBand on a Mac). Despite the joys of mixing tracks live on a PC, there’s something about a printed sheet music score that I’m a fan of. See the music, play the music, mix the music; I find joy in this particular order.

To shore up this last detail, the conversion of MIDI files to sheet music, I found a drag-and-drop web resource that was up to the task. Mission accomplished: I had used machine learning to create a unique, mashed-up musical score.

Future Directions: Towards an artist friendly, AI-assisted Song Composition Tool?

With community support and collaborative insight, I’m hoping that the project described here might spur something larger. I envision a tool that could be of utility to a large cross-section of musicians and songwriters seeking a creative spark.

Does the notion of composing and mixing music within extended reality environments (e.g. AR/VR environments) generate excitement?

Can streamlined models be employed to run an AI composer directly within someone’s browser or within an app?

Can a web UI be constructed such that AI composer tools can run seamlessly on a back-end server?

Would it be beneficial if we were to provide an environment in which users can use AI to compose a track and then, without leaving the environment, mix the track to their liking?

For starters, however, I would be delighted if this project were to pique the interest of a high school student, an excited young human who cannot wait to fool around with the tool at hand. The enthusiastic student might then use it to create a score of their own, print out a bunch of copies, bring them to their Jazz Band class, and subsequently surprise their unsuspecting classmates with an AI-assisted composition!

Dive Into the Details: How to execute the experiment yourself!

This project is based upon the Magenta team’s MusicVAE project and the corresponding publication [1]. My work expanding upon theirs lives in this Github repo.

0a. Start with a good attitude. A knack for transforming frustration into determination is a prized asset in this industry.

0b. You can run my notebook ‘as-is’ in Google Colab. However, you might find it fun to have some songs that you actually want to hear mashed up. If you don’t have MIDI files of these particular tracks, this resource is a good place to begin your search.

Subsequent steps are written with reference to the IPython notebook displayed below.

1. Open my Github-hosted IPython notebook. (I’m partial to using Google Colab; simply click the blue button atop the rendering provided below!)

2. Locate the cell below “Setting up the Environment” and run it.
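For reference, the heavy lifting in that cell boils down to something like the following (a simplified sketch; the notebook’s actual cell includes a few extra Colab-specific niceties):

```python
# Install Magenta, then import the pieces this experiment relies on.
!pip install -q magenta

import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel
```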

3. Load the 16-bar trio models by running the cell beneath the “Loading the models” heading.
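Under the hood, loading a pre-trained MusicVAE model looks roughly like this (a sketch; the checkpoint path is illustrative, as the notebook fetches the Magenta team’s published checkpoints for you):

```python
# Grab the configuration for the hierarchical 16-bar trio model.
config = configs.CONFIG_MAP['hierdec-trio_16bar']

# Wrap a published checkpoint in a TrainedModel, ready for encoding,
# decoding, and sampling. The path below is a placeholder.
model = TrainedModel(
    config,
    batch_size=4,
    checkpoint_dir_or_path='checkpoints/trio_16bar_hierdec.ckpt')
```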

N.B. 1: My decision to use the 16-bar trio models is somewhat arbitrary, based upon my intended output: a musical score. The Magenta team’s notebook includes examples based upon other models. These are fun, too! However, for the sake of brevity, I shall leave these as areas to be charted by the enthused reader.

N.B. 2: It’s okay to be confused by some of these abbreviations and phrases! One can adroitly create an AI composer without being privy to their meanings.

However, in the event that there’s some legitimate intrigue surrounding the acronyms VAE and LSTM, I feel obligated to provide insight (or at least attempt to). Also, treat yourself to this introductory machine learning course provided by MIT. Their laboratory example of using machine learning to toy with music was a key inspiration for this project.

MusicVAE is short for Music Variational Auto-Encoder (VAE). Variational auto-encoders operate under the principle that something, say a musical composition, can be modeled as a distribution of latent variables; this is the act of encoding. The distribution of latent variables (latent in that they have no meaning outside of our model) that encodes a musical score can then be decoded back into a musical score; and if your model is sufficiently accurate, the decoded score might be indistinguishable from the original. This notion lies at the heart of ‘deep fakes’…if you’re into that sort of thing.
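For the mathematically inclined, this trade-off between faithful reconstruction and a well-behaved latent space is exactly what the standard VAE training objective, the evidence lower bound (ELBO), expresses:

L(θ, φ; x) = E_q(z|x)[ log p(x|z) ] - KL( q(z|x) || p(z) )

Here q(z|x) is the encoder, p(x|z) is the decoder, and p(z) is the prior distribution; the notebook’s ability to “randomly sample from the prior distribution” amounts to drawing z from p(z) and decoding it.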

The model employed in this experiment extends the VAE by giving it ‘memory’; this is the LSTM aspect. LSTM stands for ‘Long Short-Term Memory’. When encoding a static object like an image, there is no need to account for time: take all the pixels, map them to a set of distributions, and, if you so desire, decode this mapping to yield an image that is minimally different from the input. With a dynamic object, such as a musical score, a single mapping will not suffice. Instead, we map overlapping ‘chunks’ of our score at multiple points in time. The takeaway idea of an LSTM is that these overlapping chunks are treated not independently, but as a sequence. That makes it a great tool for working with data that contains periodic structure…such as a musical score!
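In code, the encode/decode round trip described above is pleasantly compact. A minimal sketch using the model loaded earlier (the variable names are mine, and 256 reflects 16 bars at 16 steps per bar):

```python
# Encode a 16-bar trio into latent space: z is a sampled latent vector,
# while mu and sigma parameterize the learned latent distribution.
z, mu, sigma = model.encode([trio_sequence])

# Decode the latent vector back into a NoteSequence. If the model has
# learned well, the result should be hard to tell apart from the original.
reconstruction = model.decode(z, length=256)[0]
```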

4a. (Option 1) Load the MIDI files that I have placed in the Github repo. This is done by running the cell beneath “Using stock MIDI files.”

4b. (Option 2) Load files of your choosing. (This option assumes that you have MIDI files at your disposal. It is best done by running two cells, one for each MIDI track you wish to load.)
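If you’re curious what such a cell contains, loading your own files in Colab boils down to a sketch like the following (the notebook’s own cells may differ in their details):

```python
from google.colab import files

# Opens a file picker in the browser; returns a dict of {filename: raw bytes}.
uploaded = files.upload()

# Convert each uploaded MIDI file's bytes into a NoteSequence.
input_seqs = [note_seq.midi_to_note_sequence(data) for data in uploaded.values()]
```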

5. Extract trios by running the cell beneath the “Extract some Trios” heading. This step runs through the MIDI files and extracts the trios contained within. A trio can be viewed as a score that contains three parts: drums, bass, and lead (e.g. guitar, keys, and/or vocals). The trios extracted are all 16 bars in length. For a given track, multiple trios may be extracted.

N.B. For certain tracks, the ‘trio extractor’ can fail to extract any usable trios. In this event, you will have to choose a different track. I may ask the Magenta team to suggest a workaround here.
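For the curious, the extraction cell leans on the model configuration’s data converter. A rough sketch of the pattern (adapted from the Magenta notebook; the exact details may vary), including a check for the failure case noted above:

```python
def extract_trios(note_sequences, config):
    """Pull out every 16-bar trio the model's data converter can find."""
    trios = []
    for ns in note_sequences:
        tensors = config.data_converter.to_tensors(ns)
        trios.extend(config.data_converter.from_tensors(tensors.outputs))
    return trios

trios = extract_trios(input_seqs, config)
if not trios:
    print('No usable trios were found; try a different track.')
```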

6. Mash up the Trios! Accomplish this by running the cell below the heading “Trio Mashup Time!”. You can have some fun fooling around with the ‘temperature’ setting, which can change the feel of the mashup quite a bit!
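The mashup itself is an interpolation through latent space between two extracted trios. A minimal sketch (num_steps and temperature are knobs to twiddle; higher temperatures yield wilder, less predictable output):

```python
# Walk through latent space from one trio to another. Temperature
# controls the randomness injected when decoding each step.
steps = model.interpolate(
    trios[0], trios[1], num_steps=3, length=256, temperature=0.5)

# The midpoint blends characteristics of both input trios: the mashup.
mashup = steps[1]
```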

7. Download your mashup by running the cell below the “Download the MIDI file for some mixing fun!” heading.
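Behind that cell is a one-liner to serialize the NoteSequence back into MIDI, plus Colab’s download helper (the file name is a placeholder):

```python
# Write the mashed-up NoteSequence out as a standard MIDI file.
note_seq.sequence_proto_to_midi_file(mashup, 'mashup.mid')

# Trigger a browser download of the file from the Colab runtime.
files.download('mashup.mid')
```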

8. (Optional) Open your MIDI Mashup in a music editor of your choice.

Image is author’s own work.

9. (Optional) Convert your MIDI mashup to sheet music form, and get ready for some sight-reading fun! (A code-based route is sketched below.)

A sample score composed via machine learning! Image is author’s own work.
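If you prefer to stay in Python rather than use a drag-and-drop web app, the music21 library (not part of my notebook; purely an alternative suggestion) can convert MIDI into MusicXML, which notation software such as MuseScore can engrave as printable sheet music:

```python
from music21 import converter

# Parse the MIDI mashup into a music21 score object.
score = converter.parse('mashup.mid')

# Export MusicXML, which notation editors can render as sheet music.
score.write('musicxml', fp='mashup.musicxml')
```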

Resources

Magenta Homepage

My Project’s Github Repository

Blog post describing the MusicVAE notebook

The Magenta team’s MusicVAE Notebook

Solid source of downloadable MIDI files (thank you, Colin Raffel)

MIT 6.S191 Course

Drag-and-drop MIDI to sheet music web app

Acknowledgements

Ava Soleimany and Alexander Amini, the instructors of the MIT course, whose lab inspired me to dig deeper into the intersection between AI and music.

Christine Liu, and her really cool, AI-centric Instagram account, @ai.noodle. Follow her there for AI generated recipes that are more often than not, delicious. Seeing her ai.noodle blog inspired me to share my ML explorations, too!

Colin Raffel, who kindly made an awesome collection of MIDI files available.

Citations

[1] Dinculescu, M., Engel, J., & Roberts, A. (2019). MidiMe: Personalizing a MusicVAE model with user data.

[2] Raffel, C. (2016). Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. PhD thesis.
