Notochord: Is musical improvisation the final frontier of AI pretending to have a soul?

Gabriel Thompson
Published in deMISTify
Apr 29, 2024

Introduction to Notochord

Over the last few years, generative AI has increasingly proven able to generate media on par with that of humans. We’ve seen technologies such as DALL-E 2 and Midjourney being used in place of human graphic designers, and ChatGPT being used to write everything from scientific papers to blog posts.

In spite of all this, there's been one type of media that generative AI has seemed unable to conquer: music. As it turns out, it's pretty difficult to teach a computer to write music like a human! Most efforts to generate music with AI, such as OpenAI's Jukebox, have involved generating raw audio with transformer models. However, some researchers have recently been taking a different approach: generating MIDI files (files which store information about musical notes, rather than the audio itself) with generative AI.

Today, we'll be taking a look at Notochord, a probabilistic generative AI model for generating MIDI accompaniment for live music performances. This model allows a user to play a live performance on an instrument and generate a backing track which updates in real time according to what they are playing. The user can also specify, in real time, which sounds the AI is allowed to generate. For example, the user could tell the model to only generate music using a saxophone, only generate notes with a low velocity (i.e. played softly), or only generate notes at a certain pitch. You can check out a demo of Notochord in action here!

How Notochord works

Fig. 1: Architecture of Notochord Training

The core technology behind note generation in Notochord is a recurrent neural network (RNN) whose hidden state summarizes all of the previous notes played in the MIDI track. Notochord keeps track of four aspects of each note:

  • Its pitch (an integer value from 0 to 127, according to the range of acceptable MIDI pitch values)
  • Its instrument (an integer from 0 to 127, according to the range of acceptable MIDI instrument values)
  • Its length (a continuous value measured in milliseconds)
  • Its velocity (a continuous value from 0 to 127)

(Note: "note off" MIDI signals, which occur when a note finishes playing, are denoted by a velocity of 0.)
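To make this concrete, here's a minimal sketch of how one of these four-part events could be represented in Python. The NoteEvent class and its field names are my own illustration, not the paper's actual API:

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """One Notochord-style event, made up of four sub-events."""
    instrument: int   # 0-127, MIDI instrument (program) number
    pitch: int        # 0-127, MIDI note number
    length: float     # continuous, in milliseconds
    velocity: float   # continuous; a velocity of 0 marks a "note off"

# Middle C played on instrument 0 (acoustic grand piano) for half a second:
note_on = NoteEvent(instrument=0, pitch=60, length=500.0, velocity=85.0)

# The matching "note off" is simply an event with velocity 0:
note_off = NoteEvent(instrument=0, pitch=60, length=0.0, velocity=0.0)
```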

For each of these four aspects (referred to as "sub-events"), the neural network can be queried for the probability distribution over that sub-event's possible values. For example, you could query the RNN for the probability of the pitch taking on each of the 128 MIDI pitch values and get the following distribution, from which you could conclude that the next note to play is a D4:

Fig. 2: Example of a probability distribution of pitches

Each of the sub-events (pitch, length, instrument, and velocity) depends on the others, so their probability distributions shouldn't be computed independently of one another. Otherwise, the model might decide on a low note and then, for unrelated reasons, decide that a flute is the appropriate instrument. Instead, the model decides on an order of sub-events (e.g. instrument, then pitch, then velocity, then length) and settles on the value of each sub-event in that order. In this case, it would choose the most likely instrument, then the most likely pitch given that instrument, then the most likely velocity given those two attributes, and finally the most likely note length given those three attributes. The order of sub-events is randomized at training time.
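Here's a toy sketch of that chained sampling in Python. The distributions below are hard-coded stand-ins purely to show the control flow; in the real model they would come from the RNN, conditioned on its state and on the sub-events already chosen:

```python
import numpy as np

def toy_distribution(sub_event, chosen):
    """Stand-in for the model's per-sub-event distributions."""
    if sub_event == "instrument":
        return np.arange(128), np.full(128, 1 / 128)
    if sub_event == "pitch":
        probs = np.ones(128)
        # If the chosen instrument is in the pipe family (flutes etc.),
        # make higher pitches more likely, mimicking the dependence
        # between sub-events.
        if 72 <= chosen.get("instrument", 0) <= 79:
            probs[60:] *= 4.0
        return np.arange(128), probs / probs.sum()
    if sub_event == "velocity":
        return np.linspace(1, 127, 127), np.full(127, 1 / 127)
    return np.linspace(50, 2000, 40), np.full(40, 1 / 40)  # length in ms

def sample_event(order=("instrument", "pitch", "velocity", "length")):
    """Pick sub-events one at a time, each conditioned on earlier choices."""
    chosen = {}
    for sub_event in order:
        values, probs = toy_distribution(sub_event, chosen)
        chosen[sub_event] = np.random.choice(values, p=probs)
    return chosen

print(sample_event())
```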

The model is trained on the Lakh MIDI dataset. This dataset contains over 100,000 songs as MIDI files, each storing the pitch, velocity, instrument, and length of every note.
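To get a feel for what that training data contains, here's one way you might pull those four attributes out of a single MIDI file using the pretty_midi library (my own example, not the paper's preprocessing code; the file path is a placeholder):

```python
import pretty_midi

# Load one file from the Lakh MIDI dataset (placeholder path).
midi = pretty_midi.PrettyMIDI("lakh_midi/example_song.mid")

events = []
for instrument in midi.instruments:
    for note in instrument.notes:
        events.append({
            "instrument": instrument.program,           # 0-127 program number
            "pitch": note.pitch,                        # 0-127 note number
            "velocity": note.velocity,                  # 0-127
            "length": (note.end - note.start) * 1000,   # duration in milliseconds
        })

print(f"Extracted {len(events)} notes")
```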

Implementation

The paper mainly proposes the structure of the neural network, leaving open different options for putting Notochord into practice. For the sake of demonstration, the authors of the paper implemented two things: (1) a Python API for the neural network and (2) a script that relays MIDI events between a synthesizer and the Python API. This communication is done via the Open Sound Control (OSC) protocol.
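As a rough illustration of the OSC side, here's a minimal sketch using the python-osc library. The addresses, ports, and message layout are invented for illustration and are not Notochord's actual OSC interface:

```python
from pythonosc.udp_client import SimpleUDPClient
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

# Send a note the user just played to a hypothetical model server.
client = SimpleUDPClient("127.0.0.1", 9999)
client.send_message("/notochord/feed", [0, 60, 85, 500.0])  # instrument, pitch, velocity, length

# Listen for the model's predicted notes on another port.
def on_prediction(address, *args):
    print("model predicted:", args)  # e.g. (instrument, pitch, velocity, length)

dispatcher = Dispatcher()
dispatcher.map("/notochord/predict", on_prediction)
BlockingOSCUDPServer(("127.0.0.1", 9998), dispatcher).serve_forever()
```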

When you're actually playing the instrument, the program constantly queries the neural network for the next note to play, waits the appropriate amount of time, and then plays that note, unless the user triggers a MIDI event in the intervening time. If the user does, the state of the RNN is updated to account for the new note played by the user, and the program again predicts the next note to play. This process repeats for as long as the program is running, and it reliably generates predictions and triggers MIDI events within 10 milliseconds.
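A toy sketch of that control flow might look like the following. The predict_next function is a stand-in for querying Notochord, and user input is simulated with a queue; the real implementation talks to the model and the synthesizer over OSC:

```python
import queue
import random
import time

user_events = queue.Queue()   # would be filled by the MIDI input handler

def predict_next(history):
    """Stand-in for querying the RNN: returns (delay in seconds, note)."""
    pitch = random.choice([60, 62, 64, 67, 69])
    return random.uniform(0.1, 0.5), {"pitch": pitch, "velocity": 80}

history = []
end_time = time.time() + 5        # run the toy loop for five seconds
while time.time() < end_time:
    delay, predicted = predict_next(history)
    try:
        # Wait up to `delay` seconds; if the user plays a note first, it wins.
        user_note = user_events.get(timeout=delay)
        history.append(user_note)   # update the model state with the user's note
    except queue.Empty:
        print("playing predicted note:", predicted)
        history.append(predicted)   # otherwise play and record the prediction
```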

Results of Notochord

Fig. 3: Notochord in action

Most of the music generated by Notochord is pretty incoherent. As it turns out, it's really difficult for an AI to improvise along with a main melody without any explicit information about the music's chord progression or tempo. That being said, most of the output generated by Notochord still feels somewhat consonant with the rest of the melody, and it generally gets better the longer you've been playing (as the RNN has more information to go on). You could imagine that Notochord might work best for soloing over long, drone-y jams with lots of repeated notes. But if that's the case, then Notochord's live, near-instant reaction to user changes becomes less useful.

Another interesting application of Notochord is using it to generate music without providing it with any note information (i.e. not entering any notes into the keyboard, and just letting the RNN decide what to play next based on training data). It generates a generic pop-sounding diatonic melody over a vi-V-IV chord progression — it’s as if the RNN found the arithmetic mean of every pop song written in the last 20 years.

Conclusion

While Notochord is very technically impressive, it seems unlikely it'll be used for improvisation in live concerts any time soon. Although it's intriguing to see how technology like Notochord could theoretically complement a human performer, improvisation strikes me as a specifically human activity, and so long as AI-generated music falls into an uncanny valley, I doubt it will take musicians' jobs any time soon.
