How might ‘Digital Audio Workstations’ integrate AI/ML

Pt. 1: Generating melodies that follow chord progressions

Steve Hiehn
Jan 14, 2018

As modern users of technology, most people are already familiar with basic AI-powered tools such as autocomplete and language-translation apps, even if they don’t know exactly how they work. Machine learning (ML) is finding its way into all kinds of software, from autonomous vehicles to automated investment advice, and into many applications the casual consumer may not even notice. Digital audio workstations (DAWs) are a category of software I use all the time but have yet to see any serious mainstream AI integration in. I have used DAWs such as Logic Audio, Pro Tools, and Reaper extensively for years, and I often find myself wondering what an AI-powered DAW might look like and how I would use it.

In this post I’m going to demonstrate, with code, one ML feature in particular that I think would be valuable to composers: an AI-assisted melody-writing tool. When I am writing a song, I usually have a sequence of chords I’ve selected to work with after exploring options on my guitar or keyboard. If the DAW’s user interface let me enter that chord sequence, or (even better) deduced it on its own, the app would have enough information to start being helpful. Given just the chord sequence, it should be able to offer melodic options that fit at any point in the song, either through a drop-down menu or an auto-suggest UI. Furthermore, over time the app could plausibly adapt to an individual’s taste and shortlist the options it presents.

Imagining features is all well and good, but the devil is in the details. What tech could be used, and what would the code that actually does this look like? The short answer is to find a way to leverage recurrent neural networks (RNNs). If you have any doubt that RNNs are capable of generating pleasing musical phrases, please have a listen to Daniel Johnson’s demonstration: Composing Music With Recurrent Neural Networks. Hopefully we can agree that these generated examples are quite impressive. Still, a reasonable objection might be: “OK, so it generates interesting musical sequences, but they are random, unstructured phrases. How would that help a musician composing an original piece?” I think the answer is that, in their current form, they would not. However, I feel confident that RNNs can be very useful for making predictions simply by restructuring the data to fit our use case.

To understand how this problem might be solved, it helps to understand how many musicians compose harmonic music in the first place. When composing a melody, the two most useful pieces of information are the chord progression and the key(s) the piece of music conforms to. You can think of a key as a subset of all the possible notes used in a section of music, analogous to the color palette used in a painting. The chords dictate the compatible notes while they are sounding in a song. For example, if the first four beats of a song carry the chord C major, the safest possible notes to play during that time are the notes that make up the chord itself: C, E, and G. The composer’s choices are certainly not limited to those notes, but those notes are guaranteed to sound harmonically ‘correct’. The next-safest notes are the ones that conform to the overarching key, though there is no guarantee they will harmonize. Notes from outside both the dictated chord and the key run a very high risk of clashing horribly. So, to make a prediction, we need to train on a dataset that contains key, pitch, rhythm, and harmonic structure (chords).
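To make that hierarchy concrete, here is a minimal Java sketch (my own illustration, not code from the project) that ranks a candidate note against the current chord and key:

import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical lookup tables for a couple of chords and one key.
public class SafeNotes {

    static final Map<String, List<String>> CHORD_TONES = Map.of(
            "Cmaj", List.of("C", "E", "G"),
            "Fmaj", List.of("F", "A", "C"),
            "G7",   List.of("G", "B", "D", "F"));

    static final Map<String, List<String>> KEY_NOTES = Map.of(
            "C", List.of("C", "D", "E", "F", "G", "A", "B"));

    // Chord tones are safest, notes of the overarching key are next, everything else is risky.
    static String rank(String note, String chord, String key) {
        if (CHORD_TONES.getOrDefault(chord, List.of()).contains(note)) return "safest (chord tone)";
        if (KEY_NOTES.getOrDefault(key, List.of()).contains(note)) return "safe (in key, may not harmonize)";
        return "risky (outside chord and key)";
    }

    public static void main(String[] args) {
        System.out.println(rank("E", "Cmaj", "C"));  // safest (chord tone)
        System.out.println(rank("D", "Cmaj", "C"));  // safe (in key, may not harmonize)
        System.out.println(rank("F#", "Cmaj", "C")); // risky (outside chord and key)
    }
}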

But where do you realistically find a huge amount of data so perfectly annotated? Lucky for us, this isn’t a new concept at all. Jazz musicians have long known that the two most critical elements of a song are the chords and the melody; the rest of the music can be ‘faked’, or improvised, on the fly. Hence the existence of fake books: collections of pop-song lead sheets containing the harmonic structure (chords) and the dominant theme (the melody).

To try to prove that this feature is realistic, I have been leveraging the optical character recognition feature of the SmartScore software to digitize lead sheets from fake books. If you would like to use the dataset for any reason, you can find it here on GitHub. The dataset is in MusicXML format. I’ve also provided a link to a Golang app I made to encode the dataset into a usable format. If you would like the data already formatted, you can try this S3 bucket, though it may not be the latest version of the dataset.

The approach I took is to make predictions similar to the way text autocomplete works. For example, you type “I want to be ..” into a Google search and the algorithm fills in something like “famous with lots of money” or “a doctor”. For this to apply to our problem, we need to offer a sequence of data that contains the predictors first; in our case, that is the key and chord structure. This puts us in a position to ask the network for a controlled prediction: a melody compatible with the key and chords.
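To make the analogy concrete, here is a tiny sketch (my illustration, using placeholder tokens rather than the real numeric scheme described below) of how each training example splits into a context the network is prompted with and a continuation it learns to produce:

import java.util.Arrays;

// Autocomplete-style split: the key and chords act as the "predictors",
// the melody is what the network learns to complete. Placeholder tokens only.
public class AutocompleteSplit {
    public static void main(String[] args) {
        String[] tokens = {"KEY", "C1", "C1", "C2", "C2", "C3", "C3", "C4", "C4",
                           "m1", "m2", "m3"}; // one training example

        int predictors = 9; // key + two chord slots per bar over four bars
        String context = String.join("*", Arrays.copyOfRange(tokens, 0, predictors));
        String target = String.join("*", Arrays.copyOfRange(tokens, predictors, tokens.length));

        System.out.println("prompt the RNN with: " + context);
        System.out.println("it learns to emit:   " + target);
    }
}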

If you look at the source code or README.md for the Golang app, you will see that it parses the MusicXML files and, for every four bars, encodes the key and chords first and then the melody, like so:

PSEUDOCODE:

(KEY)-(BAR1 CHORD)-(BAR1 CHORD)-(BAR2 CHORD)-(BAR2 CHORD)-(BAR3 CHORD)-(BAR3 CHORD)-(BAR4 CHORD)-(BAR4 CHORD)-(THIS DATA IS AN ENCODING OF THE MELODY FOR THOSE FOUR BARS)

ACTUAL FORMAT:

“31³¹³*313*313*313*613*613*715*715*”
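As an illustration of how such a line could be assembled (my sketch, with placeholder token values and delimiter placement; the real numeric scheme lives in the Go encoder):

import java.util.List;

// Emits one training line in the layout above: key first, then two chord
// tokens per bar across four bars, then the melody encoding for those bars.
public class LineEncoder {
    static String encode(String key, List<String> chordSlots, List<String> melodyTokens) {
        StringBuilder sb = new StringBuilder();
        sb.append(key).append("*");                              // (KEY)
        for (String c : chordSlots) sb.append(c).append("*");    // 8 chord slots, 2 per bar
        for (String m : melodyTokens) sb.append(m).append("*");  // melody for those 4 bars
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("3",
                List.of("13", "13", "13", "13", "61", "61", "71", "71"), // placeholder chords
                List.of("505", "507", "503")));                          // placeholder melody
    }
}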

To run the Golang app yourself:

(clone the code)

git clone https://github.com/shiehn/MusicXmlGoParser

(from the root of the project)

go run main.go -dir=PATH_TO_THE_CHORD_MELODY_DATASET -encode=true

To actually use the encoded data to make predictions, I have prototyped a Java command-line app that accepts a key & chords encoding like “31³¹³*313*313*313*613*613*715*715*” and generates multiple MIDI files, each containing a predicted melody for the following bars. Assuming the model was trained well and on enough examples, we should get some decent options for the composer to choose from.
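For the curious, here is a minimal, self-contained sketch of that last step, writing notes out as a .mid file with the JDK’s built-in javax.sound.midi API (my illustration; the app’s actual token-to-note decoding is in its repo):

import java.io.File;
import javax.sound.midi.*;

// Writes four quarter notes (a C major arpeggio) to prediction.mid.
// In the real app the pitches would come from the model's predicted melody tokens.
public class MidiSketch {
    public static void main(String[] args) throws Exception {
        int ticksPerQuarter = 480;
        Sequence sequence = new Sequence(Sequence.PPQ, ticksPerQuarter);
        Track track = sequence.createTrack();

        int[] pitches = {60, 64, 67, 72}; // C4, E4, G4, C5
        long tick = 0;
        for (int pitch : pitches) {
            track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_ON, 0, pitch, 90), tick));
            tick += ticksPerQuarter; // hold each note for one quarter
            track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_OFF, 0, pitch, 0), tick));
        }

        MidiSystem.write(sequence, 0, new File("prediction.mid")); // type-0 MIDI file
    }
}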

To run this example or see how I’ve built it, please visit the Chord-to-Melody repo. The README contains detailed instructions, but the workflow is:

(clone the code)

git clone https://github.com/shiehn/chord-melody-dataset.git

(enter the project folder and build it)

mvn package

(from the root of the project)

java -jar target/chords-to-melody-generator-1.5.1.RELEASE.jar -chords=31³¹³*313*313*313*613*613*715*715*

Here are some examples generated by my model:

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-a.mid

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-b.mid

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-c.mid

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-d.mid

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-e.mid

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-f.mid

https://s3-us-west-2.amazonaws.com/sastrainingdata/midi-g.mid

Learn more about my algorithmic music projects here:

http://signalsandsorcery.com/
