Convert midi file to numpy array (Piano Roll)

Published in

Analytics Vidhya

5 min readJul 12, 2020

Recently I’m exploring an interesting topic, neural music generation. Before experimenting with different neural network architectures, there is some import work to be done: data collection and data preparation.

Data Collection

I’ve subscribed for the Friend Level (20 EUROs) from kunstderfuge, and collected enough high quality classical piano midi files.

Midi File

Let’s use Chopin’s Nocturne in D-flat major, Op. 27, №2 as the example to study what midi file looks like, and what important information we can extract. Below is a screenshot of “nocturne_27_2_(c)inoue.mid” in GarageBand. We can see 5 tracks, and it seems only the second, third and fourth tract actually make sound.

To see more details, we can open it in Python via Mido (pip install mido).

A midi file contains one or multiple tracks which can be played simultaneously.

Let’s check the content of tracks of ‘nocturne_27_2_(c)inoue.mid’:

The first track (track 0) contains meta messages storing information such as file description, time signature, key signature, tempo and so on. The messages are sent in a sequence. Notice that there is a parameter “time” at the end of each message, it is used to tell the waiting time after sending the last message and before sending the current message. In this track, though setting “tempo” and “time”, we can decide how fast the midi file is to be played during each period. The default tempo is 500000.

The second track (track 1) and the third track (track 2) contains similar information: a few meta messages, and the major part - what note to be played in what way at what time. Let’s take a closer look at the third track (track 2).

“note_on” tells the key is to be pressed (or released, if velocity=0).
“note_off” tells the key is to be released (velocity should always be set to 0).
“channel” tells to which channel the sound is to be sent. The standard midi supports 16 channels simultaneously.
“note” tells which key it is. We can refer to the map below for the corresponding key on piano keyboard to each midi note id.
“velocity” tells how fast to strike the key, the faster it is, the louder the sound is.
“time” tells us the waiting time between the last and current operation. The duration of a note is the sum of “time” from each message in between of 2 nearest messages about the same note, where the first one tells you to on the note (when you see “note_on”, and “velocity” > 0) and the last one tells you to off the note (when you see “note_off”, or “note_on” with “velocity”=0).

Map between midi note and 88 key keyboard

The Fourth track (track 3) mainly contains messages start with “control_change”, it’s related to the control of pedals. Since my major concern is about the notes, I will ignore this type of messages for now.

The Fifth track (track 4) only contains 2 meta messages, which are not important for our problem.

Below is the summary of message type and parameter value range:

Parameter type and range (https://mido.readthedocs.io)

Code

With the basic understandings of midi files, we can now write some code to convert midi file to numpy array.

The desired array format:

Dimension = n rows * 88 columns, each row contains the state of 88 notes at a particular time step. The notes out of piano keyboard range will be ignored.
The values in the array represent velocity (0 means note off, while (0:127] means note on).
The array combines notes information of all tracks whose number of messages is no less than a threshold. The threshold is calculated as 10% of the number of messages of the longest track.

The function msg2dict extracts important information (note, velocity, time, on or off) from each message.

The function switch_note changes the last_state (the state of the 88 note at the previous time step) based on new value of note, velocity, note on or note off. The state of each time step contains 88 values.

The function track2seq converts each message in a track to a list of 88 values, and stores each list in the result list in order.

The function track2seq takes the threshold of minimum number of messages into consideration while filtering tracks, and combines all tracks into one numpy array. If two sound track on the same note at the same time, it takes the larger velocity.

Let’s check the result: