Week 2 — The Artificial Pianist

Batuhan Ayvaz
BBM406 Spring 2021 Projects
May 2, 2021

Last week we started searching for ways of generating music, especially instrumental music. The first-week blog post is here: link.

Model Selection

CNN or RNN?

In our project, we are planning to use an RNN model. CNNs are often considered more reliable and powerful than RNNs, but they are best suited to spatial data such as images. RNNs, on the other hand, are more effective for temporal (or sequential) data, and our musical data is sequential. An RNN processes time-series information, making use of what it has already seen, which is exactly what we need while processing audio. Last but not least, an RNN can handle inputs and outputs of arbitrary length, which is another advantage over a CNN.

In our project, we need to estimate the upcoming notes and how long each note lasts. Recurrent neural networks (RNNs) are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states. Our dataset contains many piano pieces of different lengths, and an RNN model can work with inputs of different sizes.

A weakness of RNNs is that in long sequences, the earlier layers receive only a tiny gradient update during backpropagation, so they stop learning. This is called “the vanishing gradient problem”, and most plain recurrent neural networks suffer from it. Fortunately, there are two gated architectures that address it: the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) units, both designed to deal with the vanishing gradients encountered by traditional RNNs. We also plan to use self-attention layers in our neural network architecture. The complete learning model of our project will be determined by next week.
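As a rough sketch of this direction, here is a minimal Keras LSTM that predicts the next pitch from a window of previous pitches. The layer sizes, the sequence length, and the use of Keras itself are assumptions for illustration, not our final design.

import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 50       # length of the input note window (assumed value)
NUM_PITCHES = 128  # MIDI pitch range

# Input: a window of SEQ_LEN one-hot encoded pitches.
# Output: a probability distribution over the next pitch.
model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, NUM_PITCHES)),
    layers.LSTM(256),                              # a GRU layer could be swapped in here
    layers.Dense(NUM_PITCHES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

A self-attention layer could later be inserted between the recurrent layer and the output layer once we settle on the final architecture.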

Preprocessing Data

We will use the MAESTRO dataset (link), from which we will only use the MIDI files of piano music.

To convert the MIDI files into a data frame with columns such as ‘Start’, ‘End’, ‘Pitch’, ‘Velocity’, and ‘Instrument’, we will use a Python library named pretty_midi.

“Start” is the time, in seconds, at which a note begins. “End” is the time, in seconds, at which the note ends. Multiple notes can overlap at the same time. “Pitch” is the MIDI number of the note played. “Velocity” is the force with which the note is played.

In the picture above, we can see the MIDI numbers of the notes.

import os
import pretty_midi

fn_in = os.path.join('..', 'data', 'C1', 'FMP_C1_F12_Bach_BWV846_Sibelius-Tracks.mid')
midi_data = pretty_midi.PrettyMIDI(fn_in)

After extracting the note data from each midi_data object into a list named midi_list, we can convert it into a data frame object with:

import pandas as pd

df = pd.DataFrame(midi_list, columns=['Start', 'End', 'Pitch', 'Velocity', 'Instrument'])

Since our dataset includes only piano sounds, the ‘Instrument’ column is the same for every note, so we need only four of these columns.
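The post does not show how midi_list is filled, so here is a minimal sketch of one possible way to do it with pretty_midi, together with dropping the redundant column (the loop and the sorting step are our assumptions):

import pandas as pd

midi_list = []
for instrument in midi_data.instruments:       # midi_data loaded as shown above
    for note in instrument.notes:
        midi_list.append((note.start, note.end, note.pitch, note.velocity, instrument.name))
midi_list.sort(key=lambda row: row[0])          # order notes by start time

df = pd.DataFrame(midi_list, columns=['Start', 'End', 'Pitch', 'Velocity', 'Instrument'])
df = df.drop(columns=['Instrument'])            # piano only, so this column is redundant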

Using the get_piano_roll function in pretty_midi, we can get the notes as a 2D NumPy array with shape (notes, time), which we binarize. The notes dimension has length 128, and the time dimension is the duration of the piece multiplied by the frame rate (frames per second). We then convert this NumPy array into a Python dictionary holding the per-frame data, and we can train our model on the sequential content of this dictionary.
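As a rough sketch of this step (the frame rate fs and the exact dictionary layout are assumptions we have not finalized):

import numpy as np

fs = 30                                         # frames per second (assumed value)
piano_roll = midi_data.get_piano_roll(fs=fs)    # midi_data as above; shape (128, duration * fs)
piano_roll = (piano_roll > 0).astype(np.int8)   # binarize the velocity values

# Map each time frame to the tuple of pitches sounding in that frame.
frames = {t: tuple(np.where(piano_roll[:, t])[0]) for t in range(piano_roll.shape[1])}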

Additional Feature

In our project, we have decided to add a new feature: estimating the remaining part of a piano piece. The music will stop at a given time and the algorithm will try to predict the rest of the song. After the prediction step, a new MIDI file will be created, so we will be able to listen to the predicted part ourselves.
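Writing the predicted notes back to a MIDI file with pretty_midi could look roughly like the sketch below; predicted_notes is a hypothetical list of (start, end, pitch, velocity) rows produced by the model.

import pretty_midi

def notes_to_midi(predicted_notes, out_path='predicted.mid'):
    # Build a single-instrument (piano) MIDI file from the predicted rows.
    pm = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)   # program 0 = Acoustic Grand Piano
    for start, end, pitch, velocity in predicted_notes:
        piano.notes.append(pretty_midi.Note(velocity=int(velocity), pitch=int(pitch),
                                            start=float(start), end=float(end)))
    pm.instruments.append(piano)
    pm.write(out_path)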

Stay Tuned :)

Project Members

Batuhan Orhon

Batuhan Ayvaz

Özge Kökyay

References

https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks

https://sci-hub.se/https://ieeexplore.ieee.org/document/8826253

https://sci-hub.se/https://ieeexplore.ieee.org/document/8819830

https://towardsdatascience.com/generate-piano-instrumental-music-by-using-deep-learning-80ac35cdbd2e

https://www.tutorialspoint.com/tensorflow/tensorflow_cnn_and_rnn_difference.htm
