Pianola Network

TLDR: While experimenting with discrete NN architectures, I built a VAE neural network for stylized MIDI music generation. Later, we used it in several public performances, such as a live performance in the style of Alexander Scriabin for theremin and orchestra at the opening of Yet Another Conference (an annual IT conference organized in Moscow by Yandex), and a performance of a neuro-jazz track by Russian jazz trumpet player Vadim Eilenkrig at a private party celebrating the 100th anniversary of jazz.

logo by Kirill @innubis Anastasin

After several experiments with neural network language modeling, I decided to go deeper and try some other architectures. As a model task, I chose MIDI music generation, since a lot of such data is available online and it is relatively easy to parse. I collected ~4 GB of MIDI files and spent several months parsing and cleaning this data: auto-selecting the solo track, removing degenerate tracks, detecting bass and drum tracks, and so on.
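To give a feel for what this kind of cleaning involves, here is a minimal sketch of track classification and solo selection. It is not the pipeline I actually used: the `Note` structure, the thresholds (`min_notes`, `min_distinct_pitches`, `max_mean_pitch`) and the "most monophonic, then highest" solo rule are all assumptions made for illustration. The only hard fact it relies on is that General MIDI reserves channel 10 (index 9 when counting from zero) for percussion.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Note:
    pitch: int       # MIDI pitch, 0-127
    start: float     # onset time in beats
    duration: float  # length in beats
    channel: int     # zero-based MIDI channel


GM_DRUM_CHANNEL = 9  # General MIDI channel 10, counted from zero


def is_drum(notes):
    # A track is treated as drums if all of its notes sit on the GM drum channel.
    return bool(notes) and all(n.channel == GM_DRUM_CHANNEL for n in notes)


def is_bass(notes, max_mean_pitch=48):
    # Assumed heuristic: a low average pitch means a bass line.
    return bool(notes) and mean(n.pitch for n in notes) <= max_mean_pitch


def polyphony_ratio(notes):
    """Fraction of consecutive notes that start while the previous one still sounds."""
    ordered = sorted(notes, key=lambda n: n.start)
    overlaps = sum(
        1 for a, b in zip(ordered, ordered[1:]) if b.start < a.start + a.duration
    )
    return overlaps / max(len(ordered) - 1, 1)


def classify_tracks(tracks, min_notes=16, min_distinct_pitches=3):
    """Map each track name to 'degenerate', 'drums', 'bass' or 'melodic'."""
    roles = {}
    for name, notes in tracks.items():
        if len(notes) < min_notes:
            roles[name] = "degenerate"   # too short to learn anything from
        elif is_drum(notes):
            roles[name] = "drums"
        elif len({n.pitch for n in notes}) < min_distinct_pitches:
            roles[name] = "degenerate"   # e.g. a single repeated pitch
        elif is_bass(notes):
            roles[name] = "bass"
        else:
            roles[name] = "melodic"
    return roles


def select_solo(tracks, roles):
    """Pick the most monophonic melodic track; break ties by higher mean pitch."""
    candidates = [name for name, role in roles.items() if role == "melodic"]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda name: (
            polyphony_ratio(tracks[name]),
            -mean(n.pitch for n in tracks[name]),
        ),
    )
```

In this sketch `tracks` is a plain dict from track name to a list of `Note` objects; getting from a real MIDI file to that structure is a separate step that is not shown here.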

Anyway, I ended up with several thousand clean tracks to learn from and started researching different RNN architectures. I started with a language model (LM), then tried some seq2seq approaches, and finally managed to train a more or less stable VAE network. Here is a short video of the exploration of this VAE space:

Some details of the architecture can be found in a preprint of our article “Music generation with variational recurrent autoencoder supported by history”, presented at CMMR 2017.
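For readers who have not met VAEs before, the sketch below shows the three generic ingredients behind "exploring a VAE space": sampling a latent vector with the reparameterization trick, the KL term that pulls the latent distribution towards a standard normal, and walking a straight line between two latent points. This is textbook VAE math in plain Python, not the recurrent architecture from the article. The encoder and decoder are left out entirely, and the function names are mine.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def interpolate(z_a, z_b, steps):
    """Evenly spaced points on the line from z_a to z_b (steps must be >= 2)."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    z_a = reparameterize([0.0, 0.0], [0.0, 0.0], rng)
    z_b = reparameterize([1.0, -1.0], [0.0, 0.0], rng)
    for z in interpolate(z_a, z_b, steps=5):
        print(z)  # each point would be fed to a decoder to get a piece of music
```

Decoding each point along such a path is one common way to hear how the generated music changes gradually from one latent point to another.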

Among my other experiments were two additional seq2seq-based networks, bassNN and drumNN, which provide some orchestration for the solo generated by the main network.

* * *

As I started to share some results of my experiments, it turned out that a number of musicians were interested in them, so with the help of a friend of mine, Ivan Yamshchikov, we gave several performances and talks for art audiences. Among them were:

More to come :)