Lamtharn “Hanoi” Hantrakul
Dec 31, 2017

GestureRNN: Learning musical gestures on an interactive XY pad

GestureRNN interaction paradigm

This project uses machine learning and deep learning to create a new kind of musical instrument based on the new Roli Lightpad. First, I use Wekinator’s machine learning capabilities to continuously interpolate between various sonic parameters in a custom-designed tension synthesizer in Ableton Live. More importantly, I train a three-layer LSTM, called GestureRNN, that learns to generate gestures and swipes across the surface of the Lightpad based on user input. GestureRNN regresses continuous (x, y) coordinates and instantaneous pressure (p) in real time from a user’s seed gesture.
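As a rough sketch of the idea (not the exact architecture or hyperparameters from the repository; the hidden size and sequence length below are illustrative assumptions), the model can be thought of as a stacked LSTM that consumes the recent stream of (x, y, p) touch samples and regresses the next triple:

```python
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    """Sketch of a three-layer LSTM that regresses the next (x, y, pressure) triple."""
    def __init__(self, hidden_size=128):  # hidden size is an assumption
        super().__init__()
        # Input and output are 3-dimensional: x, y and pressure p.
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden_size,
                            num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden_size, 3)

    def forward(self, seq, state=None):
        # seq: (batch, time, 3) stream of past touch samples.
        out, state = self.lstm(seq, state)
        # Regress the next (x, y, p) and carry the state for real-time generation.
        return self.head(out[:, -1]), state

model = GestureRNN()
seed = torch.rand(1, 32, 3)        # a user's seed gesture: 32 touch samples
next_xyp, state = model(seed)      # predicted next (x, y, p)
```

At generation time the predicted triple is fed back in as the next input while the LSTM state is carried forward, so the model keeps "playing" from the user's seed.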

You can check out the full GitHub repository at this link!

This project explores the notion of using low-dimensional outputs to create art. Machine learning models used for generative art tend to model and render the final high-dimensional output directly. For example, LSTMs trained on MIDI output the final piano roll directly (88x16 ~ 1,400 dimensions), GANs generate the final image directly (128x128x3 ~ 50,000 dimensions), and WaveNet generates the final waveform directly (22,000x256 ~ 5,000,000 dimensions per second).

Arguably, humans do not operate in this final high-dimensional output space. An artist does not think about the final RGB values of each pixel, but instead thinks in terms of brushstrokes and movements across the canvas. A musician does not think of the final score, but instead chooses notes through an embodied and cognitive process that synthesizes the current music, the timbre of surrounding instruments, and the physical constraints of both the player and the instrument.

The work was heavily inspired by David Ha’s SketchRNN, a deep learning model trained on human sketches, and by conversations over Summer 2017 with Doug Eck about the low-dimensional nature of artistic output; both are on the Google Magenta team. SketchRNN learns pen strokes from over 50,000 sketches and can “draw” in real time with the user. GestureRNN for the Lightpad thus does not generate the final waveform directly, but instead listens to and learns the musical gestures of an experienced player that give rise to expressive motifs on the instrument.

There are quite a few moving parts in this project. I have attempted to document all of them in the GitHub repository and to recommend best practices for connecting the different components, even if you don’t have all the gear I used for this project.
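To give a flavor of how the pieces could talk to each other, here is a minimal sketch of streaming the generated triples over OSC with the python-osc library. The address pattern and port are hypothetical placeholders, not values taken from the repository:

```python
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical OSC routing: the host, port and address pattern below are
# illustrative; substitute whatever Wekinator, Max/MSP or Ableton expects.
client = SimpleUDPClient("127.0.0.1", 6448)

def send_gesture(x, y, p):
    # Stream one generated (x, y, pressure) triple downstream,
    # e.g. to Wekinator or a patch driving the Lightpad visuals and synth.
    client.send_message("/gesturernn/xyp", [float(x), float(y), float(p)])
```

A loop that alternates between sampling the next (x, y, p) from the model and calling a sender like this one is enough to close the real-time interaction loop.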

System diagram for ML4Lightpad — there are many different parts to make this work!