Piano Player

Published in

DBRS Innovation Labs

Feb 11, 2016

In his novel Galatea 2.2, Richard Powers’s eponymous narrator attempts to “teach” a neural network enough about English literature that the machine would be able to perform a plausibly-human analysis of a literary text. When Powers published Galatea in 1995, the proposal that an artificial network could be sophisticated enough to model something as subtle as human art was a purely science-fictional conceit, an imaginative projection based on the technologies that existed at the time.

Twenty-one years later, the premise no longer feels so far-fetched. In the first month of 2016, the DBRS Innovation Lab partnered with research resident Aaron Arntz on a project in much the same vein as Powers’s fictional endeavor. Our question: Can we compel a neural network to compose plausible music?

DBRS Innovation Lab Research Resident Aaron Arntz (Photography: John Farrell)

This project is the first undertaken as part of the Innovation Lab’s Researcher in Residence Program, in which residents from diverse backgrounds collaborate with the lab’s data scientists and machine learning specialists on month-long investigations. The program is a way for the Innovation Lab to gain exposure to ideas and methods that are far afield from the practices and protocols that are conventional within FinTech. This exposure is vitally important for our company to have, given the rapid acceleration of new technologies. In today’s competitive, data-driven environment, it is crucial for companies to be conversant with the latest innovations in tech. The Research Residency takes this premise one step further, and tries to gain inspiration from the diverse backgrounds and experiences of residents with the goal of enabling DBRS to start developing market-altering innovations ourselves.

Arntz trained at USC’s Thornton School of Music and is an accomplished professional musician, having played with musical acts as varied as Zappa Plays Zappa (with whom he won a Grammy in 2009 for Best Rock Instrumental Performance), Edward Sharpe and the Magnetic Zeros, Grizzly Bear, and most recently, Beirut. He is also a computer artist whose work combines a deep interest in computational systems with a playful exploration of alternative compositional techniques. For instance, his project Sonivim turned the Vim coding environment into a prepared instrument; his language tuRING allows users to collaborate in an improvised musical composition along with a computer.

For the project, Arntz teamed up with DBRS Innovation Lab’s Jen Rubinovitz and Jamis Johnson, both of whom studied machine learning at Columbia and had experience with neural nets. The trio shared an interest in time series pattern matching and pattern discovery: using state-of-the-art machine learning algorithms to detect patterns in time-based data sets. This capacity is obviously critical for analyzing music, but the experiment would require the team to implement techniques that are equally applicable to financial datasets.

“A lot of people will say that music has nothing to do with our business,” said Johnson, “but this is an algorithm for finding patterns in high level time series data.” Both sheet music and financial data represent trends and patterns that are extended in time, but unlike financial data, humans are intuitively able to identify and judge the patterns of a musical composition. This makes music an ideal use-case to test the machine’s ability to understand patterns that emerge over time. We may not understand the complex interrelations that comprise financial markets, and so it is difficult to say whether a machine’s “understanding” is accurate or not. By testing their algorithms on music, the team was able to hear immediately how much their machines had understood.

Using musical data allowed the team to jump right in, Rubinovitz added. This way they could study its applications in an abstract way, rather than searching for the perfect financial data.

“There is a chicken-and-egg situation between proven applications [of a technology] and experimental implementation,” Johnson explained. “Most people believe you need to have a proven application before you implement, but as you are experimenting you learn so many new applications that you never thought of before.”

After reading Andrej Karpathy’s “The Unreasonable Effectiveness of Recurrent Neural Networks” (a “must-read,” according to Arntz, for anyone interested in recurrent neural networks), they decided to train a Character-Level Recurrent Neural Network (char-RNN) on a corpus of canonical keyboard music.

“Initially we started with Chopin, then looked at all of [the available music], then we decided [we] were just going to do piano,” said Rubinovitz. The team looked to an archive from a site called muTopia, which had a large repository of LilyPond files, a relatively arcane file type that compiles down to sheet music. “[It can] describe complex musical structures with relatively few characters,” Arntz said. “The reason I chose LilyPond,” he continued, “is that it’s very expressive to write. This is an expressive language to work with, in a way that it is intuitive for composers.”

LilyPond files contain all of the information that would ordinarily be represented in sheet music, including articulation and dynamics. It differs in this respect from more common musical file-types such as MIDI, which sacrifices musical nuance in favor of simplicity.

“What if MIDI isn’t [actually] music?” Arntz asked, quite forcefully, when I spoke to him. “What if music is actually a pianist performing the music? You want sheet music? Train it on sheet music!”

Arntz’s credentials on this topic are so unimpeachable, and his commitment to this position so absolute, that it took days for me to realize just how baffling I found this line of reasoning, and weeks to understand how important it is to the project. Because if “music is actually a pianist performing,” then why not train the net on a corpus of performers? Another way of asking the same question: if MIDI and sheet music are both standardized representations and thus by definition not music themselves, then what makes one proxy preferable over another?

It is true that sheet music is more subtle and expressive than MIDI files are, but I do not believe that this is really what was at stake for Arntz and the team when they decided to limit their training corpus to sheet music. As a protocol, MIDI has been a staple of computer music for over thirty years. But the tantalizing promise of research into neural networks is that they will be able to parse information the way humans do. So because the end-users in this case were to be human musicians, it was only natural to train the neural network on sheet music instead of MIDI.

When training a neural network, the choice of data type matters because it determines the kinds of patterns the network will identify. This is a lesson that will be deeply valuable when applying research on neural networks to financial data: the machine’s output will be written in the same language as the corpus you feed it. That is why it is important to choose a language that is well-suited to holding the kinds of conversations you anticipate having with the neural network.

Johnson told me that the team had briefly considered “writing a neural network from scratch” but settled on using Karpathy’s Char-RNN library. “It was a good fit,” he said. “Recurrent neural networks are strongly applicable to any time series based data with underlying patterns.”

From left: Johnson, Rubinovitz, Arntz (Photography: John Farrell)

All neural networks progress data through layers, developing complex, high-level abstractions out of low-level features, but recurrent neural nets are especially well-suited for temporally sensitive data because their processes loop back on themselves over time. “Recurrent neural networks are different from standard feed-forward neural networks in that the hidden layers — the layers that encode high-level abstractions of the input data — form a cycle,” Johnson explained.
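To make the idea of a cycle concrete, here is a minimal sketch of a plain tanh recurrent cell in pure Python. It is not the team’s code, and it may well differ from the cell their library used; the function names (`rnn_step`, `run_rnn`) and the toy weights are invented for illustration. The point is only that the hidden state computed at one step is fed back in at the next.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a plain recurrent cell: h = tanh(W_xh x + W_hh h_prev + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # The previous hidden state re-enters here: this is the "cycle".
        total += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

def run_rnn(inputs, W_xh, W_hh, b):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

# Hypothetical toy weights: one input feature, two hidden units.
W_xh = [[1.0], [0.5]]
W_hh = [[0.0, 0.5], [0.5, 0.0]]
b = [0.0, 0.0]

# Two sequences that differ only in their FIRST element.
states_a = run_rnn([[1.0], [0.0]], W_xh, W_hh, b)
states_b = run_rnn([[0.0], [0.0]], W_xh, W_hh, b)
```

Both sequences end with the same input, yet `states_a[1]` and `states_b[1]` differ, because the first run carries forward a trace of what it saw one step earlier. A feed-forward network given only the second input would have no way to tell the two apart.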

Rubinovitz and Johnson helped Arntz spin up a computer in a remote datacenter with a specialized processor known as a GPU. GPUs (graphics processing units) were initially developed to help process the complex linear algebra equations required for displaying a graphical environment to a user on screen. Think: gaming. Neural networks run matrix multiplication operations similar to those required for graphical computation, making GPUs particularly useful for training these giant, complex networks. Training a neural network requires tuning tens of millions of parameters until you get a satisfactorily low amount of training error. Because of the size of the mathematical problem, it was necessary to run the training process continuously, through the night, for approximately 16 hours before it could output quality scores.

Because LilyPond compiles down to sheet music, the net was effectively training on a textual encoding of musical notation, and according to Rubinovitz the team “treated it like a language processing problem,” prodding the machine to review the sheet music over and over again, comparing characters in sequence until it “understood” certain features of musical composition. If it is true, as Lacan claimed, that “the unconscious is structured like a language,” then this method uses a computer to tease out the linguistic structure underlying a set of compositional systems (music) usually understood to be unconscious.
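For readers curious what “comparing characters in sequence” might look like in practice, the following is a rough sketch of how text is commonly prepared for a character-level model: each training example is a window of characters, and the target is the same window shifted one character ahead. The note fragment, the window length, and the helper `make_pairs` are all made up for illustration; this is not the team’s preprocessing or the Char-RNN library’s.

```python
# A short, hypothetical fragment in LilyPond-style note syntax.
text = "c'4 e'4 g'4 c''2"

# Character-level vocabulary: every distinct character gets an integer id.
vocab = sorted(set(text))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
ids = [char_to_id[ch] for ch in text]

def make_pairs(ids, seq_len):
    """Slide a window over the text; the target is the input shifted by one."""
    pairs = []
    for start in range(0, len(ids) - seq_len):
        inputs = ids[start:start + seq_len]
        targets = ids[start + 1:start + seq_len + 1]
        pairs.append((inputs, targets))
    return pairs

pairs = make_pairs(ids, seq_len=4)
```

Under this framing the model is never told what a note, a rest, or a measure is. It only ever sees “given these characters, which character comes next,” and whatever it learns about music has to emerge from that.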

Determining which features the neural network understood and which it blissfully ignored became the source of much amusement and consternation for the team. At a certain point it became clear that the computer could not “remember” far enough back in time to produce polyphony. The team encountered what Johnson called “a lack of communication between the staves” — the net was able to approximate one “hand” at a time, but had no logic for relating one hand to another.
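One way to picture the memory problem is to count characters. The sketch below assumes a layout in which each hand is written out as its own block, one after the other; real files vary, and the variable names, notes, and helpers here are invented for illustration. Under that assumption, two measures that sound at the same moment sit far apart in the text, and the distance grows with the length of the piece.

```python
# Hypothetical two-hand piece, one list entry per measure.
right_hand = ["c''4 e''4 g''4 e''4", "d''4 f''4 a''4 f''4"]
left_hand = ["c2 g2", "d2 a2"]

def build_source(right_hand, left_hand):
    """Write the right hand as one block, then the left hand as another."""
    upper = "upper = { " + " | ".join(right_hand) + " }\n"
    lower = "lower = { " + " | ".join(left_hand) + " }\n"
    return upper + lower

def gap_between_hands(right_hand, left_hand):
    """Characters separating the first measure of each hand in the text."""
    source = build_source(right_hand, left_hand)
    return source.index(left_hand[0]) - source.index(right_hand[0])

# The same opening measure, in a one-measure and a two-measure piece.
short_gap = gap_between_hands(right_hand[:1], left_hand[:1])
long_gap = gap_between_hands(right_hand, left_hand)
```

Here `long_gap` is larger than `short_gap` even though the two measures being compared are identical. A model that can only look back a limited number of characters would reach the left hand having already lost track of what the right hand was doing at the same beat.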

One can hardly hold this against the recurrent neural network. After all, if you were three days old and had no hands, how would you know that two staves of piano music are meant to be played simultaneously? Still, the team spent much of their time strategizing how to convey this information in a way that would convince the machine to generate polyphony. Rubinovitz elaborated as to why this problem of polyphony is so critical: “Music, in many ways, is its own language. Data, too, is a kind of language in that it is a collection of discrete but related elements. Most data sets we have today can be considered polyphonic.”

As a trained pianist himself, Arntz was able to judge the neural net’s output quite easily. Limiting the corpus to compositions for a single instrument helped contain the scope of the enterprise and generate compositions that a living, breathing human pianist would be able to play.

This latter consideration, that the computer’s output should be human-playable, was an essential component of Arntz’s vision for the project, although not for the reasons that one might expect. For Arntz, music is only partially aural. “When we hear music we hear the final audio that was produced by someone playing,” he told me. “But there’s the entire art form of composition that was never played.”

It was this abstract art of composition that interested Arntz: the enigmatic logic that composers employ to determine pitch and rhythm, as well as more subtle shades of musical expression like articulation. As a next step in this project, Arntz plans to record himself playing the sheet music that the net has generated, re-interpreting the machine’s reinterpretation of canonical keyboard music.

“This is exactly why it’s important to have people who are experts in the field of music performance and not just theory — they are going after different problems,” said DBRS Innovation Lab Director Amelia Winger-Bearskin. “Aaron’s goals are sharing this with an audience. If you can create something that can’t be shared, he isn’t interested.”

“That is what user-centered software is all about,” she continued. “What is the point of creating these neural networks if they are not going to provide value to the end user? This is why user focused neural network explorations are very similar and similarly valuable.”

That goes for everything produced by the DBRS Innovation Lab. Training a neural net to generate music may seem to be an eccentric use of computational resources, but the challenges it presents conform exactly to the difficulties that plague anybody with an interest in using machine learning to model complex human systems.

As Johnson noted, the team used this as an opportunity to explore complex aspects of time-based information. “There is a temporal component with recurrent neural networks, a sort of finite memory backward in time,” said Johnson. “This is particularly useful when applied to music because we are analyzing patterns and changes across time. And for this reason I believe RNNs could be quite useful in analyzing complex, time-based financial data… A lot of this is so new that it hasn’t yet been done in the financial community. Exploring these algorithms allows us to become familiar with the hardware and codebases that exist out there, and implementations to get up and running using them.”

The music that the Innovation Lab’s neural network generates is as strange as it is varied. The first piece (Opus1No1) features dense clusters of notes arranging themselves into dark, angular harmonies in a way that feels moody if not aggressive. Compare this against the sparkling baroque flourishes of Opus1No3 and it will become clear that the neural network “knows” a lot about how piano compositions are constructed, even if it is working with an unconventional understanding of musicality.

Arntz does plan to record himself playing the machine’s output. For him, reinterpreting the algorithm’s interpretation of keyboard music is a way of further exploring the dynamic tension between composition and performance. For now, however, the network’s output serves as a sonic representation of a cutting-edge algorithm. It allows people without technical backgrounds to understand and engage with cutting-edge technologies. Just as important, it provides the technically-minded with a fresh approach and a different way of thinking about their tools.

“Whenever new technology is introduced to a market, people find a broad way of speaking about it to explain its uses,” Winger-Bearskin said. “When Google introduced Deep Dream, they used imagery as a way of beginning the conversation. At the Innovation Lab we use music, words, virtual reality — all are ways of communicating to a larger audience what these algorithms are and how they could be beneficial in a variety of applications.”

Arntz will be performing these compositions live under the name “Recurrent Neural Network” on Wednesday March 30 at Le Poisson Rouge in Manhattan. Purchase tickets and read more information here.

Our team consists of engineers and mathematicians, story-tellers and data artists. We interrogate big datasets to uncover hidden trends, make animations that set beautiful geometries in motion, and train machine-learning algorithms to hew insights from raw numbers. Our tools allow us to examine the details of our economy and our world with extreme precision, and to simplify complex information accurately. We are dedicated to finding exciting new ways of helping people see the insights beyond the rating. Learn more at http://dbrslabs.com/

Written by Eamon Abraham