Can Algorithms Hear Musical Structures? Introducing the “L-Measure”

CDS’s Brian McFee, Juan Pablo Bello, and colleagues introduce a new methodology for Music Informatics Research and Music Cognition

“The one good thing about music,” Bob Marley once said, is that “when it hits you, you feel no pain.” Instead, you’re embraced by a song’s warm tone or jazzy timbre, its thudding beat or classical melody.

Annotating and categorizing the different musical elements in a song is precisely what the field of Music Informatics Research (MIR) does, for a variety of reasons.

Annotated music can help us discover auditory patterns within a genre or across genres. It can also be used to train machines that produce music algorithmically, or to help DJs craft appealing mash-ups and remixes.

But a major problem in the field is that there is simply too much disagreement among annotators.

As CDS data science fellow Brian McFee, CDS affiliated faculty member Juan Pablo Bello (Associate Professor at NYU Steinhardt and NYU Tandon), Morwaread Farbood of NYU’s Music and Audio Research Lab (MARL), and Oriol Nieto of Pandora explain in their peer-reviewed paper, part of the issue is that many annotation approaches do not account for the fact that music is hierarchically constructed.

“Most existing approaches…characterize [musical] structure simply as a sequence of non-overlapping segments,” they explain in “Evaluating Hierarchical Structure in Music Annotations.”

Moreover, many methods evaluate annotations against a single “ground truth,” thereby relying on the “unrealistic assumption that there is a single valid interpretation to the structure of a given recording or piece.”

“Even when annotators do account for hierarchy,” McFee adds, “the methods we have to analyze their annotations cannot handle hierarchies.”

Enter the L-measure, a novel methodology that McFee, Bello, and their team invented to measure the level of agreement between musical annotators.

Stepping away from the “ground truth” model, the L-measure embraces the possibility of multiple valid interpretations: rather than comparing each annotation to a single reference, it measures the extent to which annotators agree and disagree with one another.

The L-measure also compares hierarchical annotations “holistically across multiple levels.”
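For readers who want to try this out, the L-measure has an open-source implementation in the mir_eval library, as mir_eval.hierarchy.lmeasure. The sketch below is a minimal illustration assuming that API; the two two-level annotations of a 12-second recording are invented, chosen only to show the input format (lists of segment intervals and labels, ordered from coarse to fine).

```python
import numpy as np
import mir_eval

# Two hypothetical annotations of the same 12-second recording.
# Each is a hierarchy: a list of levels ordered coarse-to-fine, where a
# level is an array of (start, end) segment times plus a label per segment.

# Annotator 1 hears two sections, each split into distinct subsections.
ann1_intervals = [
    np.array([[0.0, 6.0], [6.0, 12.0]]),                          # coarse
    np.array([[0.0, 3.0], [3.0, 6.0], [6.0, 9.0], [9.0, 12.0]]),  # fine
]
ann1_labels = [["A", "B"], ["a1", "a2", "b1", "b2"]]

# Annotator 2 agrees at the coarse level but hears repeats at the fine level.
ann2_intervals = [
    np.array([[0.0, 6.0], [6.0, 12.0]]),
    np.array([[0.0, 3.0], [3.0, 6.0], [6.0, 9.0], [9.0, 12.0]]),
]
ann2_labels = [["A", "B"], ["a", "a", "b", "b"]]

# L-precision, L-recall, and their harmonic mean (the L-measure).
# Scores near 1 mean the two hierarchies relate pairs of moments alike.
l_prec, l_rec, l_f = mir_eval.hierarchy.lmeasure(
    ann1_intervals, ann1_labels, ann2_intervals, ann2_labels
)
print(f"L-precision={l_prec:.3f}  L-recall={l_rec:.3f}  L-measure={l_f:.3f}")
```

Because neither annotation is treated as the ground truth, swapping the two inputs simply swaps L-precision and L-recall; the L-measure itself is unchanged.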

Their promising methodology is poised to become a powerful tool for those in the field, as well as for those working in music cognition, for it can help researchers assess how similarly different listeners respond to the same song.

By Cherrie Kwok


Their work was supported by the CDS’s Moore-Sloan Data Science Environment, the NYU Global Seed Grant, and the NYUAD Research Enhancement Fund. Learn more in their paper, “Evaluating Hierarchical Structure in Music Annotations.”
