Music AI: Loop-in-the-Human

Jay Hardesty
Feb 21, 2018 · 14 min read

Guy walks into a computer lab and asks “Do you know your server is down?” Data scientist replies “No, but if you hum a few million bars I’ll try to fake it.”

By now, various companies offer automated music composition. For instance, Jukedeck does so in part by using machine learning to generate original pieces based on large sets of musical examples. Google’s Magenta project emphasizes data-driven “learning by example” over rule-based music creation approaches. These and numerous other projects have demonstrated impressive, human-sounding musical results.

Rather than create entire compositions, my own project focuses on individual musical parts (say lead/bass lines within an existing composition). I want to create human-sounding music at a local scale, based on small numbers of examples. That’s because, for me, the urge to engage familiar music usually is stronger than the desire to hear something completely new. This means injecting variation into music that’s already in your head.

To this end, I want to hook into a real-time, subjective aspect of music, but at a generic, low level. In short, I want to put real-time handles on looping rhythmic/melodic patterns within existing music.

Coherence as prediction

So what is a piece of music like? For one thing, if the music has a beat, you’re caught in cycles of anticipation versus outcome, as the beat keeps coming around at different levels. Making sense in this context means forming subconscious predictions in response to the outcomes of other predictions.

These nested expectations evoke hierarchical structure, giving meaning to the surface notes and vice-versa. (The psychology of musical prediction was first explored in Leonard Meyer’s 1956 book Emotion and Meaning in Music.)

Harnessing innate musical sense

To some degree everyone is already a musical performer; the sense of surface-versus-structure that is familiar to musicians also exists within listeners more so than they probably realize. The difference is that musicians learn to manipulate that structure in order to create new surfaces (just as most people do with speech). That is what the building blocks aim to partially streamline.

Deepening repetition

Coord music-morphing app, morphing between chosen rhythms/melodies in real time.

But while almost everyone understands music, only practiced musicians typically manipulate it at the note level. Here I’ll describe an attempt to narrow the gap between making sense of music and improvising music, doing so via algorithms that play a coprocessor role, augmenting human musicality rather than replacing it.

Those algorithms depend on building blocks of rhythmic coherence. The building blocks are not hand-crafted constructs; they are rhythmic patterns that result from generative operations. Number theory provides a crystallization of those operations, making immediate comparison and manipulation of actual rhythms (and by extension, melodies) intuitive and computationally efficient.

(Pitches are handled as rhythmic strata within this approach, though I’ll leave that out here for the sake of brevity, such as it is.)

Coprocessor, not automaton

This algorithmic intervention takes place in strictly local fashion, leaving the rest of the composition and production in human hands, or under the control of other processes. In the latter case, the building blocks might offer meaningfully factored note data to other algorithms that are in play.

Music with moving parts

Coord linked to Ableton Live via OSC and Max-for-Live

As a listener this means I can listen to a track I like for an extended time without tedium setting in; I can pivot into melodies and beats that are recognizably related in non-obvious ways. As a (decidedly amateur) composer I can explore ideas on the fly (perhaps overcome writer’s block) and create pieces that could evolve later, even morphing with other pieces.

Swift-based macOS app that implements the building blocks and algorithms

Generative analysis and composition

Most technical details will be glossed over to some degree. Full details can be found in the paper “A self-similar map of rhythmic components” (which currently has free access if accessed from this page). Links to related conference papers, with detailed algorithms, are here.

Coming to terms

  • Meter refers to nested pulses formed by recursive subdivision of the time line. That is, there are two half notes per whole note, two quarter notes per half note, and so on, with the pulses at each metrical level alternating between weak and strong beats. (Only binary subdivision is in scope here, not expressive timing, triplets, or triple meters, etc.)
  • An anticipation is a note that occurs on a weak beat at some metrical level. It “anticipates” the stronger beat that will immediately follow at that level.
  • Syncopation occurs whenever anticipations are not followed by the anticipated notes (on the subsequent beats).
  • Loops here mean repeated patterns that are one, two, or four bars of 4/4 long. (Loops are ubiquitous in EDM but are also fairly common in many other genres.)
  • Variation as (already) used in this discussion refers not only to formal theme and variations, but more broadly to any melody or beat that can recognizably take the place of another (even in a sort of musical opposition).
Looping anticipations.
Above anticipations plotted as beat strength versus time step.
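In a purely binary meter like the one in scope here, the beat strength at each time step can be read off the step index alone. This is a minimal Python sketch under that assumption (the function and naming are mine for illustration, not from the Coord codebase): strength is the number of trailing zero bits of the step index, so the downbeat is strongest and odd steps are the weakest anticipations.

```python
def beat_strength(step: int, levels: int) -> int:
    """Metrical strength of a time step in a binary meter:
    the number of trailing zero bits of the step index,
    capped at the number of levels. Step 0 (the downbeat)
    is the strongest position."""
    if step == 0:
        return levels
    strength = 0
    while step % 2 == 0:
        step //= 2
        strength += 1
    return strength

# One bar of 4/4 at 8th-note resolution (3 binary levels):
strengths = [beat_strength(s, 3) for s in range(8)]
print(strengths)  # → [3, 0, 1, 0, 2, 0, 1, 0]
```

A note landing on a step of strength k is an anticipation at level k + 1: it sits one weak beat before the next stronger pulse at that level.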

On the math side, binomial coefficients, Pascal’s triangle, and the Sierpinski gasket will appear in connection to the formation of rhythmic patterns.

Co-processing versus pre-computation

Rhythm is also built on log/linear and ratio/proximity relationships, given the underlying meter’s recursive structure. But these can’t be made ready-to-hand by a static layout because the time axis itself is in play. Computers seem an obvious tool for getting handles on musical time, assuming that the points in time can be grouped in as meaningful a fashion as pitches are on the piano. That is what the building blocks under discussion aim to do.

Algorithmic music analysis and generation

Anticipation and repetition

That anticipated note may or may not occur. But in any case, if the pattern is repeated, you’ll expect whatever happened to keep happening, even if that (somewhat paradoxically) means “expecting” surprise. (Fred Lerdahl and Ray Jackendoff noted the link between parallelism and coherence in their 1983 book A Generative Theory of Tonal Music.)

Nested levels of potential anticipation, with the downbeat in the final position.

And so, two psychological constraints will be the organizing principles for all that follows.

  1. A note that falls on a weak beat raises the expectation of a following note on the strong beat.
  2. The outcome of the expectation defined in (1) raises expectation that such outcomes will recur.

In other words, anticipation and repetition generate predictions about, and because of, each other. This is where rhythmic coherence is bootstrapped.

Syncopation and elaboration

Take as the first building block a looping rhythm that consists of a single note attack on the first beat. Each of the other building blocks is derived by recursively applying exactly one of the following operations at each metrical level:

  1. Syncopate by shifting all attacks one beat earlier in time.
  2. Elaborate by combining the above syncopation with the original attacks.
  3. Do nothing.

The result is a tree of building blocks that together account for every evolutionary outcome of syncopation or elaboration operations.

Left: looping syncopation at the quarter note level. Center: looping elaboration at the same level. Right: Elaboration at the quarter note level combined with syncopation at the 8th note level.
All possible building blocks for two metrical levels. (That is, all combinations of elaboration/syncopation/neither at the quarter note and 8th note levels.)
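The derivation tree is small enough to enumerate directly. Here is a Python toy (my own naming, assuming onsets on a grid of 2ⁿ steps, where “one beat earlier” at a level means a shift by that level’s beat length) that produces all nine two-level building blocks described above:

```python
from itertools import product

def apply_ops(ops, levels):
    """Derive a building-block rhythm from per-level operations.
    ops[k] is 'none', 'sync', or 'elab' for level k (0 = coarsest).
    Rhythms are sets of onset steps in a loop of 2**levels steps."""
    n = 2 ** levels
    onsets = {0}                    # seed: one attack on the downbeat
    for k, op in enumerate(ops):
        beat = n >> (k + 1)         # beat length at level k, in steps
        shifted = {(s - beat) % n for s in onsets}
        if op == 'sync':            # shift all attacks one beat earlier
            onsets = shifted
        elif op == 'elab':          # keep originals plus the shifted copies
            onsets = onsets | shifted
    return onsets

# All 3**2 = 9 building blocks for two metrical levels:
blocks = {ops: sorted(apply_ops(ops, 2))
          for ops in product(('none', 'sync', 'elab'), repeat=2)}
for ops, rhythm in blocks.items():
    print(ops, rhythm)
```

For example, elaborating at both levels fills all four steps, while a lone syncopation at the quarter-note level moves the single attack to the middle of the loop.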

Elaboration mapped to Pascal’s triangle

The rhythm evolves like this:

Elaboration at half note, then 8th note levels has the same result as vice-versa.

Now consider Pascal’s triangle, an arrangement of binomial coefficients, in particular the odd coefficients (in bold). The rhythm encoded by the binary representation of 5 is found on the fifth row (counting from zero).

Pascal’s triangle with odd entries in bold, tilted to line up with elaboration-based building blocks.

Coincidence? No, it turns out that any encoding of elaborations into a binary number indicates the row of Pascal’s triangle where the odd entries correspond exactly to the resulting rhythm. (This is because, according to Lucas’s theorem, the binary digits of successive odd binomial coefficients form patterns that correspond exactly to combinations of elaborations at distinct metrical levels.)
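The parity pattern of a row is easy to check without computing any large coefficients. By Lucas’s theorem (for the prime 2), C(r, k) is odd exactly when every binary digit of k is less than or equal to the corresponding digit of r, i.e. when k & r == k. A short Python check (illustrative naming, mine):

```python
def odd_pascal_row(r: int) -> list[int]:
    """Parity pattern of row r of Pascal's triangle.
    By Lucas's theorem, C(r, k) is odd iff (k & r) == k,
    i.e. the 1 bits of k are a subset of the 1 bits of r."""
    return [1 if (k & r) == k else 0 for k in range(r + 1)]

# Row 5 is 1 5 10 10 5 1, so the odd-entry pattern is:
print(odd_pascal_row(5))  # → [1, 1, 0, 0, 1, 1]
```

The pattern for row 5 (binary 101) matches the rhythm produced by elaborating at the two levels encoded by those 1 bits.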

Perhaps-syncopated elaboration mapped to the Sierpinski gasket

Each combination of elaborations and syncopations is encoded by a pair of binary numbers that share no 1s in the same binary place (because at most one operation can occur at each metrical level). Each such pair can therefore be combined into a single ternary number, distinguishing the 1s of the syncopation encoding by converting them to 2s.

The generator is the encoded elaborations, the offset is the encoded syncopations.

Using those ternary numbers as addresses into the fractal known as the Sierpinski gasket (as shown below), we now have a map of all potential building blocks, laid out visually in terms of elaborations and syncopations. (Mathematically, this corresponds to patterns formed by binary carries in sums of binomial coefficients as established by Kummer’s theorem, which is related to Lucas’s theorem.)
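Merging the two disjoint binary encodings into one ternary address is a digit-by-digit operation. A Python sketch (my own naming; digit 1 marks an elaboration at that level, digit 2 a syncopation, 0 neither):

```python
def ternary_address(elab: int, sync: int) -> int:
    """Combine disjoint elaboration/syncopation encodings into one
    ternary number: digit 1 where elab has a 1 bit, digit 2 where
    sync has a 1 bit. Requires elab & sync == 0, since at most one
    operation can occur at each metrical level."""
    assert elab & sync == 0, "operations must not overlap"
    address, place = 0, 1
    while elab or sync:
        digit = (elab & 1) + 2 * (sync & 1)
        address += digit * place
        place *= 3
        elab >>= 1
        sync >>= 1
    return address

# Elaborations 0b101 with a syncopation 0b010 → ternary 121:
print(ternary_address(0b101, 0b010))  # → 16  (121 in base 3)
```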

Sierpinski gasket addresses mapped to building block rhythms.

Several characteristics and comparisons can be read directly from these integer encodings, without examining, let alone generating, the rhythms themselves. The ternary digits tell you how close two building blocks are in terms of how they evolved (that is, how many times they shared the same operation at the same metrical level).
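For instance, one plausible closeness measure under this encoding is simply the count of matching ternary digits, i.e. the number of levels at which the two derivations applied the same operation. A sketch (the measure and naming are my illustration, not necessarily the metric used in the papers):

```python
def shared_operations(a: int, b: int) -> int:
    """Count metrical levels at which two building blocks,
    given as ternary addresses, underwent the same operation
    (including 'do nothing' at levels both encodings reach)."""
    count = 0
    while a or b:
        if a % 3 == b % 3:
            count += 1
        a //= 3
        b //= 3
    return count

# Ternary 121 vs 021: same operation at the two lowest levels.
print(shared_operations(16, 7))  # → 2
```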

Zooming into deeper metrical levels. Each location on each Sierpinski gasket to the left corresponds to a (reversed) building block rhythm on the grids to the right.

Building block applications

In practical terms, the first step is to parse the rhythm of a given melody or beat into these building blocks. This is computationally inexpensive because the integer mapping spares the need to perform the actual derivations; it’s just a matter of scanning out the known patterns.
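One way to picture that scan is a precomputed lookup table: generate each building block's rhythm once from its ternary address, then parsing becomes dictionary lookups rather than a search over derivations. This Python sketch (my own framing of the idea, assuming onsets on a grid of 2ⁿ steps; the actual parsing algorithm is in the linked papers) builds such a table:

```python
from itertools import product

def block_rhythm(ops, levels):
    """Rhythm (as a bitmask over 2**levels steps) produced by
    per-level operations: 0 = none, 1 = elaborate, 2 = syncopate."""
    n = 2 ** levels
    onsets = {0}
    for k, op in enumerate(ops):
        beat = n >> (k + 1)
        shifted = {(s - beat) % n for s in onsets}
        if op == 1:
            onsets |= shifted
        elif op == 2:
            onsets = shifted
    return sum(1 << s for s in onsets)

def build_lookup(levels):
    """Map each reachable rhythm bitmask to the ternary address of
    one derivation that produces it."""
    table = {}
    for ops in product((0, 1, 2), repeat=levels):
        address = sum(d * 3 ** k for k, d in enumerate(ops))
        table.setdefault(block_rhythm(ops, levels), address)
    return table

table = build_lookup(2)
```

With the table in hand, recognizing a building block inside a loop costs a bitmask comparison instead of replaying its evolutionary history.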

The current set of apps is called Coord. Some details and demos are at

Varying rhythms

Geometry as musical instrument

Rhythms being manipulated hierarchically, rather than note-by-note.

A key aspect of this approach becomes visible in such interactions: the fact that each attack has its own potential to become a rest, and vice-versa. The attack potential equals the combined hierarchical weight of the building blocks at the given time point (again, details are in the papers linked above). The notes act somewhat like rocks in a stream, simultaneously affecting the flow and submerged, or not, by that flow.

Potential note attacks above and below the expectancy threshold for actually being heard.
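The thresholding itself is simple once the potentials are summed. A minimal sketch, with hypothetical per-step potentials and threshold (the real weighting scheme is in the linked papers):

```python
def realized_attacks(potentials, threshold):
    """A note sounds wherever the summed attack potential clears
    the expectancy threshold; elsewhere the step stays a rest."""
    return [1 if p >= threshold else 0 for p in potentials]

# Hypothetical summed potentials over 8 steps:
print(realized_attacks([5, 1, 3, 1, 4, 1, 3, 2], 3))
# → [1, 0, 1, 0, 1, 0, 1, 0]
```

Raising or lowering the threshold then sparsifies or densifies the rhythm continuously, without editing any individual note.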

Syncopation via genetic algorithm

Morphing between rhythms

Selected, weighted, input melodies being morphed into a new melody.

The building blocks afford a powerful capability: morphing intuitively between rhythms. In short, the attack potentials from two or three weighted rhythms are combined, producing a rhythm that interpolates in nonlinear fashion between those inputs.

Attack potentials are calculated for three input rhythms
The attack potentials are summed to determine the new rhythm
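In outline, that morph is a weighted sum followed by a threshold. The Python below is a schematic of that outline only (the weights, potentials, and threshold are invented for illustration; the actual interpolation is nonlinear in ways the papers detail):

```python
def morph(rhythm_potentials, weights, threshold):
    """Blend per-step attack potentials from several input rhythms;
    a step in the output sounds where the weighted sum clears the
    expectancy threshold."""
    steps = len(rhythm_potentials[0])
    merged = [sum(w * r[s] for r, w in zip(rhythm_potentials, weights))
              for s in range(steps)]
    return [1 if p >= threshold else 0 for p in merged]

a = [4, 0, 2, 0, 3, 0, 2, 0]   # hypothetical potentials, rhythm A
b = [4, 1, 0, 1, 4, 0, 0, 1]   # hypothetical potentials, rhythm B
print(morph([a, b], [0.5, 0.5], 1.5))  # → [1, 0, 0, 0, 1, 0, 0, 0]
```

Sliding the weights continuously moves the output between rhythms that each input would produce alone, which is what makes the morphing feel like a single handle rather than note-level editing.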

Landscape of rhythmic variations

Navigating the melodic morphing landscape.

Future directions

Self-organized map

A self-organized map (unsupervised neural net) for determining rhythmic similarity.

Neural nets

Perhaps I could present attacks to a recurrent neural net (likely an LSTM), each encoded by a vector that indicates which building blocks include that attack (a binary vector of length 3ⁿ for n metrical levels). The aim is to have the net learn the attack patterns against the backdrop of the collective expectations associated with each pattern.

Alternatively I could simply encode each attack by its binary representation. This would be a more compact encoding (with vector length 2ⁿ instead of 3ⁿ) but it would rely on the net to learn the superposed patterns already factored into the building blocks.

(More realistically, the hope is to work on something like the above with more knowledgeable collaborators.)

Is source code available?

Open source could be a real possibility if the project reaches a stable point, but that will first require something like a small team effort. Meanwhile the relevant math and algorithms are available via the papers linked above, and I’m happy to engage with anyone coding those.


Ecosystems of rhythms and melodies

I welcome feedback, suggestions, questions, and, in particular, contact from those interested in comparing notes.
