Music Collaboration Will Never Happen Online in Real Time

Published in THOSE PEOPLE, Dec 11, 2013

With the world moving towards high-speed networks, a New York-Tokyo jam session should be easy, right? It will never happen. Light speed is not fast enough for making music in cyberspace.

In 2013, I started a school for traditional music in the Dominican Republic. We’ve been teaching kids to play bachata, which is the Dominican version of rock ‘n’ roll. Things are going well. We threw out the mainstream approach to music education and focused on two areas: learning by ear, and playing in groups from day one.

The group learning component has been critical. Just as with rock ‘n’ roll, bachata is polyphonic — meaning several instruments play together at the same time. Bachata is nearly impossible for a single person to play alone. The guitar on its own, the bongos on their own . . . are empty. The sounds these instruments make complement each other. Put simply, it’s fun to play in a group; it’s boring to play alone. Fortunately our school is full of bachata-crazy cadets, and no one lacks someone to play with.

Cesar, Adriel, Isaa, Juliana, and Emily, students of The iASO Bachata Academy @ DREAM, Cabarete Campus, Dominican Republic

Video: A class at the Bachata Academy @ DREAM

But what if a person living in Chicago or Tokyo wants to learn to play bachata? While he or she might find an instructional video on YouTube, it could be difficult to find someone to play with.

Light speed is fast, but not fast enough. Playing music in a group is two-way communication. For it to work, there must be very little delay, or latency, between the sound each participant produces and the moment the other participants hear it.

Imagine that Chicago guy hits a drum with a steady beat, one beat per second, and that Tokyo girl hears each beat 0.1 seconds after it is played. If Tokyo girl claps in time with what she hears, her claps will occur 0.1 seconds after Chicago guy’s drum beats, but the two will sound in time to her. So far, so good. But when Tokyo girl’s claps reach Chicago guy, there is a further 0.1 second delay, and so Chicago guy hears each clap 0.2 seconds after his drum beat. Gentleman that he is, Chicago guy shifts his drum beats 0.2 seconds later to adjust to Tokyo girl. Tokyo girl then hears Chicago guy’s beat shift by 0.2 seconds, and so she politely shifts her claps to match. The result is that the tempo continuously slows as each adjusts to accommodate the other. Eventually the whole thing grinds to a halt or falls apart.

There is a roughly 0.18 second delay between two people with good internet connections in Chicago and Tokyo. For a Tokyo-Chicago jam session to be possible, the network would have to be more than 20x faster (less than 0.009 seconds of latency). This would require a speed faster than light.
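The drum-and-clap thought experiment above can be sketched as a toy simulation. This is only an illustration under a stated assumption, not a model of how real musicians behave: each round, the drummer is assumed to stretch his felt beat period by the full round-trip lag he hears. The function name `simulate_drift` and its parameters are made up for this sketch.

```python
def simulate_drift(one_way_delay_s, start_period_s=1.0, rounds=6):
    """Return (beat time in s, tempo in bpm) for each round of the toy model."""
    period_s = start_period_s
    beat_time_s = 0.0
    history = []
    for _ in range(rounds):
        history.append((round(beat_time_s, 3), round(60.0 / period_s, 1)))
        # The clap answering this beat comes back one full round trip late.
        round_trip_s = 2 * one_way_delay_s
        # Assumption: the drummer stretches his felt period by that lag.
        period_s += round_trip_s
        beat_time_s += period_s
    return history


# 0.1 s each way, starting at one beat per second, as in the example above.
for beat_time_s, bpm in simulate_drift(one_way_delay_s=0.1):
    print(f"beat at {beat_time_s:6.2f} s, tempo {bpm:5.1f} bpm")
```

With a 0.1 second one-way delay the tempo in this model falls on every round, starting from 60 bpm. With zero delay it stays at 60 bpm.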

Hearing and latency

Humans can accurately perceive audio intervals as small as 4-5 ms [milliseconds]. Because of the time it takes even a staccato sound like a drum hit to evolve and decay, two sounds less than 15 ms apart are generally perceived as continuous rather than separate. But intervals between 5 ms and 15 ms are an important part of the feel of music — the pushing or dragging against the tempo.

Latency of physical sound

Performers spaced 2 meters apart will experience a natural 6 ms latency from the time it takes sound to travel through air. At 3 meters, that latency is 9 ms, and at 4 meters (13 feet), it is about 12 ms. Less than 9 ms of latency is ideal, and more than 12 ms becomes problematic for timing. Latency that low is unlikely ever to be possible between continents.
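The figures above follow from dividing distance by the speed of sound. A minimal check, assuming a speed of sound of 343 m/s (dry air at about 20 °C, a value not given in the article):

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 °C


def acoustic_latency_ms(distance_m):
    """Time in milliseconds for sound to travel distance_m through air."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


for distance_m in (2, 3, 4):
    print(f"{distance_m} m -> {acoustic_latency_ms(distance_m):.1f} ms")
# 2 m -> 5.8 ms
# 3 m -> 8.7 ms
# 4 m -> 11.7 ms
```

Rounded to whole milliseconds, these match the 6, 9, and 12 ms quoted above.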

Latency of fiber-optic communication

Signals travel on fiber-optic line at about 1 km per 5 microseconds (about 2/3 the speed of light in vacuum). To reach the furthest part of the world on the most direct route (20,000 km) would therefore take about 100 ms. Put differently, every 1 meter of natural sonic latency (about 3 ms) corresponds to nearly 588 kilometers of fiber at its theoretical speed. With a latency budget of roughly 10 ms, the theoretical limit to online real-time collaboration is therefore about 2000 km. That won’t cross the oceans, but seems at least like a good start. In practice, however, network latency is much higher. The signal between two users does not follow a straight fiber-optic line, but rather zig-zags and passes through a multitude of networking devices, each introducing more latency along the way.
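The fiber arithmetic above can be checked in a few lines. This sketch uses only the article's figure of 5 microseconds per kilometer and ignores routing and equipment delays, so it gives a best case. The function names are illustrative.

```python
FIBER_US_PER_KM = 5.0  # from the text: about 5 microseconds per km


def fiber_latency_ms(distance_km):
    """Best-case one-way latency over a straight fiber run."""
    return distance_km * FIBER_US_PER_KM / 1000.0


def max_fiber_distance_km(budget_ms):
    """Longest straight fiber run that fits in a one-way latency budget."""
    return budget_ms * 1000.0 / FIBER_US_PER_KM


print(fiber_latency_ms(20_000))    # 100.0 ms to the far side of the world
print(max_fiber_distance_km(9))    # 1800.0 km for the "ideal" 9 ms budget
print(max_fiber_distance_km(12))   # 2400.0 km for the 12 ms upper bound
```

The 9 ms and 12 ms budgets bracket the figure of about 2000 km given above.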

Online real-time music collaboration is not possible

As of 2013, it is difficult to have online real-time musical collaboration even within the same city. To do so requires setting up a class of specialized high-speed network similar to what is used by the financial world’s high-speed trading outfits. The latency between the laptop on which I am typing in a New York City apartment and a local New York City domain name server is currently 12 ms: low, but not good enough for music.

Real-time online musical collaboration has been a dream among musicians since the advent of the internet. But it is constrained by the same physical barrier as interstellar travel: the speed of light. We will colonize the stars sooner than play music together across continents.

2020 update: There are a number of apps in the works that aim to solve the problem of real-time jamming. These apps use a centralized metronome to keep participants in sync. Latency is less of an issue with melodic content. If player A hears player B’s melody a 1/8th or 1/16th note late, it’s still close enough to react melodically. The bigger issue is with tempo. A centralized metronome can solve this, much as a conductor keeps the wings of an orchestra in time. But now imagine that the wings of the orchestra are 40 meters apart, roughly equivalent to 120 ms of network latency. And adherence to a metronome removes an important aspect of music making: the ‘feel’ and control over tempo.
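To put the 1/8th and 1/16th note figures in milliseconds, here is a small worked example. The tempos are arbitrary examples rather than values from the article, and the beat is assumed to be a quarter note.

```python
def note_duration_ms(bpm, note_fraction):
    """Length of a note in ms; note_fraction is 1/8 for an eighth note, etc.

    Assumes one beat is a quarter note.
    """
    quarter_note_ms = 60_000.0 / bpm
    return quarter_note_ms * 4 * note_fraction


for bpm in (90, 120, 150):  # example tempos, chosen only for illustration
    eighth = note_duration_ms(bpm, 1 / 8)
    sixteenth = note_duration_ms(bpm, 1 / 16)
    print(f"{bpm} bpm: 1/8 note = {eighth:.0f} ms, 1/16 note = {sixteenth:.0f} ms")
```

At the assumed 120 bpm, a 1/16th note lasts 125 ms, which is close to the 120 ms of latency mentioned above.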

Network Latency Test

Below is a list of the latencies I measured between a high-speed cable home connection in New York and name servers in various global locations. I tested with both a Wi-Fi-connected laptop and a hard-wired PC. The results were the same.

New York: 12 ms
Boston: 16 ms
San Francisco: 85 ms
Santo Domingo, Dominican Republic: 63 ms
Paris: 93 ms
Tokyo: 189 ms
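One way to read these numbers is to ask how far apart two musicians would have to stand in a room to hear the same delay. The sketch below makes that conversion and labels each result against the 9 ms and 12 ms thresholds from earlier. It assumes a speed of sound of 343 m/s and takes the measured figures at face value as the delay a player would hear. The article does not say whether they are one-way or round-trip times.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 °C

# Latencies from the list above, in milliseconds.
MEASURED_MS = {
    "New York": 12,
    "Boston": 16,
    "San Francisco": 85,
    "Santo Domingo": 63,
    "Paris": 93,
    "Tokyo": 189,
}


def equivalent_distance_m(latency_ms):
    """Distance sound travels through air in latency_ms milliseconds."""
    return latency_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S


def classify(latency_ms):
    """Label a latency using the 9 ms and 12 ms thresholds from the text."""
    if latency_ms < 9:
        return "ideal"
    if latency_ms <= 12:
        return "borderline"
    return "problematic"


for city, latency_ms in MEASURED_MS.items():
    distance_m = equivalent_distance_m(latency_ms)
    print(f"{city}: {latency_ms} ms ~ {distance_m:.1f} m apart ({classify(latency_ms)})")
```

By this measure, the Tokyo connection is like playing with someone roughly 65 meters away.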