Let the music play

Martin Vetterli
Published in Digital Stories
3 min read · Dec 18, 2018

A last column in which we learn that each tune has its own fingerprint, and how services like Shazam use it to rapidly identify the song

Photo by Malte Wingen on Unsplash

One of the first columns that I wrote was on how to compress music into small digital files, like MP3 files, in order to send them around via email. And since this is my last column, I also want to end the series with a musical topic, and thus, with a reference to a well-known song. Unfortunately, right now I just can’t remember its name…

Luckily, I can just take out my smartphone and sing my favorite tune for a while and, presto, the name of the song will appear on my screen after just a few seconds (together with a link to buy the digital file, of course)! But how do music recognition services such as Shazam, SoundHound or Bing Audio work?

It turns out that they behave like a clever librarian in a Babel of musical records. The app on your smartphone takes the incoming sound and chops it up into a sequence of short audio slices (about one second long each). Each slice is then compressed using a set of mathematical techniques that we have already mentioned in the first columns. These little audio fingerprints are then sent to the remote servers of the application to find the right tune.
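For the curious, here is a minimal Python sketch of the slicing and fingerprinting idea. It is a toy under stated assumptions, not the method of any actual service: the sample rate `RATE`, the helper names and the choice of "strongest frequency in the slice" as the fingerprint are all made up for illustration, and real fingerprints are far more robust to noise.

```python
import cmath
import math

RATE = 200  # hypothetical sample rate (samples per second), kept tiny for speed


def slice_signal(samples, slice_len):
    """Chop a list of samples into consecutive, non-overlapping slices."""
    return [samples[i:i + slice_len]
            for i in range(0, len(samples) - slice_len + 1, slice_len)]


def dominant_bin(chunk):
    """Return the index of the strongest frequency bin (naive DFT, skipping bin 0)."""
    n = len(chunk)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(chunk))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin


def fingerprint(samples, slice_len):
    """Reduce each slice to one small number: its dominant frequency bin."""
    return [dominant_bin(chunk) for chunk in slice_signal(samples, slice_len)]


def tone(freq, seconds):
    """Generate a pure sine tone, standing in for recorded audio."""
    return [math.sin(2 * math.pi * freq * t / RATE)
            for t in range(int(RATE * seconds))]


# A three-note "melody"; with one-second slices, bin k corresponds to k Hz.
melody = tone(20, 1) + tone(35, 1) + tone(50, 1)
prints = fingerprint(melody, RATE)
print(prints)
```

Each one-second slice of 200 samples is reduced to a single small integer, which is the kind of compact description that can be sent to a server cheaply.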

However, to work with any song, these servers must contain the fingerprints of virtually every song ever published all over the world! This is an enormous amount of data, on the order of a billion records, and one that gets bigger every day; finding a specific tune becomes like finding a needle in a haystack.

It is as if you went to a library to look for a book of which you only vaguely remember the plot. Of course, you could go through the entire inventory and check all books, one book at a time, but that would be painfully slow! Libraries therefore use a more sophisticated and faster indexing system known as the Universal Decimal Classification (UDC). This system maps the subject matter of a book to a short decimal number. For example, the number 636.8 may indicate a book about cats, while 531.5 may indicate a book about gravity. This precise number can then be used to find any given book on the shelves, since numbers are way easier to look up than titles and summaries.

Music recognition algorithms use a similar principle. They manage to navigate their enormous databases by converting complex sound information into precise and searchable numbers. These numbers are then used to quickly locate a small set of candidate songs, thereby reducing the time it takes to find a match to just a few milliseconds.
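The librarian's trick can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the song titles and fingerprint numbers are invented, the functions `build_index` and `best_match` are hypothetical, and a production system would also check that the fingerprints appear in the right order in time.

```python
from collections import Counter, defaultdict


def build_index(catalog):
    """Map each fingerprint number to the set of songs that contain it."""
    index = defaultdict(set)
    for title, prints in catalog.items():
        for p in prints:
            index[p].add(title)
    return index


def best_match(index, query):
    """Let every query fingerprint vote for the songs it appears in."""
    votes = Counter()
    for p in query:
        for title in index.get(p, ()):
            votes[title] += 1
    return votes.most_common(1)[0][0] if votes else None


# Invented catalog: each song is a short list of fingerprint numbers.
catalog = {
    "Song A": [20, 35, 50, 35],
    "Song B": [20, 41, 63, 41],
    "Song C": [12, 35, 77, 90],
}

index = build_index(catalog)
match = best_match(index, [35, 50])
print(match)
```

Instead of comparing the query against every song, the lookup only touches the few songs that share at least one number with it, which is why the search stays fast even as the catalog grows.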

By the way, this kind of rapid fingerprinting and matching technology has recently proved useful in a completely different field: earthquake detection. The problem is that the quantity of data from earthquake detection systems has also become a huge “library”, which is too big to be searched rapidly. As a result, earthquakes might be detected too late. The above indexing approach can speed up this identification process by… ah, now I remember the song that I was looking for initially! It’s called “Let the music play”!
