Spotify’s Discover Weekly explained — Breaking from your music bubble or, maybe not?

Valerio Velardo
Feb 11, 2019 · 9 min read

The three algorithms behind Discover Weekly, and the similarity/diversity problem.

Photo by sgcreative on Unsplash

With over 200 million monthly active users, the streaming behemoth Spotify has transformed the way we listen to music forever. Merely opening the app gives you access to more than 40 million tracks with the click of a button. If you never stopped listening, it would take you more than two lifetimes to go through the entire Spotify database in one go.

Despite this incredible amount of freely available music, users often prefer repeatedly listening to a small selection of their favourites. Some people get stuck in a specific genre or a handful of artists, never to venture into the musical unknown. For these users, as well as for the musically adventurous among us, Spotify delivers a compelling solution called Discover Weekly.

Every Monday, Discover Weekly gifts 200 million Spotify users with a playlist of thirty songs they’ve never heard before. If you’re at least as old as I am, you may remember that music-passionate friend who burned you personalised CDs, encouraging you to ‘Take it! You’ll love the perfect balance of Flemish counterpoint music’ (Yes, that’s a thing). Discover Weekly has come to be that friend, just in silico. What’s great about this service is that, like your childhood friend, it knows your musical taste so well that it can make fairly accurate guesses about what you may like. But how does this seemingly psychic magic happen?

Enter the world of recommendation systems. A recommendation system is an algorithm which tries to predict the rating or preference a user would give to an item, like a song or movie. Leveraging this information, the system suggests a number of further items the user may enjoy. Recommendation systems are used across many types of media. The last time you watched Netflix or bought a book on Amazon, you may recall being gently nudged to watch other movies or buy other books. The basic technology that powers recommendation systems is largely the same, regardless of the specific domain of application (like movies, books or music).

How Spotify’s Discover Weekly works

Spotify’s Discover Weekly recommendation model isn’t a revolutionary one. Instead, it’s a combination of a number of effective recommendation techniques previously used by other industry players. This has led to a uniquely powerful music recommendation engine, mainly based on three recommendation models:

  1. Collaborative filtering consists of collecting and analysing users’ behaviours;
  2. Content-based filtering looks at the descriptions of songs and artists;
  3. Audio features are extracted from the raw audio through machine learning.

The Spotify music recommendation framework

Let’s delve into these three models in more detail.

Collaborative filtering

This is what Spotify has traditionally relied upon. With collaborative filtering, recommendations are outsourced to the users. Listening behaviours are analysed and used as a way to predict users’ preferences. The underlying idea is that people who listen to similar music likely have similar musical tastes. Conversely, if the same group of like-minded people listens to two different songs, those songs are probably similar. This information can be leveraged to suggest songs you’ve never heard before.

But how does Spotify implement this intuition in their algorithms? The answer is matrices — mathematical tools coming from the arcane field of linear algebra.

A matrix in action.

Each row of this matrix represents a user; each column a song. In reality, this is a gigantic matrix stored in the Spotify servers, containing 200 million rows and 40 million columns. If you’re a Spotify user, you’re one in 200 million.

After applying some linear algebra magic to this matrix, Spotify obtains two kinds of vectors that we’ll call X and Y. X is a user vector, representing a single user’s taste. Y is a song vector, representing a single song’s profile. That means there are (you guessed it) 200 million user vectors and 40 million song vectors.
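To make the ‘linear algebra magic’ a little less magical, here is a minimal sketch of one common way to obtain such vectors: factorising the user-song matrix so that the dot product of a user vector and a song vector approximates the original entry. The article doesn’t say which factorisation Spotify uses, so the stochastic gradient descent below, the function names `factorize` and `predict`, the toy 4×4 matrix and all the hyperparameters are assumptions for illustration only.

```python
import random

def factorize(plays, n_factors=2, n_epochs=500, lr=0.05, reg=0.01, seed=0):
    """Approximate plays[u][s] by the dot product of X[u] and Y[s].

    Assumption: plain SGD on squared error with L2 regularisation.
    This is an illustrative choice, not Spotify's documented method.
    """
    rng = random.Random(seed)
    n_users, n_songs = len(plays), len(plays[0])
    # Small random starting values for user vectors (X) and song vectors (Y)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_songs)]
    for _ in range(n_epochs):
        for u in range(n_users):
            for s in range(n_songs):
                pred = sum(X[u][k] * Y[s][k] for k in range(n_factors))
                err = plays[u][s] - pred
                for k in range(n_factors):
                    xu, ys = X[u][k], Y[s][k]
                    X[u][k] += lr * (err * ys - reg * xu)
                    Y[s][k] += lr * (err * xu - reg * ys)
    return X, Y

def predict(X, Y, u, s):
    """Predicted affinity of user u for song s."""
    return sum(a * b for a, b in zip(X[u], Y[s]))

# Toy matrix: rows are users, columns are songs, 1 means 'has listened'
plays = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
X, Y = factorize(plays)
print(round(predict(X, Y, 0, 0), 2), round(predict(X, Y, 0, 2), 2))
```

In this toy example, users who share listening habits should end up with predictions close to 1 for the songs they played and close to 0 for the rest. At Spotify’s scale the matrix is far too large and sparse for a nested loop like this one, which is why the sketch should be read as an intuition aid rather than a recipe.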

So for John Doe to receive new music recommendations, the algorithm identifies users with a musical taste similar to John’s. His user vector is compared against the 200 million other user vectors, yielding a group of like-minded people. Recommending a song to John is then as simple as choosing a track that one of the people in the group has heard, but John hasn’t.
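The neighbour-based step described above can be sketched in a few lines. Cosine similarity is used here to compare user vectors, which is a common choice but an assumption on my part, as are the function names, the invented users and songs, and the two-dimensional vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(user, user_vectors, history, n_neighbours=2):
    """Suggest tracks heard by the most similar users but not by `user`."""
    others = [u for u in user_vectors if u != user]
    others.sort(key=lambda u: cosine(user_vectors[user], user_vectors[u]), reverse=True)
    neighbours = others[:n_neighbours]
    candidates = set()
    for u in neighbours:
        candidates |= history[u] - history[user]
    return sorted(candidates)

# Hypothetical user vectors and listening histories
user_vectors = {
    "john": [0.9, 0.1],
    "ana": [0.8, 0.2],
    "raj": [0.85, 0.05],
    "mei": [0.1, 0.9],
}
history = {
    "john": {"song_a", "song_b"},
    "ana": {"song_a", "song_c"},
    "raj": {"song_b", "song_d"},
    "mei": {"song_e"},
}
print(recommend("john", user_vectors, history))  # ['song_c', 'song_d']
```

John’s two nearest neighbours are Raj and Ana, so he is offered the songs they have heard and he hasn’t. Mei’s taste points in a different direction, so her listening history is ignored.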

Although collaborative filtering is very effective, it has some drawbacks. This approach doesn’t use any type of information about the recommended items, but is based solely on the consumption patterns associated with them. This implies that more popular items are easier to recommend. Conversely, unpopular items are difficult to suggest. And new items that haven’t yet been consumed are impossible to suggest. If Spotify wants to recommend a new, amazing, undiscovered artist’s work, they’re better off integrating a variety of approaches in their recommendation engine.

Content-based filtering

One such approach is content-based filtering, which compares descriptors of an item against a user profile to make recommendations. The user profile is built from the same tags that describe the content the user consumes, like ‘rock’ or ‘classical’, i.e., semantic information.

In 2014, Spotify acquired Echo Nest, a company that employed Natural Language Processing (NLP) to extract semantic information from music-related text content. Spotify constantly crawls the Internet to figure out what people think about artists and songs. News articles, blogs and online reviews are analysed to infer the adjectives and nouns frequently used to describe the music. These NLP algorithms also spot connections between different songs by looking at the language related to different artists.

Although the details of how Spotify processes this data aren’t public, we can assume that the music tech giant’s approach is similar to that of the acquired Echo Nest’s. Songs and artists are associated with a number of ‘top terms’ that describe them semantically. Each term has a weight that quantifies its relative importance for a given item. Similar to collaborative filtering, these top terms are used to find commonalities between different artists and songs. This information is then used to recommend new songs.
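As a rough illustration of how weighted ‘top terms’ could be compared, the sketch below treats each artist’s terms as a sparse vector and measures the overlap with cosine similarity. The terms, weights, artist names and the choice of cosine similarity are all made up for the example; as noted above, the real pipeline isn’t public.

```python
import math

def term_similarity(terms_a, terms_b):
    """Cosine similarity between two {term: weight} dictionaries."""
    shared = set(terms_a) & set(terms_b)
    dot = sum(terms_a[t] * terms_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in terms_a.values()))
    norm_b = math.sqrt(sum(w * w for w in terms_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical top terms with weights expressing relative importance
top_terms = {
    "artist_x": {"psychedelic": 0.9, "progressive": 0.8, "rock": 0.6},
    "artist_y": {"progressive": 0.9, "metal": 0.7, "rock": 0.5},
    "artist_z": {"baroque": 0.9, "counterpoint": 0.8},
}

xy = term_similarity(top_terms["artist_x"], top_terms["artist_y"])
xz = term_similarity(top_terms["artist_x"], top_terms["artist_z"])
print(round(xy, 2), round(xz, 2))  # 0.61 0.0
```

Artists that are described with the same heavily weighted words score high, while artists with no vocabulary in common score zero, regardless of how they actually sound.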

Despite its effectiveness, the content-based filtering approach faces a major issue: All the information derives from what people write about the music and the artists. In other words, the NLP algorithms don’t get any information from the songs themselves. This fails to account for how important the actual sound of a track is in determining whether you like a song (or don’t like it for that matter). It’s better to leverage that information through competent audio analysis, in order to achieve more well-rounded recommendations.

Audio features

(Brace yourself; we’re about to get technical)

Spotify uses convolutional neural networks to extract musical features directly from raw audio. Interestingly, convolutional networks have mainly been used with visual data, where data scientists have successfully applied them to tasks such as image recognition. This is achieved by feeding a dataset of images to the network, pixel by pixel, to train the model. Once trained, the algorithm is capable of classifying different objects that appear in images that are new to the network. In the case of Spotify, the network has been modified to accept audio data as the input instead of pixels.

The architecture of the Spotify convolutional network.

The architecture of Spotify’s network comprises four convolutional layers (the ones on the left in the above image) and three fully-connected layers (the ones on the right). The input consists of time-frequency representations of audio frames. The audio frames go through the four convolutional layers, undergoing several max pooling operations along the way, which downsample the time-frequency representation. After a global temporal pooling layer, the data eventually flows through the three dense layers.
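The two pooling ideas mentioned above are easy to show on a toy feature map. The sketch below leaves out the convolutions and dense layers entirely and only illustrates how max pooling shortens the time axis and how a global temporal pooling step collapses it into a fixed-length vector. The pooling size, the use of mean and max as global statistics, and the function names are assumptions, since the article doesn’t specify them.

```python
def max_pool_time(frames, size):
    """Downsample along time: keep the per-feature max of each block of `size` frames."""
    pooled = []
    for start in range(0, len(frames) - size + 1, size):
        block = frames[start:start + size]
        pooled.append([max(column) for column in zip(*block)])
    return pooled

def global_temporal_pool(frames):
    """Collapse the whole time axis into one vector: per-feature mean, then per-feature max."""
    columns = list(zip(*frames))
    means = [sum(c) / len(c) for c in columns]
    maxes = [max(c) for c in columns]
    return means + maxes

# Toy feature map: 4 time steps, 2 features per step
frames = [
    [0.1, 0.5],
    [0.4, 0.2],
    [0.3, 0.9],
    [0.8, 0.1],
]
pooled = max_pool_time(frames, size=2)  # 4 time steps become 2
summary = global_temporal_pool(pooled)  # 2 time steps become a single vector
print(pooled)  # [[0.4, 0.5], [0.8, 0.9]]
```

Whatever the length of the input excerpt, the global pooling step hands the dense layers a vector of the same size, which is what lets a network of this kind summarise a stretch of audio as a whole.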

The output of the neural network is a description of the song in terms of audio features such as key, mode, tempo, loudness and time signature.

Plot of the output of the network for 30 seconds of ‘Around the World’ by Daft Punk.

The extracted audio features provide a song’s sonic profile. Sonic profiles are compared against each other to find similarities among the songs in Spotify’s database. Ultimately, this information is used to recommend new songs to users that are sonically similar to what they like.

But what if they don’t like it? We’ve seen how a combination of collaborative filtering, content-based filtering and extraction of audio features underpins Discover Weekly’s recommendation engine. What we’ve yet to touch on is the major issue with algorithmic recommendation: not everyone is satisfied with the songs recommended by Discover Weekly. There is, however, a really good explanation for it.

Similarity vs diversity

All (music) recommendation systems struggle with the same uphill task. They’re supposed to suggest music that people will like but that, at the same time, sits somewhat outside their listening bubble. In order to be effective, a discovery engine should strike a balance between two opposing forces: similarity and diversity.

The thing is, people tend to like new music that is, to some degree, similar to what they’re used to. If a recommendation system were to consider only similarity, its job would be relatively easy. The ideal motto for such a system would be ‘Same old blues’. But after a while, people get tired of listening to infinite variations on the same theme. Meet diversity.

A recommendation engine should be bold enough to propose music that is outside our musical comfort zone; in other words, music that is different enough from the same old blues we’ve heard before. This is risky though. If the music suggested is too different from what I like, then I likely won’t enjoy it, lamenting the recommendation system’s disconnect with my musical taste. And if I’m in a bad mood, I might end up insulting the engineers (they’re always in the firing line).

One way or another, there are always people who aren’t satisfied with the recommendations. Some users aren’t happy with Discover Weekly, because they think Spotify’s playing it safe with its all-too-similar suggestions. At an AI music event we organised with Melodrive, I remember a panelist remarking how Discover Weekly won’t really give you new, interesting music. He fondly reminisced about the good old times, when you could wander into a vinyl shop and approach the owner — usually some sort of messianic figure who’d take you on a musical pilgrimage, guiding you into an obscure niche as mysteriously wonderful as the cover art it came with.

So how do you reignite that feeling or, at least, improve satisfaction? As usual, when algorithms are dealing with subjective experiences, there’s not really a one-song-fits-all solution to the issue. The simple truth is that different people have different expectations about how a music recommendation system should operate.

We can think of high similarity and high diversity as the two ends of a continuous spectrum. Some people love the familiarity of their preferred sub-genre and getting more of the same. The majority, though, likely take a more balanced approach, wanting to explore new music that’s still somewhat related to what they like. If you’ve listened to Pink Floyd, there’s always Dream Theater. A minority of adventurers are happy to freely roam between far-removed genres, as long as the musical experience is constantly fresh and rewarding.

With a variance of behaviours and expectations as large as the scope of the human experience, one possible solution is to provide as many variations of Discover Weekly’s recommendation engine as there are Spotify users. Put another way, Spotify could let their users customise the discovery engine to suit their recommendation needs, for instance with a slider running from high similarity to high diversity. If I want to live in my bubble of Renaissance Flemish counterpoint music, I can, by telling the algorithm to suggest highly similar music until I exhaust the Spotify catalogue. If I feel more explorative, I could move the slider towards the diversity end. This custom approach could ensure all users are happy with the music suggested. Each to their musical own.
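One very simple way such a slider could work, purely as a thought experiment, is to turn the slider position into a target similarity and rank candidate songs by how close they come to it. Nothing here reflects an actual Spotify feature; the function name, the similarity scores and the ranking rule are hypothetical.

```python
def rank_by_slider(similarities, slider):
    """Order songs for a listener given a slider between 0.0 and 1.0.

    slider=0.0 favours the most similar songs, slider=1.0 the most different.
    `similarities` maps each candidate song to a similarity score in [0, 1].
    """
    target = 1.0 - slider
    return sorted(similarities, key=lambda song: abs(similarities[song] - target))

# Hypothetical similarity of three candidate songs to a user's taste
similarities = {"song_a": 0.95, "song_b": 0.6, "song_c": 0.2}

print(rank_by_slider(similarities, 0.0))  # ['song_a', 'song_b', 'song_c']
print(rank_by_slider(similarities, 0.5))  # ['song_b', 'song_c', 'song_a']
print(rank_by_slider(similarities, 1.0))  # ['song_c', 'song_b', 'song_a']
```

The same pool of candidates produces a different playlist order for the bubble dweller, the balanced explorer and the adventurer, without touching the underlying recommendation models.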

The Sound of AI

Where AI, music and audio worlds collide.