Creating Personalised, Sound Based Spotify Playlists Using K-Means Clustering

Andrii Grygoryshyn
Published in The Startup · Oct 6, 2019

Over the past 20 years, the way we consume music has changed tremendously. Not that long ago, you had to go to considerable effort to get hold of your favourite record. Now, with the rise of streaming services, we have more music at our fingertips than we will ever be able to consume. Whether this change has had a positive effect on the music industry or society at large is up for debate. Having said that, as a data scientist, one interesting thing to come out of it is the amount of music data out there, ready to be studied.

The major music streaming services provide high-quality APIs, through which virtually anyone with basic programming knowledge can access a multitude of data related to the service. The Spotify API in particular provides all of the basic data you would expect, such as listening history, playlist contents and most listened-to artists. However, one perhaps unexpected inclusion is the audio features of songs.

Currently 9 audio features are provided, including valence (the musical positiveness conveyed by a song), speechiness (how much spoken word a song contains) and instrumentalness (how few vocals a song contains). This opens up a whole world of possibilities, but for me, the most straightforward application of audio features was to create a personalised playlist generator.
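As a sketch of how these features might be pulled, here is how the response from the spotipy client's `audio_features` call could be turned into numeric vectors. The sample response below is a hypothetical stand-in so the extraction logic can run on its own; with real credentials you would fetch the dicts from the API instead.

```python
# Sketch: turning Spotify's audio-features response into numeric vectors.
# With real credentials, the dicts would come from something like:
#   import spotipy
#   sp = spotipy.Spotify(auth_manager=...)
#   responses = sp.audio_features(track_ids)  # up to 100 track IDs per call

FEATURE_KEYS = ["valence", "speechiness", "instrumentalness"]

def feature_matrix(responses, keys=FEATURE_KEYS):
    """Extract the chosen audio features from audio_features() dicts.

    Entries can be None (spotipy returns None for unknown IDs), so skip those.
    """
    return [[track[k] for k in keys] for track in responses if track]

# Hypothetical sample of what the API returns for two songs.
sample_response = [
    {"valence": 0.62, "speechiness": 0.31, "instrumentalness": 0.00},
    {"valence": 0.45, "speechiness": 0.04, "instrumentalness": 0.87},
]
vectors = feature_matrix(sample_response)
```

Each row of `vectors` is now a point ready to be fed into a clustering algorithm.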

Maybe you are planning to go for a run and quickly need a loud, energetic-sounding playlist to motivate you. Or maybe you are having a quiet Sunday evening and need something a little more low-key and instrumental to relax to. Whilst there are plenty of playlists on Spotify that fill these niches, I find myself skipping through the bulk of the songs, as they are often too far from my usual taste. So what if there was a way to efficiently arrange songs you already like into such sound-consistent playlists?

K-Means Clustering

My immediate instinct was to use the K-Means algorithm for this task. K-Means is an unsupervised machine learning algorithm that assigns data points to clusters, or groups, based on their features, thus uncovering underlying patterns. This technique is often used in marketing, for example to discover underlying customer buying habits from past purchases. My idea was that it could just as easily be used to group similar-sounding songs together.

A quick intuition for K-Means can be developed with a graphical representation. For this I created a playlist called “Rap and House”, containing 25 rap songs and 25 house songs. In the figure below, each dot is a song: the x-axis represents the song’s speechiness score and the y-axis its instrumentalness score. Note that only two audio features are used here so that the data can be visualised in a 2D plot, but we can easily increase this to 3 or more audio features.

In figure 1 all points are the same colour, meaning the data has not yet been clustered. When using K-Means we need to specify the number of clusters before running the algorithm, so we have to decide how many meaningful groups there are in our data. To me it looks like there are 3 major clusters: one in the top-left corner, one in the bottom-left corner and one at the bottom right. So I will set the number of clusters to 3.

Figure 1: Audio Features Before Running K-Means Clustering

In figure 2 we can see two big changes. The data points are now split into 3 groups by colour, and 3 larger black dots have appeared: these are known as centroids, or cluster centres. The details of how the final centroids are calculated are beyond the scope of this article; what’s important is that each data point is assigned to the cluster whose centroid is closest.
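A minimal version of this two-feature demo can be sketched with scikit-learn. The blob centres below are made-up stand-ins for the three groups eyeballed in figure 1, not the real playlist data:

```python
# Minimal K-Means demo on two audio features (speechiness, instrumentalness).
# The three blob centres are hypothetical stand-ins for the groups in figure 1.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centres = [(0.05, 0.85), (0.08, 0.10), (0.45, 0.05)]
X = np.vstack([rng.normal(c, 0.02, size=(15, 2)) for c in centres])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_              # cluster assignment for each song
centroids = km.cluster_centers_  # the three black dots in figure 2
```

With well-separated groups like these, each blob ends up in its own cluster, which is exactly the behaviour the figures illustrate.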

Keep in mind that this is definitely not the prettiest demonstration of K-Means, as instrumentalness looks to be somewhat bimodally distributed, and instrumentalness and speechiness are negatively correlated, as one would expect. Despite this, I think it conveys the point of K-Means clustering.

Now that we’ve got some intuition for K-means, let’s see how it performs within Clusterfy.

Figure 2: Audio Features After Running K-Means Clustering

Clusterfy Demo

Clusterfy is the name of the app I built that utilises K-Means to create personalised playlists. Of course, to run K-means we need to have a pool of songs to cluster. Currently, Clusterfy collects this pool of songs through two distinct features.

Feature Selection Page

The “Discover New Artists” feature lets the user pick some of their favourite artists; songs from similar, related artists are then collected and used to create a new playlist. The “Reshuffle Your Playlists” feature collects songs from the user’s current playlists and uses those to generate a new playlist. For demonstration purposes, the latter feature will be shown.

Step 1: Playlist Selection

This feature lets the user choose songs not only from their created playlists but also from their liked songs and 50 most listened-to songs. For the demo I will again be using the “Rap and House” playlist, containing 25 rap songs and 25 house songs.

Playlist Selection Page

Step 2: Audio Features Selection

Clusterfy provides a choice of all nine available audio features (a description of each is available by clicking the “?” icon). The user can select features they want to be highly present in their new playlist, features they want to be less present, or a combination of the two. I want to extract only the rap songs for my new playlist, so I will pick high speechiness and low instrumentalness.

Audio Feature Selection Page

Step 3: Choosing a Playlist for Upload

Once the audio features are chosen, 3 playlists are generated. These are the 3 playlists that correspond most closely to the user’s selection of audio features. The details of exactly how this is decided follow in the next section.

Playlist Upload Page

The decision to generate 3 playlists was made to give the user more choice: even though the playlists are similar, there is some variation in both content and length. Once the user decides on the playlist they want to upload, all they need to do is click the “Upload Playlist” button and name their playlist. The playlist is then uploaded directly into their Spotify library.

Let’s see how the generated playlists look. Playlist 1, in the middle, successfully extracted 20 out of 25 rap songs from the original playlist.

Whereas playlist 3 impressively extracted all 25 rap songs, but also included 3 house songs.

Such admirable performance is frankly more of a testament to how well Spotify scores audio features than to anything else. Nevertheless, there is currently no way within Spotify to leverage audio features to directly control the sound of your playlist. This is where Clusterfy can be a handy tool. For me it proved especially useful for arranging my library of liked songs into separate playlists and for discovering new music through the “Discover New Artists” feature. There is also a lot of fun to be had in playing around with different combinations of audio features and seeing what playlists result.

For all the readers interested in the deeper details of how the playlists are generated, I will now briefly go through parts of the clustering script used by the app.

If you are not interested in the technical details, you are more than welcome to play around with Clusterfy, which is available at https://clusterfy.co. The only thing to beware of is some page formatting inconsistency: this was the first time I’ve used CSS to such an extent, and so the pages may look slightly different depending on the size of your screen.

Clusterfy Clustering Script Explained

As already mentioned, the script is built around the K-Means algorithm. Since K-Means can return slightly different results with each use, the algorithm is run 50 times. This may seem like a lot at first, but an important consideration is that with K-Means the number of clusters needs to be specified a priori, and the optimal number of clusters depends on the number of songs to be clustered. For this reason, the number of clusters alternates between iterations of the algorithm. The initial number of clusters is obtained by dividing the total number of songs by 10, with a maximum of 6 clusters. The number of clusters in each iteration is then chosen randomly to be in and around this value.
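This cluster-count scheme might look roughly as follows. The ±1 jitter is my guess at how "in and around this value" could be implemented, and the floor of 2 clusters is also my assumption; the article only states the division by 10 and the cap of 6:

```python
import random

def initial_k(n_songs, max_k=6):
    # Base cluster count: total songs divided by 10, capped at max_k.
    # The floor of 2 is an assumption; the article only states the cap.
    return max(2, min(n_songs // 10, max_k))

def iteration_k(n_songs, rng=random):
    # Per-iteration count "in and around" the base value; the +/-1 jitter
    # is a guess at how that alternation might be implemented.
    return max(2, initial_k(n_songs) + rng.choice([-1, 0, 1]))
```

For a 50-song pool this gives a base of 5 clusters, jittered to 4, 5 or 6 in each of the 50 iterations.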

Once the clusters have been created, we need to choose which cluster to pick. This is done using a metric I defined, the “playlist score”. How the playlist score is computed depends on the user’s audio feature selection. If only maximise features are chosen, the playlist score for a cluster is the mean of the chosen audio features in that cluster. If the user picks only minimise features, the playlist score is the mean of the chosen audio features in that cluster multiplied by -1. Finally, if the user picks a combination of maximise and minimise features, the playlist score is the difference between the mean of the selected maximise features and the mean of the selected minimise features. In each iteration the cluster with the highest playlist score is saved, and after 50 iterations the 3 clusters with the highest playlist scores are chosen.
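The three cases of the playlist score can be condensed into one small function. This is a sketch of the metric as described, not the app's actual code; it assumes the cluster's per-feature means have already been computed and that at least one feature was selected:

```python
def playlist_score(cluster_means, maximise, minimise):
    # cluster_means: feature name -> mean value of that feature in the cluster.
    # maximise / minimise: the user's selected feature names (at least one
    # of the two lists must be non-empty).
    max_vals = [cluster_means[f] for f in maximise]
    min_vals = [cluster_means[f] for f in minimise]
    if max_vals and min_vals:
        # Combination: mean of maximise features minus mean of minimise features.
        return sum(max_vals) / len(max_vals) - sum(min_vals) / len(min_vals)
    if max_vals:
        # Only maximise features: plain mean.
        return sum(max_vals) / len(max_vals)
    # Only minimise features: mean times -1.
    return -(sum(min_vals) / len(min_vals))

# A rap-leaning cluster: high speechiness, low instrumentalness.
score = playlist_score(
    {"speechiness": 0.5, "instrumentalness": 0.2},
    maximise=["speechiness"],
    minimise=["instrumentalness"],
)
```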

Such a metric is not perfect, as it can be biased towards shorter playlists, which may not be what the user wants. I have tried to remedy this somewhat by setting the minimum cluster size to 10 songs. A future improvement may be to add a playlist score penalty for shorter playlists, or to increase the minimum cluster size for some iterations; this would encourage more variety in playlist length. That said, in my experience the playlists can be quite long depending on the pool of songs, reaching up to 100 songs.
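Such a length penalty could take many forms; one hypothetical version would simply subtract a small amount per song a cluster falls short of a target length. Both `target_len` and `alpha` here are made-up illustrative values, not anything the app currently uses:

```python
def penalised_score(score, n_songs, target_len=15, alpha=0.02):
    # Hypothetical penalty: subtract alpha per song the cluster falls
    # short of target_len; clusters at or above target_len are untouched.
    return score - alpha * max(0, target_len - n_songs)
```

A cluster of 10 songs with a raw score of 0.5 would then score 0.4, nudging the selection towards longer clusters without ruling short ones out.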

If you are interested in the full code, it is available on my GitHub page or if you have any questions about the article feel free to contact me on my LinkedIn page.

Andrii Grygoryshyn

Behavioural Data Science student at the University of Amsterdam. Enthusiastic about data and music.