Organizing Spotify saved songs with Machine Learning

Like a lot of people, the majority of my music listening these days happens via Spotify. One of my favorite features is the Discover Weekly playlist, a playlist Spotify algorithmically creates for all its users that it describes as “a weekly roundup of songs we think you’ll love.” It is derived through what I can only assume is a combination of machine learning and voodoo magic, and every week there are always a few songs in my Discover Weekly that I love. When I find these songs, I save them to my Spotify library.

Therein lies the problem. Unlike some people who carefully curate their saved music and file it away into meticulously crafted playlists, I just dump everything into the generic “saved” section of my Spotify. Over time this unorganized heap of music has grown to be decently large (330 songs at the time of this writing). As a result, I have not been revisiting these saved songs very much. You may be thinking, “gosh Sam, why do you bother saving songs if you don’t go back and listen to them later?” Great question, I’ve been wondering that myself, and I came to the conclusion that going through them is just too hard (in an inconsequential first-world-problems kind of way).

My musical tastes span a variety of genres, and what I want to listen to at any particular moment is heavily dependent on my mood. So I can’t just shuffle all my saved songs, since they come from a smattering of different genres: the playful K-Pop “Awoo” would just sound weird if followed immediately by Car Seat Headrest’s gritty “Drunk Drivers/Killer Whales.” So, if I was going to listen to my saved songs, I needed to organize them by mood first. Not manually though; I’ve not got the patience for that.

Fortunately, grouping a bunch of “stuff” (songs in this case) into similar groups without much human input is a well-researched type of machine learning problem known as clustering. So I, a machine learning luddite, can piggyback off of this research without really knowing what I’m doing, hooray!


The clustering technique I opted to use is called k-means clustering, primarily because it’s one of the few clustering algorithms that I understand. To use k-means you have to take the stuff you’re clustering and somehow turn it into points. You can imagine these points in the familiar 2-D Cartesian plane, but this also works in spaces of higher dimension. Once all your stuff has been turned into points, you get to decide how many different clusters you want, k. K-means then works to find the k centroids (a fancy name for more points) such that each of your “stuff-points” is close to one of these centroids. These centroids define the k different clusters, and each of the points belongs in the cluster defined by its nearest centroid.

Image taken from MathWorks

For example, in the above image the colored dots represent the things being clustered, the k chosen is 5, and the centroids found by the algorithm are marked by the five plus symbols. The points are colored according to the clusters they’ve been assigned to, and you can see how the color is simply determined by the closest plus sign.
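To make the assign-then-move idea concrete, here is a minimal sketch of the usual k-means iteration (often called Lloyd's algorithm) in plain Python. It is an illustration only, not the implementation used for this project: the function names, the toy points, and the choice to start from k randomly picked input points are all assumptions.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Alternate between assigning points and moving centroids until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centroids.append(mean(members) if members else centroids[i])
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    # Final labels, computed against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random starting points.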

So, with the specific clustering technique chosen, I needed to find a way to turn songs into points. Fortunately, Spotify provides some great information on all of its songs via an API through what it calls “Audio Features.” This means that anyone with enough programming gumption can ask Spotify how it rates a given song’s danceability, instrumentalness, or a host of other features (13 in total), each on a numeric scale. In other words, Spotify already did all the hard work for me! I just needed to write the correct code and all of my saved songs would be transformed into points that I could do some clustering on.

Here’s the Python 2 script I used to both fetch my Spotify saved songs and transform them into 13-dimensional points based on audio features:
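As a rough stand-in for the "transform into points" half of that step (a minimal Python 3 sketch, not the original script), the code below turns one audio-features record into a 13-dimensional point. It assumes the records have already been fetched and are plain dictionaries keyed by feature name. The list of 13 names is taken from the feature names that appear in the cluster summaries later in this post, and the example record uses made-up values.

```python
# Assumption: these are the 13 features, as named in the cluster summaries.
FEATURES = [
    "acousticness", "danceability", "duration_ms", "energy",
    "instrumentalness", "key", "liveness", "loudness", "mode",
    "speechiness", "tempo", "time_signature", "valence",
]


def to_point(audio_features):
    """Turn one audio-features record (a dict) into a 13-dimensional point."""
    return [float(audio_features[name]) for name in FEATURES]


# Hypothetical record with made-up values, shaped like one track's features.
example = {
    "id": "hypothetical-track-id",
    "acousticness": 0.12,
    "danceability": 0.71,
    "duration_ms": 201000,
    "energy": 0.64,
    "instrumentalness": 0.0,
    "key": 5,
    "liveness": 0.09,
    "loudness": -6.3,
    "mode": 1,
    "speechiness": 0.04,
    "tempo": 118.0,
    "time_signature": 4,
    "valence": 0.55,
}

point = to_point(example)
print(len(point))
print(point)
```

Keeping the feature order fixed in one list means every song's point uses the same dimension for the same feature, which is what the distance calculations rely on.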

Once all my songs were turned into points, I applied scaling to each of the different dimensions. As alluded to above, k-means works by trying to get all of your points “close” to a centroid. This closeness is measured by the Euclidean distance (AKA straight-line distance) between points and the centroid. However, not all of the different metrics from Spotify are in the same units. For example, the acousticness of a song is a number between 0 and 1, while the duration is given as the length in milliseconds.

Suppose we have two centroids: centroid A with acousticness of 1 and duration of 150,000, and centroid B with acousticness of 0 and duration of 180,000. Now suppose we wanted to classify a song with acousticness 1 and duration 180,000. How would it be classified? Without scaling, it would be classified into the group defined by B because it’s only a distance of 1 away, versus a distance of 30,000 from centroid A. But that’s not really what we want: two songs could sound extremely similar but be classified into different clusters only because lengths vary by much larger quantities than acousticness.
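The arithmetic in this example can be checked in a few lines of Python. The points below are (acousticness, duration in milliseconds) pairs using exactly the numbers from the paragraph above; the variable names are just for illustration.

```python
import math

# Points are (acousticness, duration_ms), using the numbers from the example.
centroid_a = (1.0, 150_000.0)
centroid_b = (0.0, 180_000.0)
song = (1.0, 180_000.0)

distance_to_a = math.dist(song, centroid_a)  # differs from A only in duration
distance_to_b = math.dist(song, centroid_b)  # differs from B only in acousticness

closest = "A" if distance_to_a < distance_to_b else "B"
print(distance_to_a, distance_to_b, closest)  # 30000.0 1.0 B
```

The duration gap swamps everything else, so the song lands with B even though its acousticness matches A exactly.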

This is why I chose to apply scaling to each of the dimensions. Although duration may indicate a musical similarity/difference between songs, I don’t want it to be weighted any more than the other audio features Spotify provides. Scaling combats this by stretching and shifting each of the dimensions so that the standard deviation of each dimension is 1. As a result, “far” in one dimension will carry just as much weight as “far” in another dimension. In the given example, centroids A and B might be scaled to (1, 0.5) and (0, 0.6) respectively. The song would then sit at (1, 0.6), only 0.1 away from A but a distance of 1 from B, so it would be correctly classified with group A.

Code used to scale my Spotify tracks.
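As a minimal sketch of this kind of scaling (not the original code), the function below shifts each dimension to a mean of 0 and stretches it to a standard deviation of 1, using only the standard library. The function name and the toy data are assumptions; scikit-learn's `StandardScaler` performs the same kind of transformation.

```python
import math
import statistics


def standardize(points):
    """Shift each column to mean 0 and stretch it to standard deviation 1."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column is only centered.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(point, means, stdevs))
        for point in points
    ]


# (acousticness, duration_ms). The first three rows reuse the numbers from the
# earlier example (centroid A, centroid B, the song); the rest are made up.
tracks = [
    (1.0, 150_000),
    (0.0, 180_000),
    (1.0, 180_000),
    (0.9, 240_000),
    (0.1, 120_000),
    (0.2, 300_000),
]
scaled = standardize(tracks)
scaled_a, scaled_b, scaled_song = scaled[:3]

print(math.dist(scaled_song, scaled_a))  # small: only duration differs
print(math.dist(scaled_song, scaled_b))  # larger: acousticness differs
```

After scaling, the song is closer to A than to B, because a 30,000 ms gap is now small relative to how much durations vary across the whole set.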

After scaling my data, I needed to pick exactly how many playlists to cluster them into (the k of k-means). Typically this is done via the elbow method, where you plot the model score against the number of clusters used, and pick a k at the “elbow” of the resulting curve. This is the point where you don’t get much benefit from adding another cluster. For k ranging from 1 to 20, my elbow curve looks like:

Choosing an “elbow” isn’t an exact science, but I felt that 8 was a good number of clusters here since the curve seems to flatten out a bit at around k = 8. Your interpretation of the “elbow” may be different from mine.
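To show where the numbers behind an elbow curve come from, here is a minimal sketch. It assumes the "model score" is the within-cluster sum of squared distances (often called inertia), which may not be exactly what was plotted for this post. To stay short it uses made-up one-dimensional data. In one dimension the best clusters are always contiguous runs of the sorted values, so the sketch tries every way of cutting the sorted list instead of running k-means.

```python
from itertools import combinations


def sse(values):
    """Sum of squared distances from each value to the group's mean."""
    center = sum(values) / len(values)
    return sum((v - center) ** 2 for v in values)


def best_inertia(values, k):
    """Lowest total within-cluster SSE over all splits into k contiguous groups."""
    ordered = sorted(values)
    best = float("inf")
    # Choose k - 1 cut positions between neighbouring sorted values.
    for cuts in combinations(range(1, len(ordered)), k - 1):
        bounds = (0, *cuts, len(ordered))
        total = sum(sse(ordered[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


# Made-up values forming three loose groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.4, 8.8]
curve = {k: best_inertia(data, k) for k in range(1, 7)}
for k, score in curve.items():
    print(k, round(score, 3))
```

On this toy data the score drops sharply up to k = 3 and only slightly after that, so the elbow sits at 3. Real data rarely bends this cleanly, which is why picking k stays a judgment call.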

Finally, after turning songs into points, scaling those points, and choosing a k, I got to actually run the clustering algorithm and make some playlists. Here’s the code to do that:
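As a rough stand-in for the last part of that step (a minimal sketch, not the original code), the snippet below turns cluster labels into playlists. It assumes the clustering has already produced one integer label per track, in the same order as the track list. The third track name and the labels are made up, and uploading the playlists to Spotify is not shown.

```python
from collections import defaultdict


def group_by_cluster(track_names, labels):
    """Collect track names into one list per cluster label."""
    playlists = defaultdict(list)
    for name, label in zip(track_names, labels):
        playlists[label].append(name)
    return dict(playlists)


# Hypothetical labels; a real run would take them from the clustering step.
track_names = ["Awoo", "Drunk Drivers/Killer Whales", "hypothetical track C"]
labels = [0, 1, 0]

playlists = group_by_cluster(track_names, labels)
for label, names in sorted(playlists.items()):
    print(label, names)
```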

And that’s it! After some toying around with Python, Spotify’s API, and some machine learning, I was able to automatically sort all my saved songs into 8 playlists. With the clustering done, all that was left was to see how it turned out! First, I took a look at what kind of playlists the machine learning algorithm thinks it created. Below are the most notable coordinates of each cluster centroid along with my English interpretations and their Spotify URLs:

Sad, long, and not dance-friendly
https://open.spotify.com/user/busta_sam/playlist/6ZJJODF1pjqb38ZKqlTTZs
Cluster 0 Notable features:
danceability, -0.95694986003
duration_ms, 0.544439790449
valence, -0.840971014255

Happy songs!
https://open.spotify.com/user/busta_sam/playlist/0YMxnPxCG1cZJ5wzHnYrFi
Cluster 1 Notable features:
mode, 0.941123948114
valence, 0.530923079802

Happy, fast, but not very dance-friendly
https://open.spotify.com/user/busta_sam/playlist/6BCTnirU2CmuwJEprtZiJd
Cluster 2 Notable features:
danceability, -0.573985772103
mode, 0.607176740719
tempo, 0.778386767955
time_signature, -3.83668296263

Spoken and acoustic
https://open.spotify.com/user/busta_sam/playlist/7rFIvmvwgiWAknACRfO8Ac
Cluster 3 Notable features:
acousticness, 0.669727246281
speechiness, 1.99285951664

Happily danceable (in a minor key)
https://open.spotify.com/user/busta_sam/playlist/2cxDneFEaMmHVSDOj8eapw
Cluster 4 Notable features:
danceability, 0.576201357609
mode, -1.06255929626
valence, 0.637916077744

PUMP UP THE JAMS (loud and energetic)
https://open.spotify.com/user/busta_sam/playlist/0hZ9AB5AqoYkyGa0ruMsJj
Cluster 5 Notable features:
energy, 0.597173537605
key, 0.566320654055
liveness, 3.01315791229
loudness, 0.52168852663

Brooding and acoustic
https://open.spotify.com/user/busta_sam/playlist/57tJKMxhO82CXHZGmQ0v0s
Cluster 6 Notable features:
acousticness, 1.67149563594
energy, -1.43324770625
loudness, -1.68301335212
tempo, -0.600891338119
valence, -0.559252321609

Brooding without the lyrics
https://open.spotify.com/user/busta_sam/playlist/0xiUTbk9RyG9TfbwR3pMDo
Cluster 7 Notable features:
instrumentalness, 3.29017349386
loudness, -0.615010549014
valence, -0.840608500109
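A note on reading these numbers: because the features were scaled, each value is roughly "how many standard deviations above or below my average saved song" the cluster's centroid sits, assuming the scaling also centers each feature at 0. The sketch below shows one way such a list of notable features could be picked out. The 0.5 cutoff is an assumption, inferred only from the fact that every value listed above has a magnitude greater than 0.5. The energy value in the example is made up.

```python
def notable_features(centroid, names, threshold=0.5):
    """Return (name, value) pairs whose scaled value is far from the average."""
    return [
        (name, value)
        for name, value in zip(names, centroid)
        if abs(value) > threshold  # assumed cutoff, not confirmed by the post
    ]


# Three values from Cluster 0 above, plus a made-up energy value near 0.
names = ["danceability", "duration_ms", "energy", "valence"]
centroid = [-0.95694986003, 0.544439790449, 0.1, -0.840971014255]

notable = notable_features(centroid, names)
for name, value in notable:
    print(f"{name}, {value}")
```

The made-up energy value is close to average, so it is filtered out, leaving the same three features listed for Cluster 0.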

So, in the end, did this adventure into the world of data science work? In one sense, not really. Although I can sort of see what kinds of moods the clusters are trying to emulate, I don’t really think that the playlists created by this method rival those that people create by hand. However, this wasn’t a complete failure. These playlists are certainly better than the one giant heap of songs I was using earlier. In addition, I’ve been listening to these playlists a lot to try and evaluate them, and rediscovering some awesome music in the process, which, at the end of the day, was the ultimate goal of this project anyway.