A couple of months ago I stumbled upon a cute web app that lets you log in with your Spotify account and see some interesting attributes of the songs in your playlists. As a long-time music fan and early adopter, I have been using Spotify probably since day one, religiously saving interesting songs to my Favorites playlist.
This tool was built by The Echo Nest, a small startup that Spotify bought two years ago for $100M and that has contributed to the success of automated music recommendation, such as the Discover Weekly playlists.
Seeing all these interesting song attributes, like Danceability, Energy, Acousticness and so on, gave me the idea to do a statistical analysis of my favourite songs.
I would describe my music taste as quite open, but speaking in terms of genres, I am probably into what is called indie rock or art rock, with an occasional influence of folk, chamber and baroque. However, I appreciate any piece of music with narrative lyrics, lush orchestrations and dynamic melody progressions. My Favorites playlist contains 321 songs covering a wide spectrum of music genres. So, I was quite curious what this number crunching would reveal about my preferences.
Extracting interesting attributes from songs
The attributes, or features in machine learning terminology, are the variables that our mathematical models take into consideration when they try to predict stuff. Spotify provides the following attributes for each song:
Beats Per Minute (BPM) — The tempo of the song.
Energy — The energy of a song, the higher the value, the more energetic.
Danceability — The higher the value, the easier it is to dance to this song.
Loudness — The higher the value, the louder the song (in dB).
Valence — The higher the value, the more positive mood for the song.
Length — The duration of the song.
Acousticness — The higher the value the more acoustic the song is.
Release Year — The year each song was released.
Popularity — The higher the value the more popular the song is.
These attributes mainly describe sound characteristics. This approach is somewhat different from the established method of collaborative filtering, which is based on common songs in users’ playlists. Recent advances in deep learning allow us to analyze multimedia, such as songs, in unprecedented detail. The above attributes are probably extracted by a Convolutional Neural Network; there is a fascinating post by a Spotify engineer describing this process extensively.
Enough talking, just show me the graphs!
⬆️ loud, ⬇️ pop
The histograms give a descriptive overview of the average song I like. I tend to listen to music with a tempo of around 120 BPM, a little lower than some common music genres. I admit that I am not into popular stuff, and that is reflected in the bump around zero in Popularity.
Energy and Danceability are not my strongest points, either. Amplifiers and guitars seem to contribute to high Loudness and low Acousticness as well. Valence (positive mood) is fairly low, which is no surprise given my affection for minor keys.
Average song duration is 263 seconds, or about 4:23 minutes, somewhat longer than the average radio song, probably due to a few outliers (“The Past Is A Grotesque Animal” at 11:55 and “Piss Crowns Are Trebled” at 13:50).
Today’s Top Hits are different
In an attempt to get out of my music bubble, I compared my Favorites with Today’s Top Hits, the most followed playlist on Spotify, which has 9 million followers and is updated every week with 50 new songs.
Naturally, today’s top hits are recently released, so the high concentration around this year is expected (note the logarithmic y-axis). The lower tempo is possibly due to more summer-ish, chillout songs. Energy, Loudness, Danceability and Valence are considerably higher, driven by pop and R&B. Songs are shorter and more concentrated around radio-friendly durations.
Energetic songs are loud, positive and not acoustic
The next step in our analysis is to look at correlations between attributes. Some of the stronger correlations are presented below. It seems that, among my Favorites, energetic songs tend to be louder, more positive and less acoustic.
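For readers who want to try this on their own playlist, here is a minimal sketch of a pairwise correlation analysis with Pandas. The data below is synthetic and only built to mimic the pattern described above; the column names are my own assumption and not necessarily the ones used in the original notebooks.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a playlist (NOT the real Favorites data).
rng = np.random.default_rng(0)
n_songs = 321
energy = rng.uniform(0, 100, n_songs)

songs = pd.DataFrame({
    "energy": energy,
    # The relationships below are invented purely for illustration.
    "loudness": -20 + 0.15 * energy + rng.normal(0, 2, n_songs),      # dB
    "valence": 0.5 * energy + rng.normal(0, 15, n_songs),
    "acousticness": 100 - 0.8 * energy + rng.normal(0, 15, n_songs),
})

# Pairwise Pearson correlation between all attributes.
corr = songs.corr(method="pearson")

# How strongly does each attribute move together with energy?
print(corr["energy"].sort_values(ascending=False))
```

The resulting matrix can be passed directly to a heatmap function such as Seaborn's `heatmap` to get a visual overview.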
Songs get louder over time while energy predicts popularity
Although these correlations are not as strong as the ones above, there seems to be a link between popularity and song energy, and the data also agrees with the well-documented trend that songs get louder over time.
Projecting songs in (feature) space
Generally, we tend to prefer datasets with many different variables in order to ensure that our models are strong. But that comes with a drawback: it is not easy to visualize stuff. Apart from pairwise correlations, we can apply a machine learning algorithm for dimensionality reduction to project our 9 variables onto a two-dimensional (x, y) plane.
I picked the t-SNE algorithm which has been used in a wide range of applications, including computer security research, music analysis, cancer research, bioinformatics, biomedical signal processing and so on.
In layman's terms, it models each song by a two-dimensional point (x, y) in such a way that similar songs are modelled by nearby points and dissimilar songs are modelled by distant points. The figure below is the resulting projection.
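A projection like this can be sketched with scikit-learn's `TSNE`. The feature matrix here is random placeholder data with the same shape as my dataset (321 songs, 9 attributes), and the parameter values (`perplexity=30`, PCA initialisation, standardising the features first) are assumptions for illustration, not necessarily the settings behind the figure.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix: 321 songs x 9 attributes (random, not real data).
rng = np.random.default_rng(0)
features = rng.normal(size=(321, 9))

# Standardise so that no single attribute (e.g. BPM) dominates the distances.
scaled = StandardScaler().fit_transform(features)

# Project the 9-dimensional songs onto 2 dimensions.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(scaled)

print(embedding.shape)  # (321, 2): one (x, y) point per song
```

Each row of `embedding` can then be drawn as a scatter point and labelled with the song title. Note that t-SNE is stochastic, so the layout changes with `random_state`.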
Understandably, this huge blob of points and text is not the most beautiful graph but we can spot some similar songs and outliers.
If we delete all text, leave just the points and display only songs from a specific artist, we can observe how consistent and similar the songs written by the same band are. As a proof of concept, I plotted songs of two of my favourite bands, Belle & Sebastian and Arcade Fire.
Belle & Sebastian songs are always inventive and quite difficult to categorize in a genre. Through the years, they have experimented with different sounds such as indie folk in the late 90s, chamber rock in the early 00s and even dance grooves lately. In the projection we see “Stay Loose” and “Your Cover’s Blown” near each other, being their two most structurally unconventional songs.
Content-agnostic recommendation is dead
Traditionally, services like Amazon or Spotify have relied mostly on collaborative filtering approaches to power their recommendations. For example, if two users buy the same set of items, their tastes are probably similar.
However, this approach is content-agnostic. As a result, popular items are much easier to recommend than unpopular items, as there is more usage data available, leading to homogeneity and boring suggestions. We need another way to recommend new and unpopular items (songs): this is the cold-start problem in a nutshell.
By extracting the above sound features we could solve this problem in part. My Favorites playlist does not contain ground truth data such as like / don’t like. The whole dataset shows intent. In machine learning terms, I have only one class, positive.
I modelled this problem as one-class learning for novelty detection. But how can you make a classification from just one class? Imagine a factory and a controlling system that determines when something goes wrong. While it is relatively easy to gather data of situations that are OK, data of a faulty system state can be rather expensive, or even impossible, to collect. By just providing the normal data, an algorithm creates a representational model. Newly encountered data (e.g. new songs) are compared with this model.
I reduced the dimensions as above with a more robust technique, Principal Component Analysis (PCA), and I picked the one-class Support Vector Machine (SVM) classifier. By tuning the parameters we can find an optimal space in which we can make predictions. The contour plot below was inspired by a QZ article, in which the author gets a personalized overview of his music taste.
For instance, if Spotify were to suggest new songs to me, it would check whether they fall inside the orange-red area. This area represents my taste. A few outliers on the right are songs that are unconventional in structure or duration, such as classical music.
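The idea can be sketched as a small scikit-learn pipeline: standardise the features, reduce them to two principal components, and fit a one-class SVM on the favourites only. The data is synthetic, and the hyperparameters (`nu=0.1`, RBF kernel, `gamma="scale"`) are placeholder assumptions rather than the tuned values behind the contour plot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins: 321 "favourite" songs and 10 candidate songs,
# each described by 9 attributes (NOT real Spotify data).
favorites = rng.normal(size=(321, 9))
candidates = np.vstack([
    rng.normal(size=(5, 9)),            # drawn like the favourites
    rng.normal(loc=8.0, size=(5, 9)),   # shifted away from the favourites
])

# Only positive examples are available, so the SVM is fitted on one class.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    OneClassSVM(kernel="rbf", nu=0.1, gamma="scale"),
)
model.fit(favorites)

# +1 means a candidate falls inside the learned "taste" region, -1 outside.
labels = model.predict(candidates)
# Signed distance to the boundary; higher means deeper inside the region.
scores = model.decision_function(candidates)

for label, score in zip(labels, scores):
    print(f"{'inside' if label == 1 else 'outside'}  score={score:.3f}")
```

Here `nu` roughly controls the fraction of favourites that may be treated as outliers, so lowering it widens the accepted region.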
At the moment artificial intelligence research is mostly about getting computers to understand things that humans already do: images, sounds, text and so on. But the focus is going to shift to getting computers to understand things that humans don’t, due to complexity or sheer scale. This provides us with the opportunity to analyze our habits, likes and preferences in order to learn more about ourselves. Why Do I Like This? Now Backed By Science!
In the era of big data, there is a new emerging trend of employing small data with features extracted from big data. For example, my dataset might be quite small (n=321), but the sound features were extracted by deep neural networks trained on millions of songs. This new paradigm could democratize the currently hardware-focused system towards more products powered by artificial intelligence.
Attribute values were already normalized to [0, 100], except for Loudness, which was in decibels, typically between -60 and 0 dB. Upon writing this, I found out that the Spotify API additionally provides: song key, instrumentalness, liveness, modality, speechiness and time signature. Future work material.
The above analysis was done in Python, using the scientific Python stack: scikit-learn for machine learning, Pandas for data pre-processing, Seaborn and Matplotlib for visualization and NumPy for numerical computations. Code, data and Jupyter notebooks are available on GitHub.
UPDATE (Dec 2017): Jerry Yu, a student at Carnegie Mellon, implemented some ideas from my post and made a web app where you can visualize your own playlist online.
UPDATE (Nov 2018): Elan Parsons, a data scientist from Pennsylvania, reproduced my analysis to create her own music maps.
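If you wanted Loudness on the same [0, 100] scale as the other attributes, a simple linear rescaling would do. This is only a sketch: the function name `loudness_to_percent` is hypothetical, the -60 to 0 dB range is taken from the typical values mentioned above, and I am not claiming the notebooks do this.

```python
import numpy as np

def loudness_to_percent(loudness_db, floor_db=-60.0, ceil_db=0.0):
    """Linearly map loudness in dB to [0, 100], clipping values outside the range."""
    clipped = np.clip(np.asarray(loudness_db, dtype=float), floor_db, ceil_db)
    return 100.0 * (clipped - floor_db) / (ceil_db - floor_db)

# -60 dB maps to 0, -30 dB to 50, -6 dB to 90 and 0 dB to 100.
rescaled = loudness_to_percent([-60, -30, -6, 0])
print(rescaled)
```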
Dimitris Spathis is a researcher and data scientist doing a PhD in Computer Science at the University of Cambridge.