Visualizing Hundreds of My Favorite Songs on Spotify

A tale of statistics, personal taste, and beats per minute

Dimitris Spathis
Aug 8, 2016 · 8 min read
[Cover image: still life by Pieter Claesz (1623, Louvre)]

A couple of months ago I stumbled upon a cute web app that lets you log in with your Spotify account and see some interesting attributes of the songs in your playlists. As a long-time music fan and early adopter, I have been using Spotify practically since day one, religiously saving interesting songs to my Favorites playlist.

This tool was built by The Echo Nest, a small startup that Spotify bought two years ago for $100M and whose technology has contributed to the success of automated music recommendation, such as the Discover Weekly playlists.

Looking at all these interesting song attributes, such as Danceability, Energy and Acousticness, I got the idea to run a statistical analysis of my favourite songs.

I would describe my music taste as quite open, but speaking in terms of genres, I am probably into what is called indie rock or art rock, with an occasional influence of folk, chamber and baroque. However, I’d appreciate any piece of music with narrative lyrics, lush orchestrations and dynamic melody progressions. My Favorites playlist contains 321 songs covering a wide spectrum of music genres. So, I was quite curious what this number crunching would reveal about my preferences.

Extracting interesting attributes from songs

Beats Per Minute (BPM) — The tempo of the song.
Energy — The energy of a song, the higher the value, the more energetic.
Danceability — The higher the value, the easier it is to dance to this song.
Loudness — The higher the value, the louder the song (in dB).
Valence — The higher the value, the more positive mood for the song.
Length — The duration of the song.
Acousticness — The higher the value, the more acoustic the song is.
Release Year — The year each song was released.
Popularity — The higher the value, the more popular the song is.
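For anyone who wants to pull the same attributes for their own playlist, here is a minimal sketch. The article does not say which tool exported the data, so the spotipy calls shown in the comments are an assumption; the sample rows below merely mimic the shape of Spotify's audio-features response so the snippet runs without credentials.

```python
import pandas as pd

# Hypothetical rows shaped like Spotify's audio-features response.
# With credentials, the same data could come from the spotipy client:
#   import spotipy
#   from spotipy.oauth2 import SpotifyOAuth
#   sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-library-read"))
#   saved = sp.current_user_saved_tracks(limit=50)
#   rows = sp.audio_features([item["track"]["id"] for item in saved["items"]])
rows = [
    {"tempo": 105.0, "energy": 0.47, "danceability": 0.52, "loudness": -7.1,
     "valence": 0.33, "duration_ms": 263000, "acousticness": 0.41},
    {"tempo": 128.0, "energy": 0.82, "danceability": 0.71, "loudness": -4.9,
     "valence": 0.64, "duration_ms": 201000, "acousticness": 0.05},
]

# Rename to the labels used above and convert duration to seconds.
df = pd.DataFrame(rows).rename(columns={
    "tempo": "BPM", "energy": "Energy", "danceability": "Danceability",
    "loudness": "Loudness", "valence": "Valence",
    "duration_ms": "Length", "acousticness": "Acousticness"})
df["Length"] = df["Length"] / 1000.0  # milliseconds -> seconds
print(df.head())
```

From here, every figure in the post is a question you can ask of `df` with Pandas and Seaborn.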

These attributes mainly describe sound characteristics. This approach is somewhat different from the established method of collaborative filtering, which is based on common songs in users’ playlists. Recent advances in deep learning allow us to analyze multimedia such as songs in unprecedented detail. The above attributes are probably extracted by a convolutional neural network; there is a fascinating post by a Spotify engineer describing this process extensively.

Enough talking, just show me the graphs!

⬆️ Loudness, ⬇️ Popularity

Energy and Danceability are not my strongest points either. Amplifiers and guitars seem to account for the high Loudness and low Acousticness as well. Valence (positive mood) is reasonably low, consistent with my affection for minor keys.

Year distribution ranges from the ’50s (Tom Lehrer!) to today. The average release year is 2004, when some of my favourite records came out, such as Funeral by Arcade Fire and the Cherry Tree EP by The National.

Average song duration is 263 seconds, or about 4:23, somewhat longer than the average radio song, probably due to a few outliers (“The Past Is A Grotesque Animal” — 11:55, “Piss Crowns Are Trebled” — 13:50).
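The pull those outliers exert on the average can be checked with a simple z-score rule. This is a sketch on hypothetical durations (in seconds), not the article's actual data; the 1.5-standard-deviation cutoff is an illustrative choice.

```python
import numpy as np

# Hypothetical durations in seconds; the two long tracks mirror the
# outliers mentioned above (11:55 = 715 s, 13:50 = 830 s).
durations = np.array([180, 210, 225, 240, 250, 255, 260, 270, 280, 715, 830])

mean, std = durations.mean(), durations.std()
z = (durations - mean) / std
# Flag songs more than 1.5 standard deviations from the mean.
outliers = durations[np.abs(z) > 1.5]
print(f"mean = {mean:.0f} s, outliers = {outliers.tolist()}")
```

Dropping the flagged tracks and recomputing the mean shows how much a couple of epics can stretch the "average" song.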

Today’s Top Hits are different

Naturally, today’s top hits are recent releases, so the high concentration around this year is expected (note the logarithmic y-axis). The lower tempo is possibly attributable to more summery, chillout songs. Energy, Loudness, Danceability and Valence are considerably higher, reflecting the prevalence of pop and R&B. Songs are shorter and more tightly concentrated around radio-friendly durations.

Energetic songs are loud, positive and not acoustic

Songs get louder over time while energy predicts popularity
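The relationships behind these two charts can be quantified with a plain correlation matrix. Here is a sketch on synthetic data constructed to mimic the stated trends (louder and more positive when energetic, less acoustic when energetic); the coefficients are illustrative, not the article's actual numbers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
energy = rng.uniform(0, 1, n)
# Synthetic features following the trends described above.
loudness = -15 + 12 * energy + rng.normal(0, 1.5, n)              # louder when energetic
valence = np.clip(0.1 + 0.7 * energy + rng.normal(0, 0.1, n), 0, 1)
acousticness = np.clip(0.9 - 0.8 * energy + rng.normal(0, 0.1, n), 0, 1)

df = pd.DataFrame({"Energy": energy, "Loudness": loudness,
                   "Valence": valence, "Acousticness": acousticness})
corr = df.corr()  # Pearson correlations; seaborn.heatmap(corr) draws the chart
print(corr.round(2))
```

A positive Energy-Loudness cell and a negative Energy-Acousticness cell are exactly what the scatter plots show visually.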

Projecting songs in (feature) space

I picked the t-SNE algorithm, which has been used in a wide range of applications, including computer security research, music analysis, cancer research, bioinformatics and biomedical signal processing.

In layman’s terms, it models each song as a two-dimensional point (x, y) in such a way that similar songs end up as nearby points and dissimilar songs as distant points. The figure below is the resulting projection.
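As a sketch of this projection step, using scikit-learn's TSNE on standardized features (synthetic data stands in for the playlist here, and the perplexity value is an illustrative choice rather than the article's setting):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# 60 synthetic "songs" with 8 features (BPM, Energy, ... as listed above).
X = rng.normal(size=(60, 8))

X_scaled = StandardScaler().fit_transform(X)  # put features on comparable scales
# perplexity must be smaller than the number of samples.
emb = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(X_scaled)
print(emb.shape)  # one (x, y) point per song
```

Scattering the rows of `emb` with Matplotlib gives the blob below; highlighting one artist is just a matter of masking the rows that belong to them.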

Understandably, this huge blob of points and text is not the most beautiful graph, but we can spot some similar songs and outliers.

If we delete all the text, leave just the points and display only songs from a specific artist, we can observe how consistent and similar songs written by the same artist are. As a proof of concept, I plotted the songs of two of my favourite bands, Belle & Sebastian and Arcade Fire.

While most of Arcade Fire’s songs are clustered at the bottom right, there are some outliers. “Rebellion,” being their most popular song, sits quite far to the left. “Intervention,” with its swelling orchestration, stands apart as well.

Belle & Sebastian songs are always inventive and quite difficult to pin to a genre. Over the years, they have experimented with different sounds: indie folk in the late ’90s, chamber rock in the early ’00s, and even dance grooves lately. In the projection we see “Stay Loose” and “Your Cover’s Blown” near each other, these being their two most structurally unconventional songs.

Content-agnostic recommendation is dead

Collaborative filtering, however, is content-agnostic. As a result, popular items are much easier to recommend than unpopular ones, since more usage data is available, leading to homogeneity and boring suggestions. We need another way to recommend new and unpopular items (songs): the cold-start problem in a nutshell.

Extracting the above sound features can solve this problem in part. My Favorites playlist does not contain ground-truth labels such as like / don’t like; the whole dataset expresses positive intent. In machine-learning terms, I have only one class: positive.

I modelled this problem as one-class learning for novelty detection. But how can you classify with just one class? Imagine a factory with a monitoring system that determines when something goes wrong. While it is relatively easy to gather data from normal operation, data from a faulty system state can be rather expensive, or even impossible, to obtain. Given only the normal data, an algorithm builds a representational model, and newly encountered data (e.g. new songs) are compared against it.

I reduced the dimensions, as above, with a more robust technique, Principal Component Analysis (PCA), and picked the one-class Support Vector Machine (SVM) classifier. By tuning its parameters we can find an optimal region in which to make predictions. The contour plot below was inspired by a QZ article in which the author gets a personalized overview of his music taste.
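A minimal sketch of this PCA-plus-one-class-SVM step with scikit-learn. The data is synthetic and the `nu`/`gamma` values are illustrative guesses, not the article's tuned parameters:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(321, 8))  # stand-in for 321 liked songs (positive class only)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)  # project to 2-D, as in the contour plot

# nu roughly bounds the fraction of training songs allowed outside the region.
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X2)

inside = (clf.predict(X2) == 1).mean()  # +1 = inside the learned "taste" region
print(f"{inside:.0%} of training songs fall inside the region")

# A new candidate song goes through the same fitted PCA before prediction;
# evaluating clf.decision_function over a meshgrid draws the contour itself.
new_song = rng.normal(size=(1, 8))
print("recommend?", clf.predict(pca.transform(new_song))[0] == 1)
```

With `nu=0.1`, roughly 90% of the training songs end up inside the learned boundary, which matches the intuition of drawing a region around "my taste" while tolerating a few outliers.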

For instance, if Spotify were to suggest new songs to me, it would check whether they fall inside the orange-red area, which represents my taste. A few outliers on the right are songs unconventional in structure or duration, such as classical music.


In the era of big data, there is an emerging trend of employing small data with features extracted from big data. For example, my dataset may be quite small (n=321), but the sound features were extracted by deep neural networks trained on millions of songs. This paradigm could help democratize the currently hardware-focused ecosystem towards more products powered by artificial intelligence.


The above analysis was done in Python, using the scientific Python stack: scikit-learn for machine learning, Pandas for data pre-processing, Seaborn and Matplotlib for visualization, and NumPy for numerical computations. Code, data and Jupyter notebooks are available on GitHub.

UPDATE (Dec 2017): Jerry Yu, a student at Carnegie Mellon, implemented some ideas from my post and built a web app where you can visualize your own playlist online.

UPDATE (Nov 2018): Elan Parsons, a data scientist from Pennsylvania, reproduced my analysis to create her own music maps.

Dimitris Spathis is a researcher and data scientist doing a PhD in Computer Science at the University of Cambridge.
