My 2016 in music visualized (and how I did it)

I visualized the what, who and when of all the music I played on Spotify in 2016 and turned it into an interactive player that you can use to discover new music.

Music is something I just can’t live without, whether I’m actively listening to and engaging with it or simply using it as background filler. So summarizing my past year in music gives me much joy and some fun insights. It’s evident, for example, that The Avalanches came out of nowhere in the summer to claim the top spot, while runner-up Fear of Tigers has been played more consistently (even though his new album only came out in the summer).

Although the visualization doesn’t tell you the story behind why these artists and songs made it into my life in 2016, for me personally it’s very much a trip down memory lane. And perhaps it can serve as a way for you to explore new music?

Go ahead and give it a try at www.ralfelfving.se/2016/music. For those of you curious about the details and how I did it, there’s some more info below.

What is this?

I decided to dust off a visualization I did last year, extend it with new functionality and load new data into it. The finished item presents (almost) all artists I’ve played on Spotify throughout 2016, clustered by genre; my most-played track of each artist; and a circular donut-based calendar showing in which months, weeks, days and even hours I’ve played any given artist. For most artists there’s also a Spotify widget that lets you stream my most-played track by that artist.

It’s the data, stupid.

In order to do a meaningful visualization you need data. Unfortunately, but understandably, Spotify does not provide an API or other means to access your own play history. Luckily, they do let music lovers hook up their Spotify clients to Last.fm (Audioscrobbler). And luckily, I’ve been scrobbling since 2007, which gives me almost 10 years of data, obviously including 2016.

In order to extract this data I signed up for an API account with Last.fm and used Python to cycle through my play history, storing artist names, track titles, timestamps and genres (when available) in a local PostgreSQL database. This forms the foundation of my dataset, and allows me to run some simple queries to figure out fun stats, such as month-by-month breakdowns. Some patterns are easy to spot early on, like the massive spike of The Avalanches in the summer as they returned with their second album after 16 years of silence; almost every other song I played in July was an Avalanches song.
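For the curious, the paging loop described above could look roughly like the sketch below. It is not the actual script: it assumes the Last.fm `user.getrecenttracks` method with JSON output, the function names are made up for illustration, and the step that writes the rows into PostgreSQL is left out.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def fetch_page(user, api_key, page, limit=200):
    """Fetch one page of scrobbles as a parsed JSON dict (needs network access)."""
    query = urllib.parse.urlencode({
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
    })
    with urllib.request.urlopen(f"{API_ROOT}?{query}") as response:
        return json.load(response)


def rows_from_page(payload):
    """Turn one response page into (artist, title, played_at) tuples."""
    tracks = payload["recenttracks"]["track"]
    if isinstance(tracks, dict):  # defensive: a lone track may arrive as a dict
        tracks = [tracks]
    rows = []
    for track in tracks:
        if "date" not in track:  # the "now playing" entry has no timestamp yet
            continue
        played_at = datetime.fromtimestamp(int(track["date"]["uts"]), tz=timezone.utc)
        rows.append((track["artist"]["#text"], track["name"], played_at))
    return rows


def all_scrobbles(get_page):
    """Walk every page; get_page(page_number) must return a parsed response."""
    page, total_pages = 1, 1
    while page <= total_pages:
        payload = get_page(page)
        total_pages = int(payload["recenttracks"]["@attr"]["totalPages"])
        yield from rows_from_page(payload)
        page += 1
```

With real credentials the generator would be driven by something like `all_scrobbles(lambda page: fetch_page("your_user", "your_api_key", page))`, where both strings are placeholders. Passing the page fetcher in as an argument also makes it easy to try the loop against saved responses first.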

But in order to make the visualization link to Spotify via the Spotify web widget I also needed Spotify URIs for the songs. Although Last.fm shows a play button for a lot of songs in your play history on their website, they don’t provide that link (URI) in their API, so I hooked up to the Spotify API and used the artist and track title to search for each track and extract its URI. I could easily have scraped the Last.fm website, but opted for the Spotify API. It works for most artists and tracks, but not all. I won’t go into the details of why, but some manual tinkering helps when the title logged on Last.fm doesn’t correspond to what Spotify has on record.
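A rough sketch of that lookup, assuming the Spotify Web API search endpoint and its `artist:` and `track:` field filters. The helper names are invented for this example, and the HTTP call itself is omitted (the current API expects an `Authorization: Bearer` token on the request).

```python
import urllib.parse

SEARCH_URL = "https://api.spotify.com/v1/search"


def build_search_url(artist, title):
    """Build a track search URL that filters on both artist and track title."""
    query = f"artist:{artist} track:{title}"
    params = urllib.parse.urlencode({"q": query, "type": "track", "limit": 1})
    return f"{SEARCH_URL}?{params}"


def first_track_uri(search_response):
    """Return the URI of the best match, or None when nothing was found."""
    items = search_response.get("tracks", {}).get("items", [])
    return items[0]["uri"] if items else None
```

A `None` result is the case that calls for manual tinkering, for instance when the title scrobbled to Last.fm carries a suffix that Spotify does not use.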

In order to group/segment artists I needed their genre, or at least a relevant genre. You don’t get that from Spotify, but you can get it from Last.fm. Fun fact: it’s largely based on crowd-sourced data, i.e. less than perfect. Each artist has up to five genres attached to it, weighted according to relevance (or frequency?). Anyway, with a little bit of PostgreSQL trickery I try to find the most relevant one by cleaning out things I consider garbage (actual examples include ‘seen live’, ‘spotify’, ‘under 2000 listeners’, ‘you are welcome in poland’ and ‘artists i have seen live’) and assigning the first remaining genre in the order provided by Last.fm. To be honest, I also filter out e.g. ‘electronica’ to avoid 75% of the artists ending up in the same genre. A few select small artists that don’t have a serious genre available are excluded, although I could have manually added genres for them.
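The real cleaning happened in SQL, but the rule itself is simple enough to restate in a few lines of Python. This is only an illustration: the function name is made up, and the ignore lists contain just the examples mentioned above, not the full set.

```python
GARBAGE_TAGS = {
    "seen live",
    "spotify",
    "under 2000 listeners",
    "you are welcome in poland",
    "artists i have seen live",
}
TOO_BROAD_TAGS = {"electronica"}


def pick_genre(tags, ignored=GARBAGE_TAGS | TOO_BROAD_TAGS):
    """Return the first usable tag, keeping Last.fm's relevance order."""
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in ignored:
            return cleaned
    return None  # no serious genre left: the artist would be excluded
```

For example, `pick_genre(["seen live", "Electronica", "house"])` skips the first two tags and returns `"house"`.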

Great, now I have all the data I need. Artist names, track names, play history, genres, Spotify URIs etc. Let’s get to work.

The structure of everything

Anyone who has worked with D3.js knows that a key word is “data structure” (ok, technically that’s two words...). So I got to work writing PostgreSQL queries to extract the relevant data for the visualization. Fortunately, this is a process I find extremely fun.

I segmented the data into several data sets: one for the bubble plot, and one per frequency for the calendar (month, week, day, hour). I could have gone for one slightly bloated high-frequency set and “filtered up” towards the lower-frequency data sets, but I was lazy and figured that file size wouldn’t be an issue. I did however try to minimize data size by assigning an integer ID per artist to link the bubble plot data to the calendar data, and thus avoided linking on artist name (which is much longer). All of the data sets were then exported as JSON files.
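To make the shape of those data sets concrete, here is a small Python sketch of the same idea. The actual export was done with PostgreSQL queries; the field names, the bucket definitions (ISO week, day of year, hour of day) and the sample rows below are all assumptions made for the example.

```python
import json
from collections import Counter
from datetime import datetime


def build_datasets(plays, genres):
    """plays: list of (artist, played_at); genres: dict of artist -> genre."""
    # Assign a compact integer ID per artist, in order of first appearance.
    artist_ids = {}
    for artist, _ in plays:
        artist_ids.setdefault(artist, len(artist_ids) + 1)

    play_counts = Counter(artist for artist, _ in plays)
    bubbles = [
        {"id": artist_ids[a], "artist": a, "genre": genres.get(a), "plays": n}
        for a, n in play_counts.items()
    ]

    # One calendar data set per frequency, keyed on the integer ID only.
    bucket_of = {
        "month": lambda t: t.month,
        "week": lambda t: t.isocalendar()[1],
        "day": lambda t: t.timetuple().tm_yday,
        "hour": lambda t: t.hour,
    }
    calendar = {}
    for name, bucket in bucket_of.items():
        counts = Counter((artist_ids[a], bucket(t)) for a, t in plays)
        calendar[name] = [
            {"id": i, name: b, "plays": n} for (i, b), n in sorted(counts.items())
        ]
    return bubbles, calendar


# Tiny made-up example: two artists, three plays.
sample = [
    ("Artist A", datetime(2016, 7, 1, 9)),
    ("Artist A", datetime(2016, 7, 2, 9)),
    ("Artist B", datetime(2016, 1, 5, 22)),
]
bubbles, calendar = build_datasets(sample, {"Artist A": "genre x"})
calendar_json = json.dumps(calendar, separators=(",", ":"))  # compact output
```

Because the calendar rows carry only the integer `id`, the artist name is stored once, in the bubble data, no matter how many calendar rows refer to it.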

For some time I also considered splitting the calendar data into one file per artist that gets loaded only if you interact with that artist, but concluded that it would mean a lot of HTTP requests for the page and a lot of disk reads for the server. So I opted to just load everything when you load the page, and I even took the shortcut of keeping the (static) data inside the actual file rather than loading the JSON files separately.

You don’t have to be a viz to viz

As mentioned previously I re-used much of the D3 framework I wrote last year, but I want to highlight that much of the force layout bubble plot has been repurposed from one of Mike Bostock’s Blocks, since I had no experience with force layouts at the time. I also decided to stay with v3 of D3 rather than refactor my code to use v4.

As I had decided to extend the old version with the calendar view, I went about creating it. I already had a baseline from an old calendar visualization, so I copy/pasted much of it, adapted it to my current project and made some adjustments, including adding an hourly view in the centre of the donut charts. It’s easily triggered from mouse-over actions, and the bubble plot data is linked to the calendar data using integer IDs per artist as described above, so it’s easy to filter down the calendar data set to get only the relevant artist’s data.

I realized that the calendar view and the colors that represent frequency of plays will not be comprehensible to everyone, which greatly hampers any visualization. So I added a segment below it to try to visually explain that they’re months, weeks, days and hours, as well as what the colors represent. It’s still confusing to some, and I concede that this new segment needs some refinement. Once I got going I perhaps became more married to the type of calendar view than to what I was trying to present and why. So I’m considering changing to a more “recognizable” calendar representation that almost anyone would immediately relate to. (Feedback and ideas are welcome!)

Adding the Spotify widget and triggering it is easy. Spotify provides documentation, and then I just load the widget with the URI fed by the data for the relevant artist. In order to “lock” the widget I made it so that the user has to click the desired artist; to change artist you “unlock” it by clicking the circle/bubble again.
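The page itself does this in JavaScript, but the two pieces of logic can be sketched in Python to show the idea. Both parts rest on assumptions: the embed URL follows the `open.spotify.com/embed/<type>/<id>` form used by the current Spotify embed player (the 2016 widget may have used a different address), and clicks on other bubbles are assumed to be ignored while an artist is locked. The names are hypothetical.

```python
def embed_url(spotify_uri):
    """Convert 'spotify:track:<id>' into an embeddable player URL."""
    parts = spotify_uri.split(":")
    if len(parts) != 3 or parts[0] != "spotify" or not parts[2]:
        raise ValueError(f"not a Spotify URI: {spotify_uri!r}")
    _, kind, item_id = parts
    return f"https://open.spotify.com/embed/{kind}/{item_id}"


class WidgetLock:
    """Tracks which artist, if any, the player is currently locked to."""

    def __init__(self):
        self.locked_artist = None

    def click(self, artist_id):
        if self.locked_artist is None:
            self.locked_artist = artist_id  # first click locks onto this artist
        elif self.locked_artist == artist_id:
            self.locked_artist = None  # clicking the same bubble again unlocks
        # Assumption: clicks on other bubbles are ignored while locked.
        return self.locked_artist
```

Keeping the lock as a single piece of state means the mouse-over handlers only need to check whether `locked_artist` is set before swapping the widget.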

Disclaimer: anyone who digs into the code will see that my coding approach favors functionality over elegance; it’s not pretty or efficient, but it works.

This is all history, but what’s next?

Well. I hope to get some feedback from you on potential improvements. As mentioned, one I’m already considering is a better calendar visualization. Another is to not just offer the most-played track of a given artist, but actually load a playlist of that artist’s tracks ordered by play frequency. That would make the visualization contain (almost) all tracks I’ve played in any given year.

I’ve also toyed with the idea of making an automated visualization of my listening habits that, at the end of each day, collects and updates the most recent trends in my life. It could tell me if I’ve started listening to new artists or revisited an artist for the first time in a very long time, or suggest a track I haven’t listened to in a long time.

What would you do if you had access to all your play history?

Edit history:

  • 2017-01-31: Added a reference to the source of the d3.js force layout code used, since I was unable to locate it at the time I published this post.