Analysis of My Spotify Streaming History

Published in The Startup · 8 min read · Aug 9, 2020

Spotisis /spo-ti-sis/
noun
The analysis of one’s Spotify streaming history using Python.

I was reading through a lot of data science guides and project ideas when I came across an article in which the author compared his song choices with his friend’s. I wanted to do something similar, so I set out to analyse my own streaming history and compare it with what the world listens to.

Through this, I aim to find out more about my music preferences and how they differ from the world’s general picks.

I never really put much thought into my music preference before this project — it was always kind of dependent on my mood, and when someone asked me what type of music I like, I had no answer — because it varied from one hour to another.

I’ve split this project into 2 sections:

Part A is the analysis of my music streaming history.

Timeline of my streaming history
Day preference
Favorite artist
Favorite songs
Spirit of the songs
Diversity

Part B is the comparison of my 50 most-streamed songs with the top 50 songs streamed globally in 2019.

The data

Spotify allows every user to request a download of all their streaming history, so Part A is completely dependent on that. They also have an amazing Developer Platform through which the public can use the available data for their own projects. Along with my personal data, I used the audio features option, which breaks down a song and gives it a ‘score’ for a number of different attributes. The attributes are as follows:

Acousticness — A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic
Danceability — A description of how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy — Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
Instrumentalness — Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.
Liveness — Detects the presence of an audience in the recording.
Loudness — The overall loudness of a track in decibels (dB). Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness — Speechiness detects the presence of spoken words in a track.
Valence — A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
Tempo — The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration
Mode — Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
Key — The estimated overall key of the track.

The dataset was a little messy, so I used Pandas to clean it up to suit the needs of each section. The entire code can be found via the GitHub link at the end of this article.
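To give a feel for what that clean-up can involve, here is a minimal sketch and not the exact code from the repository. It assumes the export has the fields endTime, artistName, trackName and msPlayed; the rows and the helper name clean_history are made up for illustration.

```python
import pandas as pd

# Made-up rows in the assumed shape of the Spotify streaming history export.
# With a real export you might load the file instead, e.g.
# pd.read_json("StreamingHistory0.json") (file name is an assumption).
records = [
    {"endTime": "2020-02-24 09:10", "artistName": "Artist A",
     "trackName": "Track 1", "msPlayed": 180_000},
    {"endTime": "2020-02-23 18:30", "artistName": "Artist B",
     "trackName": "Track 2", "msPlayed": 240_000},
    # Exact duplicate of the first row, to show de-duplication.
    {"endTime": "2020-02-24 09:10", "artistName": "Artist A",
     "trackName": "Track 1", "msPlayed": 180_000},
]


def clean_history(rows):
    """Hypothetical helper: parse timestamps, add minutes, drop duplicates."""
    df = pd.DataFrame(rows)
    df["endTime"] = pd.to_datetime(df["endTime"])
    df["minutesPlayed"] = df["msPlayed"] / 60_000  # milliseconds to minutes
    df = df.drop_duplicates().sort_values("endTime").reset_index(drop=True)
    return df


history = clean_history(records)
print(history)
```

Converting msPlayed to minutes up front keeps the later sections readable, since every total in this article is quoted in minutes.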

For Part B, I used this dataset from Kaggle.

Before we begin, I just want to say something… Don’t come at me for my music choice!

Part A

1. Timeline of my streaming history

I know that I spend a lot of time listening to music, but I didn’t know I spent that much time! The data dates back to late June of 2019 and varies a lot from day to day.

On February 24th 2020, I spent a whopping 535 minutes (almost 9 hours) on Spotify, the most in the past year! There’s no definite answer as to why the gap between the highest and lowest daily totals (the lowest was only a matter of seconds) is so large, but I did register for Spotify Premium around that time, so maybe that was the reason? Push the promos harder you guys ;)
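A daily timeline like this boils down to summing the play time per calendar day. The sketch below shows one way to do it with Pandas, assuming a table with endTime and msPlayed columns; the numbers are invented and are not my real data.

```python
import pandas as pd

# Invented plays; only the column names mirror the assumed export format.
history = pd.DataFrame({
    "endTime": pd.to_datetime(
        ["2020-02-23 10:00", "2020-02-24 09:00", "2020-02-24 21:00"]
    ),
    "msPlayed": [120_000, 300_000, 600_000],
})

# Sum milliseconds per calendar day, then convert to minutes.
daily_minutes = (
    history.groupby(history["endTime"].dt.date)["msPlayed"].sum() / 60_000
)

busiest_day = daily_minutes.idxmax()  # day with the highest total
print(daily_minutes)
print("Busiest day:", busiest_day)
```

The same series can be handed straight to a plotting library to draw the timeline.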

2. Day preference

Does the day of the week affect how long I spend listening to music?

I usually listen to music while walking to and from college, so I would’ve predicted that more time would be spent on weekdays. But Sunday is chill day, so it makes sense that it was when I spent the most time listening to music.
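For the curious, a weekday breakdown can be sketched as follows. It assumes the same endTime and msPlayed columns and uses made-up plays; it totals the minutes per weekday, although averaging per day would be an equally reasonable choice.

```python
import pandas as pd

# Made-up plays: two on Sundays, one on a Monday.
history = pd.DataFrame({
    "endTime": pd.to_datetime(
        ["2020-02-23 10:00", "2020-02-24 09:00", "2020-03-01 21:00"]
    ),
    "msPlayed": [600_000, 300_000, 900_000],
})

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Total minutes per weekday; days with no plays are filled with 0.
minutes_by_weekday = (
    history.groupby(history["endTime"].dt.day_name())["msPlayed"]
    .sum()
    .div(60_000)
    .reindex(weekday_order, fill_value=0.0)
)
print(minutes_by_weekday)
```

Reindexing with an explicit weekday list keeps the bars in calendar order instead of alphabetical order.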

3. Favorite artists

Do I have a favorite artist?

According to the data, I actually do. There were two factors I considered: the number of times I played an artist’s song and the total amount of time I spent listening to their songs.

When looking through the data, I found that some of the songs were played for only a few seconds, which was reducing the accuracy of the results.

The graphs below show the top 15 artists under both categories.

Lauv, Shawn Mendes, One Direction and Justin Bieber held the top 4 positions in both graphs, whereas the order of the others changed.
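The two rankings can be sketched like this. The 30-second cutoff for ignoring skipped songs is purely an assumption for illustration, and the artists and play times are invented.

```python
import pandas as pd

MIN_MS_PLAYED = 30_000  # assumed cutoff: ignore plays shorter than 30 seconds

# Invented plays; "Artist B" has two very short plays (skips).
history = pd.DataFrame({
    "artistName": ["Artist A", "Artist A", "Artist B", "Artist B", "Artist B"],
    "msPlayed": [200_000, 160_000, 5_000, 3_000, 240_000],
})

# Drop the skips, then count plays and total time per artist.
full_plays = history[history["msPlayed"] >= MIN_MS_PLAYED]
artist_stats = full_plays.groupby("artistName")["msPlayed"].agg(
    plays="count", total_ms="sum"
)
artist_stats["minutes"] = artist_stats["total_ms"] / 60_000

top_by_plays = artist_stats.sort_values("plays", ascending=False).head(15)
top_by_minutes = artist_stats.sort_values("minutes", ascending=False).head(15)
print(top_by_plays)
print(top_by_minutes)
```

Without the cutoff, the artist with two skipped tracks would lead the play-count ranking, which is the kind of distortion the short plays introduce.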

4. Which songs were played most?

Was it by the same 15 artists?

Yes, it was — Lauv took 5 of the 15 spots!

I realised that some of the top 15 artists (based on the amount of time spent listening to their songs) were on the list because of one or two songs which were repeated multiple times.

For example, Memories by Maroon 5 was the most played song (played for a total of 184 minutes). When compared to the total time spent listening to the group (430 minutes), the difference was about 246 minutes. As a percentage, that means more than 40% of the time spent listening to Maroon 5 was spent on Memories alone.
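The arithmetic behind that percentage is quick to check, using only the two totals quoted above:

```python
memories_minutes = 184   # time spent on Memories
maroon5_minutes = 430    # total time spent on Maroon 5

other_songs_minutes = maroon5_minutes - memories_minutes
memories_share = memories_minutes / maroon5_minutes

print(other_songs_minutes)        # prints 246
print(f"{memories_share:.1%}")    # prints 42.8%
```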

It’s a good song. Admit it.

5. Spirit of the song

Do I listen to positive songs?

Using the valence attribute from Spotify’s audio analysis features, I tried to find out the general spirit of the top 50 songs I listen to. The valence scale runs from 0 to 1, with 1 being the most positive.

For the sake of classification:
- low spirit = 0 ≤ valence < 0.5
- neutral = 0.5 ≤ valence < 0.6
- high spirit = 0.6 ≤ valence ≤ 1

(I named it as ‘spirit’ because ‘positive’ and ‘negative’ didn’t feel right)
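The thresholds above translate into a small function. This is a sketch with a hypothetical function name; apart from 0.18 (the value quoted for Photograph in this article), the valence values are made up.

```python
import pandas as pd


def spirit(valence: float) -> str:
    """Map a valence score in [0, 1] to the three 'spirit' classes."""
    if not 0.0 <= valence <= 1.0:
        raise ValueError(f"valence must be between 0 and 1, got {valence}")
    if valence < 0.5:
        return "low spirit"
    if valence < 0.6:
        return "neutral"
    return "high spirit"


# 0.18 is the Photograph value from the article; the rest are invented.
valences = pd.Series([0.18, 0.45, 0.55, 0.72, 1.0])
spirit_counts = valences.map(spirit).value_counts()
print(spirit_counts)
```

Note how narrow the neutral band is (0.5 to 0.6) compared with the low spirit band (0 to 0.5), so with these cutoffs a larger share of songs will tend to land in the low spirit class.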

I was pretty unsure about this one and was utterly surprised by the results.

So I listen to more low spirit songs?? That doesn’t make sense!

When I cross-referenced the song names with their valence scores, I realised that this may not have been the most accurate representation. Ed Sheeran’s Photograph had a valence score of 0.18, so it was categorised as ‘low spirit’. Although it’s not a super high spirited song, it’s not so low either!

6. Diversity of songs

How do the audio features of the songs compare to one another?

The spirit analysis made me curious about how the songs vary from one another in terms of their audio features, so I compared the top 3 most played songs. I believe that my song choices are highly diverse.

Those who are familiar with these songs know just how much they vary from one another — they give such different vibes, but I needed the data to prove it.

There is A LOT of difference, most noticeably in the loudness and acousticness attributes.

The next part is based off of this diversity.

Part B

Is my music too diverse? How does it fare when compared to the global top 50?

Apart from the mode, everything is different! I prefer less groovy, instrumental-based songs with lower energy levels, while the global hits suggest people lean towards fast-paced, energetic songs that they can dance to.

The difference between my music’s average tempo (beats per minute) and the global average is 4 BPM. According to research, songs at 120 BPM are considered to be fast-paced. My preference seems to be a slightly slower pace, though not by much.
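A comparison like this comes down to averaging each audio feature over both lists and looking at the gap. Here is a minimal sketch with two tiny, invented tables standing in for my top 50 and the global top 50; the column names are assumptions and the numbers are not the real results.

```python
import pandas as pd

FEATURES = ["danceability", "energy", "tempo"]

# Invented stand-ins for the two top-50 lists (two songs each).
my_top = pd.DataFrame({
    "danceability": [0.5, 0.7],
    "energy": [0.4, 0.6],
    "tempo": [100.0, 120.0],
})
global_top = pd.DataFrame({
    "danceability": [0.7, 0.9],
    "energy": [0.6, 0.8],
    "tempo": [118.0, 122.0],
})

# Mean of every feature per list, side by side, plus the gap.
comparison = pd.DataFrame({
    "mine": my_top[FEATURES].mean(),
    "global": global_top[FEATURES].mean(),
})
comparison["difference"] = comparison["mine"] - comparison["global"]
print(comparison)
```

A negative difference means my average sits below the global one for that feature. Tempo is measured in BPM while the other two are on a 0 to 1 scale, so the gaps are not directly comparable across rows.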

Conclusion

This project was a blast to do. I thoroughly enjoyed learning more about my music preferences and comparing them to the global hits. Now that I am backed by the data, I can say that my music is highly diversified and that I do have a favourite artist: Lauv (considering the amount of time I’ve spent listening to his songs, it wouldn’t be justified to say otherwise!).

Following this article, I would like to continue by applying some machine learning knowledge to create a recommender system based on my music preferences.

Feel free to comment and view the entire code on my GitHub!

Big thanks to Vlad Gheorghe for his brilliant explanation (huge savior!)

Get Your Spotify Streaming History With Python: With delicious song features on top. (towardsdatascience.com)