Decoding Spotify Daily Mix

Using Spotipy in Python to analyze my Spotify Daily Mix

Abraham Setiawan
CodeX
9 min read · Jul 12, 2022

Photo by Wes Hicks on Unsplash

I listen to music almost every day, with Spotify as my main listening platform. I listen to music when I study, I listen to music when I sleep, and I’m actually listening to music while I’m writing this article. I also listen to music for the sake of listening to music, to sing along, and to play along on a bass or a guitar.

Since I listen to music so much, I thought it might be interesting to analyze my listening behavior.

You can say that the way I listen to music can be split into two categories: Active Listening and Background Music.

Author’s two music categories: Active Listening and Background Music (thumbnails from Spotify)

My initial idea was to fetch the 1000 latest songs I listened to and analyze them. To do so, I used a Python library called Spotipy.

Spotipy is a Python library that acts as a bridge between an application, in my case a Jupyter notebook, and the Spotify API. In short, Spotipy makes fetching the data more Python-friendly. You can read the full documentation here.

Setting Up Spotify API

To start working with the Spotify API, we first need to create an account on the Spotify for Developers page and then create an app.

How the Spotify for Developers Dashboard should look after creating an app (Image by author)

After we create the app, it’s important to set up a callback (redirect) URI to make the API work. To set one up, go to your app, click on Edit Settings, and fill in a URI under Redirect URIs. If you don’t have a predetermined URI, you can put in pretty much anything there. I simply put http://localhost:8877/callback/ and it works. If that doesn’t work, try another port that is not already used by another application (port 8888, for example, is the default for Jupyter Lab).

Setting up the callback (Image by author)

Setting Up Spotipy

Now that we have completed the API setup, we can move on to Jupyter Lab. First, we import the libraries we need.

If you don’t have the Spotipy library installed, you can install it with pip (pip install spotipy).

Then we set up the credentials. For this, we need the client ID, the client secret, and the redirect URI. The client ID and secret can be found in the app that we created on the Spotify for Developers page, and the redirect URI is the one we set up earlier. In my case it’s http://localhost:8877/callback/. I store them in os.environ, which keeps them out of the function calls and lets Spotipy pick them up automatically during authentication.
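A minimal sketch of this step. The three variable names are the ones Spotipy looks up by default; the helper name set_spotify_credentials and the placeholder values are assumptions for illustration, not the original notebook code.

```python
import os


def set_spotify_credentials(client_id, client_secret, redirect_uri):
    """Store the app credentials in the environment variables Spotipy reads."""
    os.environ["SPOTIPY_CLIENT_ID"] = client_id
    os.environ["SPOTIPY_CLIENT_SECRET"] = client_secret
    os.environ["SPOTIPY_REDIRECT_URI"] = redirect_uri


# Placeholders: replace with the values from your own app's dashboard page.
set_spotify_credentials(
    client_id="your-client-id",
    client_secret="your-client-secret",
    redirect_uri="http://localhost:8877/callback/",
)
```

If you would rather not keep the secret in the notebook at all, the same variables can be exported in the shell before Jupyter Lab is started.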

Spotipy has two authentication methods: SpotifyClientCredentials and SpotifyOAuth. Using SpotifyClientCredentials, we can fetch Spotify data that is not linked to a user, such as artists and albums. This method doesn’t require a Spotify login through the redirect URI, since it only covers general data. SpotifyOAuth, on the other hand, allows us to get information related to a specific user, such as saved tracks or recently played songs. We need to specify a scope for this method, and it will open a login page using the redirect URI. Both methods work seamlessly with the os.environ step above: we don’t need to pass the credentials again, because they are read automatically from os.environ.
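The two methods could be wrapped roughly as follows. This is a sketch that assumes the credentials are already in the environment; the helper names make_public_client and make_user_client are hypothetical.

```python
def make_public_client():
    """Client for data that is not tied to a user (artists, albums, ...)."""
    # Imported lazily so the helpers can be defined before Spotipy is installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # No arguments: the client ID and secret are read from the environment.
    return spotipy.Spotify(auth_manager=SpotifyClientCredentials())


def make_user_client(scope):
    """Client for user data; the scope decides what we are allowed to read."""
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    # The redirect URI is also read from the environment.
    return spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))


# Example (needs real credentials and a browser login):
# sp = make_user_client("user-library-read")
```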

Attempting to Analyze Recently Played Songs

Now that we have Spotipy set up, we can go to the next step, which is fetching the tracks. A single request returns at most 50 tracks, but Spotipy has a next function that fetches the following page, so we can collect more. This is an example of fetching my saved tracks.
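A sketch of that paging loop, assuming sp is a client created with the user-library-read scope. The helper name fetch_all_items is hypothetical; it relies only on the items and next keys of a paged response and on Spotipy's next method.

```python
def fetch_all_items(sp, first_page):
    """Follow the `next` links of a paged response and collect every item."""
    items = list(first_page["items"])
    page = first_page
    while page.get("next"):       # `next` is None on the last page
        page = sp.next(page)      # fetch the following page
        items.extend(page["items"])
    return items


# Example (needs an authenticated client):
# saved = fetch_all_items(sp, sp.current_user_saved_tracks(limit=50))
# print(len(saved))
```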

Saved Tracks list giving 540 results (Image by author)

However, I encountered a peculiar problem. I couldn’t seem to fetch more than 50 tracks with the current_user_recently_played() method.

Recently Played list limited to 50 results only (Image by author)

I thought my code was incorrect, so I tried tweaking it, but to no avail. I also did some research online to see whether anyone else had encountered this issue. It turns out that this is a common problem that has prompted many posts on the Spotify for Developers forum, including this one.

I came to the conclusion that Spotify hasn’t properly implemented the next feature for recently played tracks, and that the only way to collect more is to fetch them manually or to write a script that runs automatically every 3 hours or so and saves the latest tracks. That was too time-consuming in my case, so I decided to go in another direction and analyze the auto-generated Daily Mixes instead.
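For reference, the scheduled-script approach would mostly come down to merging each new 50-track snapshot into a stored history without duplicates. The sketch below is an assumption about how that could look, not something that was built for this article; it relies on each recently played item carrying a played_at timestamp, and it assumes all timestamps use the same ISO 8601 format so that they sort correctly as strings.

```python
def merge_recently_played(history, new_items):
    """Add plays from a fresh snapshot to `history`, skipping known ones."""
    seen = {item["played_at"] for item in history}
    merged = list(history)
    for item in new_items:
        if item["played_at"] not in seen:
            seen.add(item["played_at"])
            merged.append(item)
    # Same-format ISO 8601 timestamps sort chronologically as plain strings.
    merged.sort(key=lambda item: item["played_at"])
    return merged


# Example (needs a client with the user-read-recently-played scope):
# snapshot = sp.current_user_recently_played(limit=50)["items"]
# history = merge_recently_played(history, snapshot)
```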

Analyzing Daily Mixes

Spotify auto-generates 6 Daily Mixes based on the user’s listening behavior. Each mix contains 50 tracks that are normally similar to one another. In my case, the mixes are categorized as shown in the picture below.

Daily Mixes (thumbnails from Spotify)

The mixes seem to fall into three bigger categories. Daily Mix 1 and 2 are Background Music with more Lo-Fi vibes. Daily Mix 3 and 4 are Active Listening playlists with mostly Rock, Alt-Rock, and Christian music. For some reason, a few Indonesian Pop songs got blended into the Christian music playlist. Finally, Daily Mix 5 and 6 are Background Music with a calmer, more relaxed vibe.

There are three things I’m interested in analyzing on the Daily Mixes, namely: audio features, the release year of the songs, and the song counts by artist.

To start the analysis, we need to fetch the tracks first. I wrote a function fetch_daily_mix() to automate the process. First, we define the scope as user-read-recently-played and create the engine sp. Then we use sp.playlist() to fetch all the tracks from a playlist ID. It returns a JSON-like result, which we put into a dataframe.

The playlist ID can be the URL, the URI, or the ID itself. The URL can be copied from the Spotify app by right-clicking on the playlist and choosing Share > Copy link to playlist. Now we call the function for all six Daily Mixes. Here’s an example of Daily Mix 1.
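A rough reconstruction of such a function. Only the function name and the call to sp.playlist() come from the text; the parameters, the chosen columns, and the decision to pass sp in as an argument are assumptions. It returns plain rows that can be handed to pandas.DataFrame.

```python
def fetch_daily_mix(sp, playlist_id):
    """Return one row (dict) per track of the given playlist."""
    playlist = sp.playlist(playlist_id)  # accepts a URL, a URI, or an ID
    rows = []
    # The first page is enough here: a Daily Mix holds only 50 tracks.
    for item in playlist["tracks"]["items"]:
        track = item.get("track")
        if not track:  # skip entries without track data
            continue
        rows.append({
            "track_id": track["id"],
            "track_name": track["name"],
            "artist": track["artists"][0]["name"],  # first listed artist only
            "album": track["album"]["name"],
            "release_date": track["album"]["release_date"],
        })
    return rows


# Example (needs an authenticated client and pandas):
# daily_mix_1 = pd.DataFrame(fetch_daily_mix(sp, "<playlist URL, URI, or ID>"))
```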

Fetch track information for Daily Mix 1 (Image by author)

Now we want to fetch the audio features of the tracks. I wrote the functions below to fetch them. We use the sp engine we defined above to call sp.audio_features(), with the dataframe containing the track information as the input.
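A sketch of this step under the same assumptions as before: rows are dicts with a track_id key, and the helper name add_audio_features is hypothetical. The feature names match the nine plots further down. The IDs are sent in batches because sp.audio_features() accepts a limited number of tracks per call.

```python
AUDIO_FEATURES = (
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
)


def add_audio_features(sp, rows, batch_size=100):
    """Return copies of `rows` with Spotify's audio features merged in."""
    ids = [row["track_id"] for row in rows]
    features_by_id = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        for features in sp.audio_features(batch):
            if features:  # None means no features were returned for a track
                features_by_id[features["id"]] = features
    enriched = []
    for row in rows:
        features = features_by_id.get(row["track_id"], {})
        # Missing features become None instead of raising a KeyError.
        enriched.append({**row, **{name: features.get(name) for name in AUDIO_FEATURES}})
    return enriched
```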

Afterwards, we call the function for the Daily Mixes. Here’s an example of Daily Mix 1. I will explain what each feature is on the visualization below.

Daily Mix 1 with audio features (Image by author)

Now we concatenate the dataframes together and do some tidying up before visualizing the data.
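What the tidying involves is not spelled out, so the sketch below only covers the part that the later plots depend on: stacking the six mixes and labelling every row with the mix it came from. It works on plain lists of dicts rather than calling pandas.concat, and the helper name combine_mixes and the daily_mix column name are assumptions.

```python
def combine_mixes(mixes):
    """Stack several mixes into one list, tagging each row with its mix label.

    `mixes` maps a label such as "Daily Mix 1" to that mix's list of rows.
    """
    combined = []
    for label, rows in mixes.items():
        for row in rows:
            combined.append({**row, "daily_mix": label})
    return combined


# Example: pd.DataFrame(combine_mixes({"Daily Mix 1": mix_1, "Daily Mix 2": mix_2}))
```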

Visualizing Audio Features

After doing the preliminary work, we can now visualize the audio features. I decided to use sns.violinplot() to compare the Daily Mixes, and I color each Daily Mix to match its playlist cover. As a reminder, Daily Mix 1 and 2 are Lo-Fi music, Daily Mix 3 and 4 are Active Listening mixes, and Daily Mix 5 and 6 are calm music.
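One way such a plot could be produced, assuming rows that carry a daily_mix label plus the audio feature columns. The helper names are hypothetical, and the per-mix colors are left at seaborn's defaults.

```python
def to_long_records(rows, feature):
    """Keep only the mix label and one audio feature, dropping missing values."""
    return [
        {"daily_mix": row["daily_mix"], feature: row[feature]}
        for row in rows
        if row.get(feature) is not None
    ]


def plot_feature_violin(rows, feature):
    """Draw one violin per Daily Mix for the given audio feature."""
    # Imported here so the data helper above works without the plotting stack.
    import pandas as pd
    import seaborn as sns

    frame = pd.DataFrame(to_long_records(rows, feature))
    ax = sns.violinplot(data=frame, x="daily_mix", y=feature)
    ax.set_title(feature.capitalize())
    return ax


# Example: plot_feature_violin(all_rows, "danceability")
```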

Danceability (Image by Author)
Energy (Image by Author)
Loudness (Image by Author)
Speechiness (Image by Author)
Acousticness (Image by Author)
Instrumentalness (Image by Author)
Liveness (Image by Author)
Valence (Image by Author)
Tempo (Image by Author)

Based on the visualization above, it’s quite clear that Daily Mix 3 and 4 have different audio features from the rest, yet are quite similar to each other despite sounding like completely different genres to human ears (Christian music vs Rock / Alt-Rock music). This suggests that I actively listen to songs with these audio features, regardless of the genre.

Other than that, Daily Mix 1 and 2 are grouped together, with tiny variations here and there. This means that Daily Mix 1 and 2 can be played interchangeably and I probably wouldn’t even notice.

Daily Mix 5 and 6 are mostly grouped together as well, but for some features, such as Danceability, Valence, and Tempo, they vary quite a bit. They might complement one another, but I would probably notice if someone changed the running playlist from Daily Mix 5 to 6.

Visualizing Songs Based on Release Year

This time, we want to visualize the songs by release year to see how they are distributed. I also color the songs according to the Daily Mix they belong to.
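Before plotting, the release year has to be extracted, because Spotify reports release dates at varying precision ("1994", "1994-05", or "1994-05-17"). A minimal sketch, assuming rows with release_date and daily_mix keys; the helper names are hypothetical.

```python
from collections import Counter


def release_year(release_date):
    """The year is always the first four characters of the release date."""
    return int(release_date[:4])


def songs_per_year(rows):
    """Count songs per release year, broken down by Daily Mix."""
    counts = {}
    for row in rows:
        year = release_year(row["release_date"])
        counts.setdefault(year, Counter())[row["daily_mix"]] += 1
    # Plain dicts, sorted by year, are easier to inspect than nested Counters.
    return {year: dict(counts[year]) for year in sorted(counts)}


# With a dataframe that has a release_year column, a stacked histogram could be
# drawn with: sns.histplot(data=df, x="release_year", hue="daily_mix",
#                          multiple="stack", discrete=True)
```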

Histogram of Songs per Daily Mix Category Based on Release Year (Image by author)

From the histogram above, we can see that the vast majority of the songs were released from 2020 onwards, with 2021 having the most songs. Only Daily Mix 4 (the Rock / Alt-Rock playlist) has songs going all the way back to the 90s. The large number of newly released songs seems to come from the Lo-Fi and ambient music, genres that have proliferated only recently thanks to streaming services like Spotify.

Visualizing Song Count Per Artist

In this case, we want to visualize the song count per artist, including how many unique artists there are in each Daily Mix. I am keeping the same colors as before.
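The counts behind such a chart can be sketched with collections.Counter, again assuming rows with artist and daily_mix keys and a hypothetical helper name.

```python
from collections import Counter


def artist_counts(rows, mix_label):
    """Return (artist, song count) pairs for one mix, most frequent first."""
    counts = Counter(row["artist"] for row in rows if row["daily_mix"] == mix_label)
    return counts.most_common()


# Example:
# counts = artist_counts(all_rows, "Daily Mix 3")
# print(len(counts))  # number of unique artists in that mix
```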

Song Count by Artist of Daily Mix 1 (Image by author)
Song Count by Artist of Daily Mix 2 (Image by author)
Song Count by Artist of Daily Mix 3 (Image by author)
Song Count by Artist of Daily Mix 4 (Image by author)
Song Count by Artist of Daily Mix 5 (Image by author)
Song Count by Artist of Daily Mix 6 (Image by author)

From the visualization above, we can see that Daily Mix 3 and 4, which are Active Listening mixes, have fewer unique artists (<20) than the Background Music mixes (>30). The Active Listening mixes also have more songs from individual artists. This might be because Active Listening is more intentional towards specific artists, whereas for Background Music, the artist is less relevant than the sound of the music.

As I pointed out earlier, some Indonesian Pop songs ended up in the Daily Mix 3 playlist (the Christian music mix). From my human perspective, they don’t belong together, but according to Spotify, they do. So it looks like Spotify’s auto-generation is not perfect.

And that’s how I used Spotipy to analyze my own listening behavior. Along the way, I ran into a problem with my initial idea and learned how to iterate and find another way to get the insights I wanted. I also learned that Spotify appears to curate and categorize its library by audio features rather than genres. Most importantly, I learned about my own listening behavior, which is a big part of my life.

If you want to get to know your own listening behavior on Spotify, you can find the full code here. Have fun analyzing Spotify data!

Abraham Setiawan
CodeX

Data Analyst student at Hyper Island with experience in product and innovation. I write about my journey in the data world. Website: abrahamsetiawan.com