Using Data To Curate Your Spotify Playlist

Spotify is a streaming platform that, for a small monthly fee, gives people access to millions of songs across all genres, countries, and cultures in one place. With that much music in the palm of your hand, you can listen however you want, whenever you want, and with whomever you want. Curating a playlist lets you tell a story through music, and it can shape the experiences you go through, whether it plays in the background or takes center stage. Sorting through millions of songs to find the 50 that go best together is difficult and time-consuming, which is why there are numerous articles online that give advice on crafting the perfect playlist.

The Soundsgood blog published an article called “How to Find the Perfect Songs for Your Playlist” that breaks playlist-making into six simple steps:

  1. Know your audience's music habits
  2. Check existing playlists featured on streaming platforms & playlists websites
  3. Search for specific niche playlists with keywords
  4. Use smart radios/playlist automation
  5. Check the charts to see what’s on the top
  6. … And add whatever sounds good

This blog post is a good example of how most articles approach building the perfect playlist. The reasoning is simple and relies mostly on the curator's subjective listening and personal preferences. These articles suggest that adding whatever sounds good to a playlist will all but guarantee a good result. That method works for some people, but it is easier said than done.

After reading many articles that treated playlist curation as a subjective process, I wanted to see whether I could examine it from a more objective point of view. The Spotify API is a useful tool for this because it describes each song with a set of audio features, which let me look for correlations between those features and the playlists a song belongs to. The dataset I found pulled songs from four different playlists (dinner, sleep, party, and workout) and broke each song down by 18 different features. For my analysis I focused on the five features that differed most significantly between playlists: acousticness, instrumentalness, energy, loudness, and liveness. My goal was to use these features to find an alternative way to build a playlist that relies less on subjective preference and more on the data behind the songs. I wouldn't expect the typical listener to put in the effort of using data to craft a playlist, but for people who are particularly data-savvy, I think it is an interesting way to do it.
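The article does not say which test was used to pick the five most significant features. A common choice for comparing one numeric feature across several groups is the one-way ANOVA F statistic, so the sketch below assumes that approach: it scores each feature by how much the playlist means differ relative to the spread within each playlist, then ranks the features. The playlist names match the article, but the `PLAYLISTS` values and the `anova_f` and `rank_features` helpers are hypothetical illustrations, not the author's data or code.

```python
from statistics import mean

# Hypothetical toy data for illustration only; NOT the article's dataset.
# Each playlist holds a few songs described by two Spotify-style audio features.
PLAYLISTS = {
    "sleep":   [{"energy": 0.10, "liveness": 0.10}, {"energy": 0.15, "liveness": 0.30}, {"energy": 0.20, "liveness": 0.20}],
    "dinner":  [{"energy": 0.30, "liveness": 0.20}, {"energy": 0.35, "liveness": 0.10}, {"energy": 0.40, "liveness": 0.30}],
    "party":   [{"energy": 0.80, "liveness": 0.30}, {"energy": 0.85, "liveness": 0.10}, {"energy": 0.90, "liveness": 0.20}],
    "workout": [{"energy": 0.85, "liveness": 0.20}, {"energy": 0.90, "liveness": 0.30}, {"energy": 0.95, "liveness": 0.10}],
}


def anova_f(groups):
    """One-way ANOVA F statistic: between-group variance divided by within-group variance."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(playlists, features):
    """Rank features from most to least different between playlists, by F statistic."""
    scores = {
        f: anova_f([[song[f] for song in songs] for songs in playlists.values()])
        for f in features
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank_features(PLAYLISTS, ["energy", "liveness"]))  # ['energy', 'liveness']
```

In these toy numbers every playlist has the same average liveness, so its F statistic is zero, while energy separates the playlists clearly. With SciPy available, `scipy.stats.f_oneway` computes the same F statistic together with a p-value, which is what "statistical significance" would usually refer to.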

Once I began an exploratory analysis of the Spotify API data, several interesting findings surfaced. Using just these five features, I found noticeable differences between the playlists that could be used to place songs on the right one. In practice, you can compare a specific song's feature values against each playlist's descriptive statistics to decide which playlist it fits best.
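One way to turn that comparison into a repeatable rule is a nearest-profile classifier: represent each playlist by its average feature values and assign a song to the playlist whose profile is closest. This is a sketch under stated assumptions, not the author's method. The `PROFILES` numbers and the `best_playlist` helper are hypothetical, and `loudness` is divided by 60 because Spotify reports it in decibels (typically between -60 and 0 dB) while the other four features range from 0 to 1; without rescaling, loudness would dominate the distance.

```python
import math

# Hypothetical per-playlist mean profiles; illustrative numbers, NOT the article's results.
PROFILES = {
    "dinner":  {"acousticness": 0.70, "instrumentalness": 0.20, "energy": 0.35, "loudness": -10.0, "liveness": 0.15},
    "sleep":   {"acousticness": 0.90, "instrumentalness": 0.80, "energy": 0.10, "loudness": -22.0, "liveness": 0.10},
    "party":   {"acousticness": 0.10, "instrumentalness": 0.02, "energy": 0.75, "loudness": -5.0, "liveness": 0.20},
    "workout": {"acousticness": 0.05, "instrumentalness": 0.05, "energy": 0.85, "loudness": -4.5, "liveness": 0.25},
}

# Assumed scale per feature so dB loudness is comparable with the 0-1 features.
SCALES = {"acousticness": 1.0, "instrumentalness": 1.0, "energy": 1.0, "loudness": 60.0, "liveness": 1.0}


def best_playlist(song, profiles=PROFILES, scales=SCALES):
    """Return the playlist whose scaled mean profile is closest (Euclidean) to the song."""
    def distance(profile):
        return math.sqrt(sum(((song[f] - profile[f]) / scales[f]) ** 2 for f in profile))

    return min(profiles, key=lambda name: distance(profiles[name]))


quiet_song = {"acousticness": 0.88, "instrumentalness": 0.75, "energy": 0.12, "loudness": -20.0, "liveness": 0.12}
loud_song = {"acousticness": 0.12, "instrumentalness": 0.00, "energy": 0.72, "loudness": -5.5, "liveness": 0.18}
print(best_playlist(quiet_song))  # sleep
print(best_playlist(loud_song))   # party
```

Because party and workout profiles sit close together, songs near the boundary between them are where a rule like this is most likely to disagree with a human curator.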

To run the analysis, I combined all of the playlists and computed the average of each feature across every song. I then split the data by playlist and compared each playlist's averages with the overall averages to find notable differences and to see whether each playlist sat above or below the overall mean. As seen in the table below, this simple analysis revealed a clear pattern that helped me sort songs into their appropriate playlists.
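The overall-versus-per-playlist comparison can be sketched with Python's standard `statistics` module. The song rows are made-up values for two of the five features, used only to show the mechanics, and `compare_to_overall` is a hypothetical helper name. With pandas, the same idea is usually written as `df.groupby("playlist")[features].mean() - df[features].mean()`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical songs for illustration only; NOT the article's dataset.
SONGS = [
    {"playlist": "sleep",   "energy": 0.10, "acousticness": 0.95},
    {"playlist": "sleep",   "energy": 0.20, "acousticness": 0.85},
    {"playlist": "dinner",  "energy": 0.35, "acousticness": 0.70},
    {"playlist": "dinner",  "energy": 0.45, "acousticness": 0.60},
    {"playlist": "party",   "energy": 0.80, "acousticness": 0.10},
    {"playlist": "party",   "energy": 0.90, "acousticness": 0.05},
    {"playlist": "workout", "energy": 0.85, "acousticness": 0.05},
    {"playlist": "workout", "energy": 0.95, "acousticness": 0.10},
]


def compare_to_overall(songs, features):
    """For each playlist and feature, return (playlist mean - overall mean, direction)."""
    overall = {f: mean(s[f] for s in songs) for f in features}
    by_playlist = defaultdict(list)
    for song in songs:
        by_playlist[song["playlist"]].append(song)

    report = {}
    for name, members in by_playlist.items():
        report[name] = {}
        for f in features:
            diff = mean(m[f] for m in members) - overall[f]
            direction = "above" if diff > 0 else "below" if diff < 0 else "equal"
            report[name][f] = (diff, direction)
    return report


for playlist, stats in compare_to_overall(SONGS, ["energy", "acousticness"]).items():
    print(playlist, {f: f"{d:+.3f} ({direction})" for f, (d, direction) in stats.items()})
```

Reading the output row by row gives the kind of pattern described above: in this toy data, sleep sits below the overall mean on energy and above it on acousticness, while party and workout show the opposite.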

These differences are also apparent when you visualize the descriptive statistics. Although some playlists have similar values, the visuals still help narrow down which playlist a song belongs on.
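Before plotting the five features side by side, it helps to rescale them, since loudness is measured in decibels while the others run from 0 to 1. The sketch below min-max scales each feature's playlist means to the 0 to 1 range and draws a plain-text bar chart so it runs without a plotting library; matplotlib's `plt.bar` would be the usual choice for a real chart. The `MEANS` values and both helper names are hypothetical.

```python
# Hypothetical playlist means; illustrative numbers, NOT the article's results.
MEANS = {
    "dinner":  {"energy": 0.35, "loudness": -10.0},
    "sleep":   {"energy": 0.10, "loudness": -22.0},
    "party":   {"energy": 0.75, "loudness": -5.0},
    "workout": {"energy": 0.85, "loudness": -4.5},
}


def min_max_scale(means, feature):
    """Map each playlist's mean for one feature onto 0..1 (lowest playlist -> 0, highest -> 1)."""
    values = [m[feature] for m in means.values()]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero if every playlist has the same mean
    return {name: (m[feature] - lo) / span for name, m in means.items()}


def text_bars(means, feature, width=20):
    """Render one text bar per playlist, with length proportional to the scaled mean."""
    scaled = min_max_scale(means, feature)
    return [f"{name:<8} {'#' * round(value * width)}" for name, value in scaled.items()]


for feature in ("energy", "loudness"):
    print(feature)
    print("\n".join(text_bars(MEANS, feature)))
```

Scaling each feature separately makes the playlists comparable within a feature, but it hides absolute values, so it suits the "which playlist is highest or lowest" reading rather than exact numbers.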

To see how my approach compared with the subjective advice in those articles, I took five random songs from each playlist and categorized them using the feature statistics to check whether they landed on the correct playlist. This worked well for sleep and party songs, but it was less successful at telling party music apart from workout music.
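That spot check can be expressed as a small evaluation loop: draw five random songs from each playlist, run them through whatever sorting rule you use, and count how many land back on their original playlist. A confusion count also shows where the misses go, which is how a party/workout mix-up would surface. The songs, the thresholds in `classify_by_energy`, and the `evaluate` helper are hypothetical; in practice you would plug in your own rule, such as a nearest-profile classifier over all five features.

```python
import random
from collections import Counter

# Hypothetical songs for illustration only; NOT the article's dataset.
SONGS_BY_PLAYLIST = {
    "sleep":   [{"energy": e} for e in (0.05, 0.08, 0.10, 0.12, 0.15, 0.20)],
    "dinner":  [{"energy": e} for e in (0.30, 0.35, 0.38, 0.40, 0.45, 0.50)],
    "party":   [{"energy": e} for e in (0.70, 0.75, 0.78, 0.80, 0.85, 0.90)],
    "workout": [{"energy": e} for e in (0.80, 0.85, 0.88, 0.90, 0.92, 0.95)],
}


def classify_by_energy(song):
    """Hypothetical stand-in rule using energy thresholds only; it never predicts 'workout'."""
    energy = song["energy"]
    if energy < 0.25:
        return "sleep"
    if energy < 0.60:
        return "dinner"
    return "party"


def evaluate(songs_by_playlist, classify, k=5, seed=0):
    """Sample k random songs per playlist; return per-playlist accuracy and a confusion count."""
    rng = random.Random(seed)  # fixed seed so the random sample is reproducible
    accuracy, confusion = {}, Counter()
    for name, songs in songs_by_playlist.items():
        sample = rng.sample(songs, min(k, len(songs)))
        predictions = [classify(song) for song in sample]
        confusion.update((name, predicted) for predicted in predictions)
        accuracy[name] = sum(p == name for p in predictions) / len(sample)
    return accuracy, confusion


accuracy, confusion = evaluate(SONGS_BY_PLAYLIST, classify_by_energy)
print(accuracy)
print(confusion.most_common())
```

With a single feature, this toy rule sends every workout song to party, which mirrors the kind of overlap a real rule can run into when two playlists have similar energy and loudness.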

I understand that analyzing song features statistically isn't the most practical way to curate a playlist, but for anyone who wants to experiment with playlist creation, I think it is a fun approach. As my simple analysis shows, song features differ significantly between playlists, and those differences can be used to build an awesome playlist. Overall, making a playlist is a difficult task with no single correct way to do it. Ultimately, it comes down to personal preference and whichever approach you believe yields the best results.
