Spotify Music Analysis

mpothen
INST414: Data Science Techniques
May 13, 2022

Analysis of Music Groups on Spotify

Spotify is a platform that lets users explore music and discover new artists. Its main appeal is the complex algorithm that recommends music to users. However, this algorithm only produces good results if a user is active: what gets recommended depends on the music a user listens to, so users have some control over their recommendations. My goal was to analyze the Spotify Music Analysis dataset in order to understand how Spotify might be grouping its music. Insights from this analysis may help users understand what “music groups” they belong to and which artists to listen to if they would like to stay within their group or explore others.

The features I used to study groupings were danceability, liveness, and acousticness because these three features differ greatly across music genres. I used KMeans clustering to form the groups, and I used the Elbow Method to justify the number of clusters I set for my analysis. The elbow plot suggested four clusters:

Elbow Method: Threshold of 4 clusters
The Original Scatterplot showing Danceability vs. Acousticness
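The Elbow Method mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the plot: the data here is synthetic random data standing in for the real dataset, and the helper name `elbow_inertias`, the range of k values, and the `n_init` and `random_state` settings are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the real dataset (assumption): random values in [0, 1],
# using the three feature names discussed in the text.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.random((300, 3)),
    columns=["danceability", "liveness", "acousticness"],
)


def elbow_inertias(data, k_values):
    """Fit KMeans once per k and collect the inertia (within-cluster sum of squares)."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0)
        model.fit(data)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 9))
inertias = elbow_inertias(features, k_values)

# The "elbow" is the k after which inertia stops dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.2f}")
```

Plotting `inertias` against `k_values` gives the elbow curve. On random data like this there is no clear bend, so the output only shows the mechanics, not the four-cluster result from the real dataset.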

To examine the clusters and understand the groupings, I first imported all of the CSVs into Google Sheets so I could run pivot tables and explore various facets of the data. Each grouping was based on danceability, liveness, and acousticness (see tables).

Clusters with Centroids

The first cluster pivot table was ordered by the highest count of artists. Some of the most common artists in this cluster were FIDLAR, WALK THE MOON, Demi Lovato, Big Time Rush, and Fall Out Boy. These artists all make pop music, specifically mainstream pop-punk rock. I followed similar procedures with the rest of the clusters in order to understand how they were grouped by genre and artist.

Cluster 2 was populated by artists such as Gayngs, The Sonics, Radiohead, and Nathan Sykes. This group might have been put together because their genre, classic American bands, has similar levels of liveness and especially acousticness.

In Cluster 3, Drake, Disclosure, Rick Ross, and Future had some of the highest counts. Given the high count of pop-rap artists in the cluster, this led me to believe that it was based on high levels of danceability and liveness and low levels of acousticness.

Finally, Cluster 4 had songs from mainstream pop artists, such as Ariana Grande, Demi Lovato, NSYNC, and Destiny’s Child. This grouping could also have been based on a high level of danceability and liveness. What may have differentiated it from Cluster 3, the pop-rap cluster, was the range of liveness: pop-rap songs have more beats and “hype” than mainstream pop songs, which may have a mid-level of liveness.

It is interesting to see how clustering algorithms can make the information available to us through the web more useful. Through clustering, we can divide up song data by select features and tell users of music platforms which songs belong to the clusters that match their perceived interests.
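The per-cluster artist counts described above were built with pivot tables in Google Sheets. For readers who prefer to stay in Python, a roughly equivalent count could be sketched with pandas. The rows below are hypothetical placeholder data, and the column names `artist`, `cluster`, and `song_count` are assumptions rather than names from the original files.

```python
import pandas as pd

# Hypothetical toy rows standing in for the clustered songs table (assumption).
clustered = pd.DataFrame({
    "artist": ["Artist A", "Artist A", "Artist B", "Artist C", "Artist C", "Artist C"],
    "cluster": [0, 0, 0, 1, 1, 0],
})

# Count songs per (cluster, artist), then list the most frequent artists first
# within each cluster, mirroring a pivot table sorted by count.
artist_counts = (
    clustered.groupby(["cluster", "artist"])
    .size()
    .reset_index(name="song_count")
    .sort_values(["cluster", "song_count"], ascending=[True, False])
)

print(artist_counts)
```

Reading the top few rows for each cluster gives the same kind of "most common artists" list used to interpret the groups.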

To conduct my analysis, I used the sklearn (scikit-learn) library in Python. I first created a scatterplot to see the relationship between danceability and acousticness. Then I used the KMeans class from sklearn.cluster to run the k-means clustering. To assign each song to a cluster, I called the “.fit_predict()” method on the dataframe containing the features. I then read the centroids from the fitted model's “.cluster_centers_” attribute and colored the scatterplot by cluster. I encountered few to no bugs in my data analysis. To prepare my dataset for clustering, I had to extract only the features I wanted to focus on. A limitation of my analysis was handling large amounts of numerical data: with 2017 rows, it was difficult to judge how accurate the groupings were. However, this could be addressed using dimensionality reduction.
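The workflow described above can be sketched as follows. This is an illustrative reconstruction, not the original notebook: the data is synthetic, the three-feature list is an assumption based on the features named in the text, and `n_init` and `random_state` are arbitrary choices made so the example is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the 2017-row dataset (assumption): random values in [0, 1].
rng = np.random.default_rng(0)
songs = pd.DataFrame(
    rng.random((200, 3)),
    columns=["danceability", "liveness", "acousticness"],
)

# Extract only the features to cluster on.
feature_cols = ["danceability", "liveness", "acousticness"]
X = songs[feature_cols]

# Four clusters, following the elbow plot discussed earlier.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)

# fit_predict() fits the model and returns one cluster label per row.
songs["cluster"] = kmeans.fit_predict(X)

# cluster_centers_ is an attribute of the fitted model: one row per centroid.
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=feature_cols)

print(centroids.round(3))
print(songs["cluster"].value_counts().sort_index())
```

To color a scatterplot by cluster, the `cluster` column can be passed as the color argument of a plotting call, for example `c=songs["cluster"]` in matplotlib's `scatter`, with the centroid rows drawn on top as markers.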

Conclusion

The main takeaway from my analysis is that grouping songs by danceability and acousticness resulted in clusters of similar genres. This may indicate that Spotify’s algorithm puts heavy emphasis on these two features when determining which songs to recommend to a user.
