Spotify — Genres Network Analysis

Kfir Gisman
3 min read · Dec 18, 2021

Spotify is the largest music streaming service in the world, serving more than 381 million users in more than 180 countries. With millions of active users and a huge variety of music genres, Spotify produces a rich dataset that I was interested in exploring.

I was curious to see whether, by building a network of genres, I could find a correlation between the number of artists playing a genre and that genre's betweenness centrality in the network.
To do so, I gathered the data using the Spotify API and the “Spotipy” Python library. I used the ORA network software to analyze and visualize the data.

Collecting Data

The Spotify genres couldn’t all be pulled at once. Therefore, I collected all 1256 playlists from the official Spotify account. I chose the official account's playlists because, taken together, they are not biased toward a particular genre, era, or country.
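The playlist step could be sketched roughly as follows. This is a minimal sketch, not the code used for this post: the function name `collect_artist_ids` is made up for illustration, `sp` is assumed to be an authenticated `spotipy.Spotify` client, and it keeps every credited artist on a track, which is an assumption about how the artists were picked.

```python
def collect_artist_ids(sp, user="spotify"):
    """Return the set of artist IDs found in a user's public playlists.

    `sp` is assumed to be an authenticated spotipy.Spotify client, e.g.
        import spotipy
        from spotipy.oauth2 import SpotifyClientCredentials
        sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    """
    artist_ids = set()
    page = sp.user_playlists(user, limit=50)
    while page:
        for playlist in page["items"]:
            tracks = sp.playlist_items(playlist["id"], additional_types=("track",))
            while tracks:
                for item in tracks["items"]:
                    track = item.get("track")
                    if not track:  # removed or unavailable tracks can be None
                        continue
                    for artist in track["artists"]:
                        if artist.get("id"):  # local files have no artist ID
                            artist_ids.add(artist["id"])
                # Follow pagination until there is no next page.
                tracks = sp.next(tracks) if tracks.get("next") else None
        page = sp.next(page) if page.get("next") else None
    return artist_ids
```

Both playlist listings and playlist contents are paginated, so the loops keep calling `sp.next` until the API reports no further page.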

For every song in the playlists, I took its artist. Then, for every artist, I used the Spotify API's related-artists endpoint, which returned 20 related artists for each artist on my list. This step allowed me to expand my dataset. Through the Spotify API, I also gathered each artist’s genres. The next step was to determine how many artists shared each pair of genres. The resulting 1937 genres were entered into a matrix.
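A rough sketch of the expansion step, under the same assumptions as before: `sp` is an authenticated `spotipy.Spotify` client, and `expand_with_related` is a hypothetical helper name. It uses Spotipy's `artist` and `artist_related_artists` calls; whether the related-artists endpoint is available may depend on your app's API access.

```python
def expand_with_related(sp, artist_ids):
    """Map artist ID -> list of genres for seed artists and their related artists."""
    genres_by_artist = {}
    for artist_id in artist_ids:
        # Genres of the seed artist itself.
        genres_by_artist[artist_id] = sp.artist(artist_id)["genres"]
        # Related artists come back as full artist objects, genres included.
        related = sp.artist_related_artists(artist_id)["artists"]
        for artist in related:
            genres_by_artist[artist["id"]] = artist["genres"]
    return genres_by_artist
```

For a large seed list, fetching the seed artists in batches with `sp.artists` (up to 50 IDs per call) would cut down the number of requests.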

Network Analysis

Using the matrix I built, I created the genres network. My approach was to create a genre-based network in which every node consists of one musical genre. The edge weight is the number of artists who play both genres.
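To make the definition concrete, here is a small pure-Python sketch of how such a network could be built from a mapping of artists to genres. The function names and the `genres_by_artist` input are illustrative assumptions; the analysis in this post was done in ORA, not with this code.

```python
from collections import Counter
from itertools import combinations

def build_genre_network(genres_by_artist):
    """Return (artists per genre, edge weights) from an artist -> genres mapping."""
    artist_counts = Counter()  # node attribute: number of artists per genre
    edge_weights = Counter()   # (genre_a, genre_b) -> number of shared artists
    for genres in genres_by_artist.values():
        unique = sorted(set(genres))  # sorted so each pair has one canonical order
        artist_counts.update(unique)
        edge_weights.update(combinations(unique, 2))
    return artist_counts, edge_weights

def to_adjacency(artist_counts, edge_weights):
    """Turn the edge weights into a symmetric adjacency dict (undirected graph)."""
    adjacency = {genre: {} for genre in artist_counts}
    for (a, b), weight in edge_weights.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency
```

An artist tagged with three genres contributes one unit of weight to each of the three edges between them, so heavily co-occurring genres end up with heavy edges.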

Left: genres network graph, nodes colored by the Louvain algorithm. Right: zoomed-in view.

I discovered that the number of artists per genre follows a power-law distribution. This means that a few genres are played by a lot of artists, while most genres are played by far fewer.
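One quick way to sanity-check a power-law claim like this is to estimate the exponent from the artist counts. The sketch below uses the standard maximum-likelihood estimator for a continuous power law, alpha = 1 + n / sum(ln(x_i / x_min)); it is only a rough approximation for integer counts, the choice of `x_min` is an assumption, and it is not necessarily how the distribution was identified for this post.

```python
import math

def power_law_alpha(values, x_min=1):
    """Estimate the power-law exponent of `values` at or above `x_min`."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all values equal x_min; exponent is undefined")
    return 1 + len(tail) / log_sum
```

Here `values` would be the number of artists per genre. An estimate alone does not prove the data is power-law distributed; a log-log plot of the counts is a useful companion check.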

Among all genres, Rock, Dance-Pop, and Classic-Rock have the largest number of artists.

Dance-Pop, Pop, and Pop-Rap, for example, had the highest betweenness centrality. (Normalized betweenness measures how often a node acts as a bridge on the shortest paths between two other nodes.)
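For readers who want to see what the measure computes, here is a compact sketch of Brandes' algorithm for betweenness centrality on an undirected graph. It assumes an adjacency dict in which every node appears as a key, and it ignores edge weights, which may differ from how ORA handles a weighted network.

```python
from collections import deque

def betweenness_centrality(adjacency, normalized=True):
    """Betweenness centrality for an undirected, unweighted graph."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    n = len(adjacency)
    scale = 0.5  # each undirected pair was counted from both endpoints
    if normalized and n > 2:
        scale /= (n - 1) * (n - 2) / 2  # number of pairs excluding the node
    return {v: value * scale for v, value in centrality.items()}
```

A genre sitting between otherwise separate clusters, the way Dance-Pop links pop and dance genres, scores high even if its own artist count is not the largest.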

Conclusion

In the fitted linear regression, the X axis represents the betweenness centrality of a genre and the Y axis represents its total degree.

The coefficient of determination (R²) is 0.4, which for a simple linear fit corresponds to a correlation coefficient of about 0.63: a moderate positive correlation. It suggests that when more artists play a genre, that genre tends to have more connections with other genres in the network.
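A fit of this kind can be reproduced with ordinary least squares. The sketch below is a plain-Python illustration with hypothetical inputs `xs` and `ys` (for example, per-genre betweenness and total degree); it is not the tool that produced the figure above.

```python
import math

def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r * r
```

For a single-predictor fit like this one, R² is simply the square of the Pearson correlation, which is why the two numbers can be converted into each other.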

Final Notes

This was my first attempt to understand and analyze network data. As an active Spotify user, I enjoyed learning about Spotify’s genre network, since the platform provides a lot of data from many perspectives. I feel that I have only touched the tip of the iceberg.

Contact me: Gmail, GitHub, LinkedIn
