Network of Genre and Artists in Spotify

Saleena John
Web Mining [IS688, Spring 2021]
6 min read · May 9, 2021
Image courtesy: NYC Data Science Academy

Spotify is one of today’s leading commercial music streaming services, offering its music through a two-tiered service. The app streams music to a wide audience with varied preferences. In my view, it is the best free online radio on the market, with an easy-to-use interface. Its features include browsing and searching by album, artist, genre, and record label, and adding tracks to playlists, all designed to keep music playback front and center. Users can subscribe to the premium version or simply use the free version. The free version supports online listening at reduced audio quality, interrupted by back-to-back advertisements. The premium version provides offline listening, advertisement-free playback, and high audio quality. Spotify has over 155 million premium subscribers and offers over 70 million songs across many genres and albums. Unlike competing music streaming services, Spotify has something to offer all of its users. It draws on data from music industry professionals, artists, and consumers to identify users’ needs and improve its software accordingly.

Spotify is successful for several reasons. First, it provides a great user experience: the app is simple to use because its design is centered on playlists, and users can add a song to a playlist and play it from there. Second, Spotify works across devices, so music playing on a phone can be switched to a computer without interrupting playback. Last but not least, its simple monthly premium pricing attracts many customers, giving them full control over their music, which they can share with family and friends.

In this project, I analyze how artists and genres are related using network graphs. I collect the data from the Spotify Web API using the spotipy library.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

# Authenticate against the Spotify Web API with the client-credentials flow
client_id = '<client_id>'
client_secret = '<client_secret>'
client_credentials_manager = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

def getTrackIDs(user, playlist_id):
    # Collect the ID of every track in the given user's playlist
    ids = []
    playlist = sp.user_playlist(user, playlist_id)
    for item in playlist['tracks']['items']:
        track = item['track']
        ids.append(track['id'])
    return ids

ids = getTrackIDs('c8ztbhj7n5pt845269o943od1', '<endpoint>')
# `data` is the pandas DataFrame used in the rest of the analysis;
# the code that builds it is not shown here
data.head()

From this data, we need only the artist and genre columns.

Null values are removed from the data set and, to keep the graph simple, I chose the top 100 artists.
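The cleaning step described above can be sketched roughly as follows. The small `raw` DataFrame and its rows are made up for illustration, the column names `artist` and `genre` follow the ones passed to NetworkX when the graph is built, and "top 100 artists" is assumed here to mean the 100 artists that appear most often, since the selection rule is not stated.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the Spotify Web API
raw = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5],
    "artist": ["Adele", "Adele", "Drake", None, "U2"],
    "genre": ["pop", "pop", "hip-hop", "rock", None],
})

# Keep only the two columns needed for the network and drop rows with nulls
pairs = raw[["artist", "genre"]].dropna()

# Assumption: "top" means most frequent, so keep the N most common artists
N = 100
top_artists = pairs["artist"].value_counts().head(N).index

# One row per distinct artist-genre pair among the selected artists
data = pairs[pairs["artist"].isin(top_artists)].drop_duplicates()
print(data)
```

If "top" was meant in another sense (for example, chart position), only the line that computes `top_artists` would need to change.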

Plotting the network graph

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. A network graph consists of nodes and edges. The vertices of the graph are called nodes (usually the variables under analysis), and the connections between them are called edges.
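As a minimal illustration of nodes and edges, the sketch below builds a tiny graph by hand. The three artist-genre links are toy values chosen for the example, not the project's data.

```python
import networkx as nx

# Toy graph: artists and genres are nodes, artist-genre links are edges
toy = nx.Graph()
toy.add_edge("Adele", "pop")         # add_edge creates missing nodes as needed
toy.add_edge("Taylor Swift", "pop")
toy.add_edge("Drake", "hip-hop")

print(sorted(toy.nodes()))
print(list(toy.edges()))
print(toy.number_of_nodes(), toy.number_of_edges())
```

Because the graph is undirected, the edge between "Adele" and "pop" can be looked up in either direction.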

The dependencies used for the network analysis are:

# Dependencies
import numpy as np
import pandas as pd
import seaborn as sns
from subprocess import check_output
import matplotlib.pyplot as plt
import networkx as nx

The graph was plotted using the matplotlib package.

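# Build an undirected graph with one edge per (artist, genre) row
g = nx.from_pandas_edgelist(data, source='artist', target='genre')
# Summarize the graph (nx.info(g) gave this summary but was removed in NetworkX 3.0)
print(f"Nodes: {g.number_of_nodes()}, edges: {g.number_of_edges()}")
plt.figure(figsize=(20, 30))
# Spring layout; k sets the preferred distance between nodes
pos = nx.spring_layout(g, k=0.15)
nx.draw_networkx(g, pos, node_size=25, node_color='blue')
plt.show()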

Each edge in the graph connects an artist to one of that artist's genres.

From the graph, we can see that pop forms the largest cluster. Computing the shortest path length from the pop node to every node it can reach yields:

{'pop': 0,
'Backstreet Boys': 1,
'Britney Spears': 1,
'Shaggy': 1,
'Mannheim Steamroller': 1,
'Avril Lavigne': 1,
'Jennifer Lopez': 1,
'Whitney Houston': 1,
'Josh Groban': 1,
'Lindsay Lohan': 1,
'Shania Twain': 1,
'Mariah Carey': 1,
'Kelly Clarkson': 1,
'Taylor Hicks': 1,
'Carrie Underwood': 1,
'Taylor Swift': 1,
'Susan Boyle': 1,
'Andrea Bocelli': 1,
'Lady Gaga': 1,
'Justin Bieber': 1,
'Michael Bublé': 1,
'Rihanna': 1,
'Adele': 1,
'Bruno Mars': 1,
'One Direction': 1,
'The Robertsons': 1,
'Sam Smith': 1,
'Ed Sheeran': 1,
'Troye Sivan': 1,
'Harry Styles': 1,
'Billie Eilish': 1,
'country': 2,
'jazz': 2,
'Tim McGraw': 3,
'The Chicks': 3,
'Toby Keith': 3,
'Kenny Chesney': 3,
'Lady A': 3,
'Scotty McCreery': 3,
'Blake Shelton': 3,
'Luke Bryan': 3,
'Chris Stapleton': 3}
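A dictionary of this shape is what NetworkX's `single_source_shortest_path_length` returns, so the sketch below assumes that call; the code for this step is not shown in the post. The toy edges are illustrative only.

```python
import networkx as nx

# Toy artist-genre graph (illustrative edges, not the project's data)
toy = nx.Graph([
    ("Adele", "pop"),
    ("Shania Twain", "pop"),
    ("Shania Twain", "country"),
    ("Luke Bryan", "country"),
    ("Drake", "hip-hop"),  # separate component, unreachable from "pop"
])

# Number of edges on the shortest path from "pop" to every reachable node
lengths = dict(nx.single_source_shortest_path_length(toy, "pop"))
print(lengths)
```

In this toy graph an artist linked to two genres bridges them, which is why a genre such as country sits at distance 2 from pop and its other artists at distance 3. Nodes in a disconnected part of the graph do not appear in the result at all.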

Betweenness centrality

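Betweenness centrality measures the extent to which a vertex lies on paths between other vertices. Vertices with high betweenness may have considerable influence within a network by virtue of their control over information passing between others. The values for this graph are listed below; pop scores highest (about 0.126), followed by hip-hop (about 0.073) and country (about 0.054).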

{'The Beatles': 0.0,
'rock': 0.025688073394495414,
'Backstreet Boys': 0.0,
'pop': 0.12568807339449542,
'Creed': 0.01601334445371143,
'Britney Spears': 0.0,
'Tim McGraw': 0.0,
'country': 0.05412844036697248,
'Shaggy': 0.0,
'Limp Bizkit': 0.0,
'metal': 0.01818181818181818,
'Mannheim Steamroller': 0.0,
'Nickelback': 0.006005004170141785,
'Enya': 0.0,
'alternate': 0.00316930775646372,
'No Doubt': 0.0,
'Linkin Park': 0.0,
'Usher': 0.0,
'r&b': 0.007506255212677232,
'Shania Twain': 0.025020850708924104,
'The Chicks': 0.0,
'Eminem': 0.0,
'hip-hop': 0.0725604670558799,
'Avril Lavigne': 0.0,
'Jennifer Lopez': 0.0,
'Whitney Houston': 0.0,
'B2K': 0.0,
'Ruben Studdard': 0.0,
'Alicia Keys': 0.0,
'Toby Keith': 0.0,
'Josh Groban': 0.0,
'Outkast': 0.0,
'Rod Stewart': 0.0,
'Ludacris': 0.0,
'U2': 0.0,
'Lindsay Lohan': 0.0,
"Destiny's Child": 0.0,
'Carrie Underwood': 0.025020850708924104,
'Kenny Chesney': 0.0,
'Mariah Carey': 0.0,
'Kelly Clarkson': 0.0,
'Black Eyed Peas': 0.0,
'Jeezy': 0.0,
'Taylor Hicks': 0.0,
'Daughtry': 0.0,
'Eagles': 0.0,
'Taylor Swift': 0.0,
'Bow Wow': 0.0,
'Chris Brown': 0.0,
'Beyoncé': 0.0,
'Kanye West': 0.0,
'AC/DC': 0.0,
'David Cook': 0.0,
'Musiq Soulchild': 0.0,
'Susan Boyle': 0.0,
'Andrea Bocelli': 0.0,
'Lady Gaga': 0.0,
'Justin Bieber': 0.0,
'Michael Bublé': 0.006672226855713094,
'T.I.': 0.0,
'Nicki Minaj': 0.0,
'Rihanna': 0.0,
'Daft Punk': 0.0,
'electro': 0.0,
'jazz': 0.0,
'Adele': 0.0,
'The Black Keys': 0.0,
'Lady A': 0.0,
'Drake': 0.0,
'Scotty McCreery': 0.0,
'Bruno Mars': 0.0,
'One Direction': 0.0,
'The Game': 0.0,
'Phillip Phillips': 0.0,
'Blake Shelton': 0.0,
'Beyonce': 0.0,
'R. Kelly': 0.0,
'The Robertsons': 0.0,
'Childish Gambino': 0.0,
'J. Cole': 0.0,
'Pentatonix': 0.0,
'a cappella': 0.0,
'K. Michelle': 0.0,
'Sam Smith': 0.0,
'Ed Sheeran': 0.0,
'Coldplay': 0.0,
'G-Eazy': 0.0,
'Rick Ross': 0.0,
'Troye Sivan': 0.0,
'The Weeknd': 0.0,
'The Rolling Stones': 0.0,
'Metallica': 0.0,
'Tech N9ne': 0.0,
'Twenty One Pilots': 0.0,
'Luke Bryan': 0.0,
'Quality Control': 0.0,
'Big Sean': 0.0,
'Chris Stapleton': 0.0,
'Kodak Black': 0.0,
'Meek Mill': 0.0,
'Travis Scott': 0.0,
'Harry Styles': 0.0,
'Roddy Ricch': 0.0,
'Post Malone': 0.0,
'Billie Eilish': 0.0,
'Trippie Redd': 0.0,
'Kid Cudi': 0.0,
'Jack Harlow': 0.0,
'Pop Smoke': 0.0,
'Bad Bunny': 0.0,
'latin': 0.0}
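Values like these can be computed with NetworkX's `betweenness_centrality`, which normalizes scores by default. The sketch below assumes that function and uses a toy artist-genre graph, since the code for this step is not shown in the post.

```python
import networkx as nx

# Toy artist-genre graph (illustrative edges, not the project's data)
toy = nx.Graph([
    ("Adele", "pop"),
    ("Rihanna", "pop"),
    ("Shania Twain", "pop"),
    ("Shania Twain", "country"),
    ("Luke Bryan", "country"),
])

# Fraction of shortest paths between other node pairs that pass through each node
bc = nx.betweenness_centrality(toy)

# Print nodes from most to least central
for node, score in sorted(bc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {score:.2f}")
```

Artists attached to a single genre lie on no shortest path between other nodes, so they score 0.0. Genres, and artists that link two genres, score above zero.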

Degree centrality

Degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has). If the network is directed (meaning that ties have direction), then two separate measures of degree centrality are defined, namely, indegree and outdegree.
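A minimal sketch of both cases, assuming NetworkX's `degree_centrality`, `in_degree_centrality`, and `out_degree_centrality` on a toy artist-genre graph. Note that NetworkX divides each count by n - 1 (the maximum possible degree), so it reports a fraction, not the raw number of links.

```python
import networkx as nx

# Toy artist-genre edges (illustrative, not the project's data)
edges = [
    ("Adele", "pop"),
    ("Rihanna", "pop"),
    ("Shania Twain", "pop"),
    ("Shania Twain", "country"),
    ("Luke Bryan", "country"),
]

# Undirected case: one degree per node, divided by n - 1
g = nx.Graph(edges)
deg = nx.degree_centrality(g)
print(dict(g.degree()))  # raw number of links per node
print(deg)               # normalized degree centrality

# Directed case (artist -> genre): indegree and outdegree are measured separately
dg = nx.DiGraph(edges)
indeg = nx.in_degree_centrality(dg)
outdeg = nx.out_degree_centrality(dg)
print(indeg["pop"], outdeg["pop"])
```

With edges pointing from artist to genre, genres collect indegree while artists carry only outdegree.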

References

https://www.sci.unich.it/~francesc/teaching/network/betweeness.html

https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-9863-7_935
