Spotify Recommendations Using Community Detection

Jay Kynerd
smucs
Published in
2 min readNov 17, 2021

Maddie Hulcy is a junior at SMU majoring in Computer Science. She is originally from Coppell, Texas.

Jay Kynerd is a junior at SMU from Birmingham, AL studying Computer Science and Philosophy.

The goal of our project extension was to use the Girvan-Newman algorithm to determine communities of music artists by analyzing the playlists in which they appear together.

We used data provided by Spotify for the Spotify Million Playlist Challenge (https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge), which included a name of a playlist, various other data points about the playlist, and a list of tracks in the playlist.

"playlists": [
{
"name": "Upbeat",
"num_holdouts": 11,
"pid": 1000002,
"num_tracks": 11,
"tracks": [],
"num_samples": 0
},

Playlists were organized in a JSON organized as above, where

"tracks": []

is a list of the playlist’s tracks’ information:

{
"pos": 90,
"artist_name": "Pink Floyd",
"track_uri": "spotify:track:5HNCy40Ni5BZJFw1TKzRsC",
"artist_uri": "spotify:artist:0k17h0D3J5VfsdmQ1iZtE9",
"track_name": "Comfortably Numb",
"album_uri": "spotify:album:5Dbax7G8SWrP9xyzkOvy2F",
"duration_ms": 382296,
"album_name": "The Wall"
},

We used a subset of this dataset that included 2000 playlists of up to 100 tracks each. We stripped down each playlist to be only a list of artists appearing in the playlist and created a node for each artist. We then added edges to connect all of the artists within each playlist to each other with a weight of 1.

g.add_edge(artist1, artist2, weight=1)

If the same edge appeared multiple times (meaning the same two artists appeared in multiple playlists together), the edge weight was divided by 2 for each shared playlist, so that artists are more closely related whenever they appear together multiple times.

if g.has_edge(artist1, artist2):
g[artist1][artist2]['weight'] /= 2

We then ran the Girvan-Newman algorithm on the constructed graph to determine communities of related artists. Communities were then sorted into lists and written to an output file for processing.

communities = girvan_newman(g)
node_groups = []
with open("communities_output.txt", 'w') as output_file:
for com in next(communities):
node_groups.append(list(com))
output_file.write(list(com))
output_file.write("\n")

Using these communities, we wrote a simple script that, given an artist name, returns a random related artist.

for genre in genres: 
if artist in genre:
random_artist = genre[random.randint(0, len(genre))-1]
while artist == random_artist: # ensures unique artist
random_artist = genre[random.randint(0, len(genre))-1]
break

The user is prompted to enter an artist and, if the artist exists in the dataset, gets a related artist recommended.

When Coldplay is input, Gotye is returned as a recommended related artist

Going forward, we look to improve the accuracy of our recommendations by optimizing graph population. By using a different starting edge weight and/or reducing edge weights by a different factor for each repeated edge, we can construct a graph that has more clearly defined genre communities, thus providing better recommendations.

--

--