Spotify Recommendations Using Community Detection

Published in

smucs

2 min readNov 17, 2021

Maddie Hulcy is a junior at SMU majoring in Computer Science. She is originally from Coppell, Texas.

Jay Kynerd is a junior at SMU from Birmingham, AL studying Computer Science and Philosophy.

The goal of our project extension was to use the Girvan-Newman algorithm to determine communities of music artists by analyzing the playlists in which they appear together.

We used data provided by Spotify for the Spotify Million Playlist Challenge (https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge), which included a name of a playlist, various other data points about the playlist, and a list of tracks in the playlist.

"playlists": [
        {
            "name": "Upbeat", 
            "num_holdouts": 11, 
            "pid": 1000002, 
            "num_tracks": 11, 
            "tracks": [], 
            "num_samples": 0
        },

Playlists were organized in a JSON organized as above, where

"tracks": []

is a list of the playlist’s tracks’ information:

{
                    "pos": 90, 
                    "artist_name": "Pink Floyd", 
                    "track_uri": "spotify:track:5HNCy40Ni5BZJFw1TKzRsC", 
                    "artist_uri": "spotify:artist:0k17h0D3J5VfsdmQ1iZtE9", 
                    "track_name": "Comfortably Numb", 
                    "album_uri": "spotify:album:5Dbax7G8SWrP9xyzkOvy2F", 
                    "duration_ms": 382296, 
                    "album_name": "The Wall"
                },

We used a subset of this dataset that included 2000 playlists of up to 100 tracks each. We stripped down each playlist to be only a list of artists appearing in the playlist and created a node for each artist. We then added edges to connect all of the artists within each playlist to each other with a weight of 1.

g.add_edge(artist1, artist2, weight=1)

If the same edge appeared multiple times (meaning the same two artists appeared in multiple playlists together), the edge weight was divided by 2 for each shared playlist, so that artists are more closely related whenever they appear together multiple times.

if g.has_edge(artist1, artist2):
    g[artist1][artist2]['weight'] /= 2

We then ran the Girvan-Newman algorithm on the constructed graph to determine communities of related artists. Communities were then sorted into lists and written to an output file for processing.

communities = girvan_newman(g)
node_groups = []
with open("communities_output.txt", 'w') as output_file:
    for com in next(communities):
        node_groups.append(list(com))
        output_file.write(list(com))
        output_file.write("\n")

Using these communities, we wrote a simple script that, given an artist name, returns a random related artist.

for genre in genres: 
    if artist in genre:
        random_artist = genre[random.randint(0, len(genre))-1]
        while artist == random_artist: # ensures unique artist
            random_artist = genre[random.randint(0, len(genre))-1]
        break

The user is prompted to enter an artist and, if the artist exists in the dataset, gets a related artist recommended.

*When Coldplay is input, Gotye is returned as a recommended related artist*

Going forward, we look to improve the accuracy of our recommendations by optimizing graph population. By using a different starting edge weight and/or reducing edge weights by a different factor for each repeated edge, we can construct a graph that has more clearly defined genre communities, thus providing better recommendations.

Spotify Recommendations Using Community Detection

Written by Jay Kynerd