How to Find New Songs on Spotify Using Machine Learning
Intro
“Music gives a soul to the universe, wings to the mind, flight to the imagination, and life to everything.” — Plato
From jamming out to New Edition’s classic Candy Girl in the shower to focusing on homework with the help of lo-fi beats, music plays a huge role in my life. I have playlists spanning various genres for every occasion: long car drives, basketball workouts, celebrations, chores, relaxation, and more. As a daily user of Spotify, I’ve experienced the song recommendations it gives its users. Every day, Spotify curates several playlists tailored to my music taste, including one specifically for finding new music titled “Discover Weekly.”
Recently, I felt like Spotify’s recommendations weren’t hitting the spot. So, I took matters into my own hands and started this project: developing a machine learning algorithm to recommend songs and add them to a playlist in my Spotify account.
In this article, I will walk you through each step of the process to achieve my goal:
- Creating a data pipeline that uses Spotify’s API to gather the first 50 songs from each of the official Spotify account’s 1,300+ playlists, building a scalable dataset of nearly 10,000 candidate recommendations
- Pulling the user’s top songs via Spotify’s API to base the recommender on
- Exploring user streaming history to understand music preferences
- Developing a classification algorithm with over 99% accuracy to recommend songs
- Saving recommended songs from the algorithm to a Spotify playlist within the user’s account
If you’d like to jump straight into the code, check it out on my GitHub.
Table of Contents
· Intro
· Table of Contents
· Data Engineering & ETL
· Data Exploration & Visualization
· Model Development
· Usage
· Conclusion & Next Steps
· Contact
Data Engineering & ETL
To recommend songs to the user, it’s necessary to have songs ready to recommend. To do this, I used Spotify’s API to build a dataset of nearly 10,000 songs drawn from the official Spotify account’s playlists. This dataset could have been much larger; I chose to stick with 10,000 songs because I felt it would be adequate for my purposes. If you’d like to recreate it, take a look at the data engineering notebook on my GitHub.
I decided to use official playlists from Spotify for several reasons.
- The number of genres and music types. For this algorithm to work, it has to give recommendations in every genre. Since Spotify maintains almost 1,400 playlists of varying lengths, it is the optimal account to draw from.
- These playlists are regularly maintained. Each playlist is constantly updated: new songs are added, old and less popular songs are removed, and the mix is tailored to the users who follow it. This lets my algorithm recommend songs that are popular right now and will stay relevant well into the future.
- Spotify’s playlists are already well-known and followed. For example, the playlist RapCaviar has 13.6 million followers. These playlists aren’t popular by accident: each one is full of quality songs people enjoy that I can use as recommendations.
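Every snippet below assumes sp is an authenticated Spotipy client. The setup isn’t shown in this article, but here’s a minimal sketch, assuming you’ve registered an app in Spotify’s developer dashboard (the credentials and redirect URI are placeholders):
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Placeholder credentials from your Spotify developer dashboard
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    redirect_uri='http://localhost:8888/callback',
    # Scopes needed later: reading top tracks and creating/modifying playlists
    scope='user-top-read playlist-modify-public'))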
I wrote a loop that pages through each of Spotify’s playlists and grabs its playlist ID:
# Getting playlist IDs from each of Spotify's playlists
playlists = sp.user_playlists('spotify')
spotify_playlist_ids = []
while playlists:
    for playlist in playlists['items']:
        spotify_playlist_ids.append(playlist['uri'][-22:])  # the ID is the last 22 characters of the URI
    if playlists['next']:
        playlists = sp.next(playlists)
    else:
        playlists = None
print(spotify_playlist_ids[:20])
From there, I got the first fifty track IDs from each playlist by using the playlist ID:
# Creating a function to get the first 50 track IDs from a playlist
def getTrackIDs(playlist_id):
    ids = []
    playlist = sp.user_playlist('spotify', playlist_id)
    for item in playlist['tracks']['items'][:50]:
        track = item['track']
        ids.append(track['id'])
    return ids
Then I got the track features for each song:
# Creating a function to get the features of a track from its track ID
def getTrackFeatures(track_id):
    meta = sp.track(track_id)
    features = sp.audio_features(track_id)

    # Metadata
    name = meta['name']
    album = meta['album']['name']
    artist = meta['album']['artists'][0]['name']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    # Audio features
    acousticness = features[0]['acousticness']
    danceability = features[0]['danceability']
    energy = features[0]['energy']
    instrumentalness = features[0]['instrumentalness']
    liveness = features[0]['liveness']
    loudness = features[0]['loudness']
    speechiness = features[0]['speechiness']
    tempo = features[0]['tempo']
    time_signature = features[0]['time_signature']

    track = [track_id, name, album, artist, release_date, length, popularity,
             danceability, acousticness, energy, instrumentalness, liveness,
             loudness, speechiness, tempo, time_signature]
    return track
I placed each song with its audio features into a data frame. This data frame holds 9,819 songs.
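The assembly step lives in the notebook, but here’s a minimal sketch of how the two helpers above could be combined, assuming the spotify_playlist_ids list gathered earlier:
import pandas as pd

# Collect up to 50 track IDs from every playlist on the Spotify account
all_ids = []
for playlist_id in spotify_playlist_ids:
    all_ids.extend(getTrackIDs(playlist_id))

# Fetch the features for each track and build the data frame
tracks = [getTrackFeatures(track_id) for track_id in all_ids]
df = pd.DataFrame(tracks, columns=['track_id', 'name', 'album', 'artist',
                                   'release_date', 'length', 'popularity',
                                   'danceability', 'acousticness', 'energy',
                                   'instrumentalness', 'liveness', 'loudness',
                                   'speechiness', 'tempo', 'time_signature'])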
Now that the song recommendation data frame is done, it’s time to create the data frame filled with the user’s favorite tracks.
Using Spotify’s API, I acquired the user’s top 50 songs from recent months (their audio features can be fetched with the same getTrackFeatures function):
# Getting the user's top 50 tracks (the API caps this endpoint at 50 per request)
results = sp.current_user_top_tracks(limit=50, offset=0, time_range='short_term')

# Convert the results to a data frame
track_name = []
track_id = []
artist = []
album = []
duration = []
popularity = []

for items in results['items']:
    track_name.append(items['name'])
    track_id.append(items['id'])
    artist.append(items['artists'][0]['name'])
    duration.append(items['duration_ms'])
    album.append(items['album']['name'])
    popularity.append(items['popularity'])

# Create the final df
df_favourite = pd.DataFrame({"track_name": track_name,
                             "album": album,
                             "track_id": track_id,
                             "artist": artist,
                             "duration": duration,
                             "popularity": popularity})
Once I had both datasets, I dropped the columns that could lead to data leakage: song name, album, artist, and release date. These features are essentially unique identifiers for each song, so the model could memorize them instead of learning from the audio features.
Finally, I added a “favorite” column to each dataset. Every song in the user’s favorite-song dataset receives a value of 1 in this column, while every song in the playlist-song dataset receives a value of 0. This creates an imbalance between the favorite and non-favorite classes, which I take care of in the model development section.
I also removed duplicate songs in the playlist song dataset and favorite song dataset so the algorithm doesn’t recommend songs the user already knows and loves. This ensures every song in both datasets only appears once and eliminates the potential for there to be a song in both the train and test set. By doing so, the number of songs to use for recommendations dropped from 9,819 to 8,883.
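A minimal sketch of the labeling and deduplication steps, assuming the playlist songs live in df and the user’s favorites in df_favourite (names mirror the earlier snippets, not necessarily the notebook’s exact code):
# Label the two classes
df['favorite'] = 0
df_favourite['favorite'] = 1

# Drop songs the user already knows and loves, plus repeats across playlists
df = df[~df['track_id'].isin(df_favourite['track_id'])]
df = df.drop_duplicates(subset='track_id')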
Now the data is ready to be passed through a model!
Data Exploration & Visualization
While not required to receive song recommendations, learning more about my listening history was one of the highlights of this project.
To be clear, the data used in the data exploration & visualization notebook is not the same data used to create the recommendation algorithm. The data used in the exploration notebook is my streaming history from the past two years that Spotify sent me. If you’d like to explore your streaming history using the notebook in this project, check out my GitHub repo and follow the instructions in the ReadMe to acquire your streaming history.
Here are some of the visualizations I created using Matplotlib and Seaborn:
The figures above make sense — when I study, I only listen to either Lofi beats or a playlist of songs only by the artist LUCKI.
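The plots themselves aren’t reproduced here, but here’s a minimal sketch of the kind of analysis behind them, assuming the StreamingHistory JSON files that come with Spotify’s data export:
import pandas as pd
import matplotlib.pyplot as plt

# Spotify's data export ships one or more StreamingHistory*.json files
history = pd.read_json('StreamingHistory0.json')
history['endTime'] = pd.to_datetime(history['endTime'])

# Total listening time per artist, top ten, converted from ms to hours
top_artists = history.groupby('artistName')['msPlayed'].sum().nlargest(10) / 3.6e6
top_artists.plot(kind='barh', title='Most-streamed artists (hours)')
plt.tight_layout()
plt.show()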
I also explored the song features. Here’s a description of each:
acousticness — float — A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
danceability — float — Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
energy — float — Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
instrumentalness — float — Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.
liveness — float — Detects the presence of an audience in the recording.
loudness — float — The overall loudness of a track in decibels (dB).
speechiness — float — Speechiness detects the presence of spoken words in a track.
valence — float — A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
tempo — float — The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
mode — int — Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
key — int — The estimated overall key of the track.
Here is how the features correlate with one another:
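A minimal sketch of the correlation heatmap, assuming the audio features sit in a data frame called df_features:
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise correlations between the numeric audio features
corr = df_features.corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Audio feature correlations')
plt.show()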
If you’d like to see more of my data analysis, see my data exploration notebook on GitHub.
Model Development
If you’d like, follow along in the notebook on my GitHub.
I combined the two datasets created in the data engineering section so I could build a train set and a test set for the model. As mentioned earlier, the target column, ‘favorite’, was imbalanced: the ratio of favorites to non-favorites was 50:9,769. To mitigate this, I used SMOTE to oversample the ‘favorite’ class on the training data only, which is why I created my own train/test split instead of using Scikit-learn’s train_test_split function:
# Shuffle the dataset
shuffle_df = df.sample(frac=1)

# Define a size for the train set
train_size = int(0.8 * len(df))

# Split the dataset
train_set = shuffle_df[:train_size]
test_set = shuffle_df[train_size:]

# Create X and y from the train set
X = train_set.drop(columns=['favorite', 'track_id'])
y = train_set.favorite

# Create X_train and y_train using the oversampler
from imblearn.over_sampling import SMOTE

oversample = SMOTE()
X_train, y_train = oversample.fit_resample(X, y)
After oversampling, the ratio of favorites to non-favorites was 1:1.
If I hadn’t oversampled, my model would’ve predicted most songs as the majority class (in this case 0, or ‘non-favorite’). Oversampling balances the training data so the model can learn to accurately distinguish between favorite and non-favorite songs.
I didn’t undersample the non-favorite class instead because that would throw away most of the data: the majority class vastly outnumbers the minority.
After balancing the training data, I split the test data.
# Creating X_test and y_test
X_test = test_set.drop(columns=['favorite', 'track_id'])
y_test = test_set['favorite']
Now that the data has been preprocessed, it’s time to test out some machine learning models.
At this point, you might be wondering, “Logan, why did you choose to use a classification algorithm instead of a traditional recommender approach?”
My model selection ideology corresponds with the following quote from Albert Einstein:
“Everything should be made as simple as possible, but not simpler.”
From my experience listening to music, I either like a song or I don’t, which makes song recommendation a binary problem. I felt a classification algorithm would handle this task better than a traditional recommender system that filters music. Moreover, a traditional recommender requires lots of data to train on, which I’m limited by due to restrictions in Spotify’s API. If I were to continue developing this project, I would do more feature engineering on the available audio features using another audio library such as Librosa.
The three models I chose to experiment with are:
- Logistic Regression. When classifying structured data like this project’s, logistic regression usually gives quick, solid results. It’s the simplest model I tested, so it serves as the baseline.
- DecisionTreeClassifier. The decision tree algorithm starts at the tree root and splits the data on the feature that yields the largest information gain (IG), i.e., the greatest reduction in uncertainty about the final decision. I believed this model would perform well on the gathered data because of its decision-making nature and the scale and structure of the dataset. For instance, if the user’s data shows they favor songs with a danceability value over 0.8, the model can exclude songs with danceability below that value.
- RandomForestClassifier. I chose this model because of its similarity to the DecisionTreeClassifier, plus its random nature. Sometimes, spontaneity just does it better.
To evaluate each model, I cross-validated on the training data with ten folds, scoring by F1. The F1 score accounts for both false positives and false negatives. With the generic accuracy_score, a model that labels nearly everything as the majority class can still look highly accurate, masking how poorly it actually identifies favorites.
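A minimal sketch of that evaluation, assuming the balanced training data from earlier:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

models = [LogisticRegression(max_iter=1000),
          DecisionTreeClassifier(),
          RandomForestClassifier()]
for model in models:
    # Ten-fold cross-validation scored by F1
    scores = cross_val_score(model, X_train, y_train, cv=10, scoring='f1')
    print(f'{type(model).__name__}: {scores.mean():.3f}')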
Logistic regression set the baseline score at 83% accuracy.
DecisionTreeClassifier outperformed the baseline with a score of 99.3% accuracy.
RandomForestClassifier marginally improved from DecisionTreeClassifier with a score of 99.7% accuracy.
These scores raised some red flags. First, I checked whether the classes were imbalanced, which they weren’t. Then, I added more folds to the cross-validation to check for overfitting. Next, I examined the training data once again to make sure no columns could lead to data leakage. Finally, I took a look at the confusion matrix for each model.
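A minimal sketch of producing one of these matrices on the held-out test set (shown here for the decision tree; the notebook has the exact plotting code):
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Fit on the balanced training data, evaluate on the untouched test set
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))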
Logistic Regression’s Confusion Matrix (baseline):
Decision Tree Classifier’s Confusion Matrix:
RandomForestClassifier’s Confusion Matrix:
Although the RandomForestClassifier had the highest cross-validation score, the confusion matrices showed the DecisionTreeClassifier was truly the more accurate model: it produced fewer false positives and false negatives. After getting nearly identical results on the test data, I decided to use the DecisionTreeClassifier as my recommendation algorithm.
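The prediction code below calls predict_proba on a fitted pipeline named pipe. The exact pipeline is defined in the notebook; a minimal sketch of fitting one might look like this:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical reconstruction: scale the features, then fit the chosen model
pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier())
pipe.fit(X_train, y_train)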
Since the model has a very niche idea of what a favorite song is, based on the favorites dataset, it needs to accept songs that might not be absolute favorites. Initially, I received a total of six song recommendations because the model was too strict in deciding which songs count as favorites. I solved this by applying a threshold to the probabilities from Scikit-learn’s predict_proba method.
# Predicting if a song is a favorite
prob_preds = pipe.predict_proba(df.drop(['favorite', 'track_id'], axis=1))
threshold = 0.30  # define the threshold here
preds = [1 if prob_preds[i][1] > threshold else 0 for i in range(len(prob_preds))]
df['prediction'] = preds
I’ve set the threshold to 0.3 to get approximately 20 song recommendations from the nearly 10,000 candidates. You can play with this value if you’d like more or fewer songs in the recommendation playlist.
Usage
Now that the favorite songs have been predicted, they can be added to a playlist within the user’s Spotify account.
I wrote a function that creates a playlist on the account:
# Creating a function that builds a playlist in the user's Spotify account
def create_playlist(sp, username, playlist_name, playlist_description):
    return sp.user_playlist_create(username, playlist_name, description=playlist_description)
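Called with hypothetical values (the name and description are up to you):
# Example usage; username is your Spotify username
create_playlist(sp, username, 'ML Song Recommendations', 'Songs recommended by my model')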
And another function adds songs to the newly created playlist:
# Getting the playlist ID of the most recently made playlist
# (fetch_playlists is a helper from the project notebook that lists the user's playlists)
playlist_id = fetch_playlists(sp, username)['id'][0]

# Function to add the selected songs to the playlist, 50 at a time
def enrich_playlist(sp, username, playlist_id, playlist_tracks):
    index = 0
    results = []
    while index < len(playlist_tracks):
        results.append(sp.user_playlist_add_tracks(username, playlist_id, tracks=playlist_tracks[index:index + 50]))
        index += 50

# Adding the recommended songs to the playlist
list_track = df.loc[df['prediction'] == 1, 'track_id'].tolist()
enrich_playlist(sp, username, playlist_id, list_track)
Conclusion & Next Steps
With that, we’ve developed a playlist of song recommendations (Fig. 12)! I listened to this playlist and let me say: it was good. While not perfect, it captured songs in genres from gospel to rap to lo-fi, matched my music taste, and introduced me to songs I’ve had on repeat for the past few days. If I want some new jams, I’ll be using this recommender.
If you’d like to try out the recommender in this article, check out my GitHub. Please let me know what you think of the songs it recommended!
As I reflect on this project, I realize how much deeper it can go. If I were to continue developing it, here’s what I would do:
- Build a larger dataset of songs from official Spotify playlists utilizing the data pipeline to use in recommendations
- Create a web application to host the recommender. Initially I planned to do this; however, due to a bug in the Spotify API, I cannot.
- Add more features for each song
- Expand the algorithm’s capabilities to make recommendations based on a single song, a playlist of the user’s choice, a genre, a keyword, and so on
All in all, this project was a blast to create.
Contact
Feel free to reach out to me on LinkedIn and follow my work on Github!