Build Your Own Spotify Playlist of Best Playlist Recommendations!

Sümeyra Bedir
Deep Learning Turkey
7 min read · Jun 23, 2018


By default, Spotify recommends 5 tracks at a time for a playlist you create. These recommendations are made mainly by treating the playlist as a whole, as well as by checking individual artist/track similarities. Each time you refresh, you get 5 more track recommendations.


For more information on how Spotify recommends tracks, you may watch this video here, or check out the awesome slides here and here.

Currently, users get various types of recommendations from Spotify. Track-based and artist-based recommendations are available through the track and artist radio options. Playlist recommendations are more challenging because they depend on the cross-relations between every artist and track in the playlist.

Now I will explain how I built my own way of creating a playlist of the most relevant track recommendations for my playlists. This also helps you create playlists similar to ones that already exist.

Spotify for Developers

For this project, I used the Spotify Web API, the Spotipy library, and some machine learning algorithms from the scikit-learn library, all in Python 3. You may check the corresponding IPython notebook in my GitHub repository if you like.

So, here is an outline of what to do (it is simpler than it may seem!)

First, log in to Spotify for Developers with your Spotify account. On your dashboard, click ‘Create an App’. A dialog will ask you for some details about your app.

Since we are not creating a commercial app for now, simply choose “I don’t know” when it asks what you are building :) As the last step, agree to the terms and conditions, and your app is ready to use. Click “Show Client Secret”.

Copy your Client ID and Client Secret, then click “Edit Settings”. Copy the URL from your browser’s address bar, paste it into Redirect URIs, click “ADD” and save your settings.

Start Coding

  • Create a Jupyter notebook (or any other environment you like for running Python scripts) and install Spotipy:
!pip install spotipy

Import the Spotipy library (you may want to check its documentation here). Authorize a token with a scope that includes at least “playlist-modify-public” access (you may add more scopes; for the available scopes, see here).
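A minimal sketch of the authorization step, assuming a current Spotipy release in which `spotipy.oauth2.SpotifyOAuth` is passed to `spotipy.Spotify` as an `auth_manager`. The helper names `build_scope` and `make_spotify_client` are hypothetical; this is not the code from the original notebook (older Spotipy releases also offered `spotipy.util.prompt_for_user_token` for the same purpose).

```python
# Hypothetical helpers sketching the authorization step (not the original notebook's code).

REQUIRED_SCOPE = "playlist-modify-public"


def build_scope(extra_scopes=()):
    """Return a space-separated scope string that always includes playlist-modify-public."""
    scopes = [REQUIRED_SCOPE]
    for scope in extra_scopes:
        if scope not in scopes:  # avoid listing the same scope twice
            scopes.append(scope)
    return " ".join(scopes)


def make_spotify_client(client_id, client_secret, redirect_uri, extra_scopes=()):
    """Return an authorized spotipy.Spotify client.

    Spotipy requests the token lazily, when the first API call needs one,
    so the browser consent step happens then rather than here.
    """
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    auth_manager = SpotifyOAuth(
        client_id=client_id,
        client_secret=client_secret,
        redirect_uri=redirect_uri,
        scope=build_scope(extra_scopes),
    )
    return spotipy.Spotify(auth_manager=auth_manager)
```

Usage would look like `sp = make_spotify_client("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "YOUR_REDIRECT_URI")`. If you pass `None` for the credentials, `SpotifyOAuth` falls back to the `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET` and `SPOTIPY_REDIRECT_URI` environment variables, which keeps secrets out of the notebook.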

The only information you need to provide in the authorization code is your ‘Client ID’, ‘Client Secret’ and ‘Redirect URI’, all retrieved from your app screen.

  • After successfully obtaining Spotipy authorization for the Spotify Web API, you can start ‘playing’ with your Spotify playlists.

Load your source playlist as a dataframe of audio features, with track names as the index. You will get used to the way Spotify stores playlist data after plenty of experiments.

You can get your sourcePlaylistID from the last part of your playlist’s public URL. For this example, I used my public playlist Bluest of Blues.
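If you would rather not copy the ID by hand, a small standard-library helper can extract it. `playlist_id_from_url` is a hypothetical name, and the ID in the example is made up. Spotipy’s playlist methods generally accept a full URL or URI as well, so this is mainly useful when you want the bare ID.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url):
    """Return the playlist ID from a Spotify playlist URL or URI (hypothetical helper)."""
    if url.startswith("spotify:"):
        # URI form, e.g. "spotify:playlist:<id>": the ID is the last colon-separated part
        return url.split(":")[-1]
    # URL form: ignore the query string (e.g. "?si=...") and take the last path segment
    path = urlparse(url).path
    return path.rstrip("/").split("/")[-1]


print(playlist_id_from_url("https://open.spotify.com/playlist/ABC123xyz?si=42"))  # ABC123xyz
```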

One important step in the loading code is to ignore any local tracks in your source playlist. You can add local tracks to a playlist from your own machine, but these tracks have no track IDs or audio features, so we cannot use them here.
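The loading step can be sketched as below, assuming a current Spotipy client `sp` whose `playlist_tracks`, `next` and `audio_features` methods return the Web API’s JSON as Python dicts (older Spotipy versions used `user_playlist_tracks(user, playlist_id)` instead). The function name `load_playlist_df` is hypothetical. It follows the paging links, skips local tracks as described above, and requests audio features in batches of at most 100 IDs.

```python
import pandas as pd


def load_playlist_df(sp, playlist_id):
    """Return audio features for a playlist as a DataFrame indexed by track name.

    `sp` is an authorized spotipy.Spotify client. Local tracks are skipped because
    they have no Spotify ID and therefore no audio features.
    """
    names, ids = [], []
    page = sp.playlist_tracks(playlist_id)
    while page:
        for item in page["items"]:
            track = item.get("track")
            if item.get("is_local") or not track or track.get("id") is None:
                continue  # local file or unavailable track: nothing to look up
            names.append(track["name"])
            ids.append(track["id"])
        # Playlists are paged; follow the "next" link until there are no more pages
        page = sp.next(page) if page.get("next") else None

    rows, row_names = [], []
    for start in range(0, len(ids), 100):  # audio_features takes at most 100 IDs per request
        batch = sp.audio_features(ids[start:start + 100])
        for name, features in zip(names[start:start + 100], batch):
            if features is not None:  # the API returns null for tracks without features
                rows.append(features)
                row_names.append(name)
    return pd.DataFrame(rows, index=row_names)
```

With an authorized client this would be called as `playlist_df = load_playlist_df(sp, sourcePlaylistID)`. The returned frame keeps the `id` column from the audio-features response, which the column selection in the next step expects.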

You will see that some of these features are not needed, so keep only the following ones:

playlist_df = playlist_df[["id", "acousticness", "danceability", "duration_ms", "energy", "instrumentalness",  "key", "liveness", "loudness", "mode", "speechiness", "tempo", "valence"]]
  • Rate each track in your playlist by its relevance to the playlist, which turns our project into a multi-class classification task. Add these ratings as a target column to your source playlist dataframe. For my example, I used ratings on a scale from 1 to 10:
playlist_df['ratings']=[10, 9, 9, 10, 8, 6, 8, 4, 3, 5, 7, 5, 5, 8, 8, 7, 8, 8, 10, 8, 10, 8, 4, 4, 4, 10, 10, 9, 8, 8, 4]

While creating a playlist, you may feel that some songs are the ‘characteristic’ ones for your playlist; rate them highest. Others may be there just for the sake of harmony, so give them lower ratings. This manual rating process can be adapted to your own habits when creating playlists. If you tend to add songs in order of relevance (most relevant first, and so on), you may assign ratings automatically based on the order of the songs in your playlist.
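For the order-based option, one simple assumption is a linear mapping from position to rating. `order_based_ratings` below is a hypothetical helper, and the linear scale is only one possible choice: the first track gets 10, and ratings decrease toward 1 for the last track.

```python
def order_based_ratings(n_tracks, top=10):
    """Rate tracks by position: the first track gets `top`, later tracks get
    progressively lower ratings, down to 1 for the last track when n_tracks >= top."""
    return [top - (i * top) // n_tracks for i in range(n_tracks)]


print(order_based_ratings(10))  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

It would replace the hand-typed list with something like `playlist_df['ratings'] = order_based_ratings(len(playlist_df))`.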

  • From now on, your source playlist dataframe will be your training set.
X_train = playlist_df.drop(['id', 'ratings'], axis=1)
y_train = playlist_df['ratings']

Yes, this dataset is far too small to build an ML model on! You will probably get better results with a larger playlist. But don’t worry: Spotipy’s recommendations function takes care of finding relevant candidate tracks. You are only trying to pick the best of them according to your ratings, using a model that is merely better than the worst on your training set. So do not expect high cross-validation scores.

Some Machine Learning

Now, we will try some machine learning algorithms on our training set (the source playlist dataframe). I will not go into the details of each algorithm, since that is beyond the scope of this story, but for those of you who are familiar with these algorithms, it will be fun to see them work on your playlists ;)

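I will first apply PCA (Principal Component Analysis) to reduce the dimensionality a bit. You may also prefer applying t-SNE at this step, but note that scikit-learn’s TSNE has no transform method, so it cannot map the test set into the same embedding the way pca.transform does later.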

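I ended up with 8 components, which together explained 95% of the total variance. Fit PCA on your dataset with the optimal number of components for your case.

# Fit your dataset to the optimal PCA
from sklearn import decomposition
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X_train)  # standardize features before PCA
pca = decomposition.PCA(n_components=8)
X_pca = pca.fit_transform(X_scaled)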
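One way to find the number of components for a 95% threshold is to fit PCA with all components and read off the cumulative explained variance ratio. This is a sketch; `n_components_for_variance` is a hypothetical helper, and it standardizes the features first, as above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `threshold` (features are standardized first)."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_scaled)  # keep every component to inspect the full spectrum
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # searchsorted returns the first index where cumulative >= threshold
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(cumulative))
```

Calling `n_components_for_variance(X_train)` gives the count to pass to `PCA(n_components=...)`. scikit-learn’s `PCA` also accepts a float such as `n_components=0.95`, which selects the number of components by the same explained-variance criterion in one step.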

Next, optionally apply a tf-idf vectorizer to the names of your tracks. This step is optional because Spotify already does something like this, with more advanced techniques such as Word2Vec and Annoy, behind the scenes when we call Spotipy’s track recommendation function.

from sklearn.feature_extraction.text import TfidfVectorizer

track_names = playlist_df.index.tolist()  # track names were set as the dataframe index
v = TfidfVectorizer(sublinear_tf=True, ngram_range=(1, 6), max_features=10000)
X_names_sparse = v.fit_transform(track_names)
X_names_sparse.shape

Now, combine the two into one final training set:

from scipy.sparse import csr_matrix, hstack

X_train_last = csr_matrix(hstack([X_pca, X_names_sparse]))
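To see what the combined matrix looks like, here is a toy run with made-up track names and made-up PCA coordinates (both hypothetical stand-ins for `X_pca` and `track_names`). Each row is the PCA coordinates of a track followed by the tf-idf weights of its name, so both parts must list the tracks in the same order. The dense part is converted to CSR explicitly before stacking.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up stand-ins for X_pca (2 components here) and for the track names
X_pca_toy = np.array([[0.1, -1.2], [0.4, 0.3], [-0.5, 0.9]])
names_toy = ["Blue Monday", "Blues Before Sunrise", "Sunrise Blues"]

v_toy = TfidfVectorizer(sublinear_tf=True)
X_names_toy = v_toy.fit_transform(names_toy)  # sparse: one column per vocabulary term

# Row i of the result = [PCA coordinates of track i | tf-idf weights of track i's name]
X_combined = hstack([csr_matrix(X_pca_toy), X_names_toy], format="csr")
print(X_combined.shape)  # (3, 7): 2 PCA columns + 5 vocabulary terms
```

Note that “Blue” and “Blues” become separate columns, since `TfidfVectorizer` does no stemming by default. For new tracks, the vectorizer must be applied with `transform` rather than `fit_transform`, so that the test columns line up with the training columns.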

Initialize stratified 5-fold cross-validation:

from sklearn.model_selection import StratifiedKFold, GridSearchCV

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
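With only 31 tracks, several ratings are too rare to be spread over five folds. In that case scikit-learn’s `StratifiedKFold` emits a warning (it raises an error only when every class has fewer members than `n_splits`), and the affected ratings are missing from some test folds, which is one more reason to expect noisy scores. A quick check of the example ratings using only the standard library:

```python
from collections import Counter

ratings = [10, 9, 9, 10, 8, 6, 8, 4, 3, 5, 7, 5, 5, 8, 8, 7,
           8, 8, 10, 8, 10, 8, 4, 4, 4, 10, 10, 9, 8, 8, 4]
n_splits = 5

counts = Counter(ratings)
# Ratings with fewer tracks than folds cannot appear in every test fold
too_small = {rating: n for rating, n in sorted(counts.items()) if n < n_splits}
print(too_small)  # {3: 1, 5: 3, 6: 1, 7: 2, 9: 3}
```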

Let us first train a k-nearest neighbors classifier and keep the best parameters from a cross-validated grid search:

# kNN first
from sklearn.neighbors import KNeighborsClassifier

knn_params = {'n_neighbors': range(1, 10)}
knn = KNeighborsClassifier(n_jobs=-1)

knn_grid = GridSearchCV(knn, knn_params, cv=skf, n_jobs=-1, verbose=True)
knn_grid.fit(X_train_last, y_train)
knn_grid.best_params_, knn_grid.best_score_

That gave a score of .290, worse than we might expect :) We will try to improve this score a bit.

Next, let us try random forests:

# Random Forests second
from sklearn.ensemble import RandomForestClassifier

parameters = {'max_features': [4, 7, 8, 10], 'min_samples_leaf': [1, 3, 5, 8], 'max_depth': [3, 5, 8]}
rfc = RandomForestClassifier(n_estimators=100, random_state=42,
                             n_jobs=-1, oob_score=True)
forest_grid = GridSearchCV(rfc, parameters, n_jobs=-1, cv=skf, verbose=1)
forest_grid.fit(X_train_last, y_train)
forest_grid.best_estimator_, forest_grid.best_score_

This gave a score of .322.

Lastly, let us train a decision tree classifier:

# Decision Trees third
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()

tree_params = {'max_depth': range(1,11), 'max_features': range(4,19)}

tree_grid = GridSearchCV(tree, tree_params, cv=skf, n_jobs=-1, verbose=True)

tree_grid.fit(X_train_last, y_train)
tree_grid.best_estimator_, tree_grid.best_score_

This gave a score of .452 for the best estimator. The best of the worst :)
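For perspective, these scores can be compared with a baseline that ignores the audio features entirely and always predicts the most common rating of the training folds. A minimal sketch using scikit-learn’s `DummyClassifier` with the same stratified 5-fold setup: since 8 is the most common rating in the example (10 of the 31 tracks), this baseline lands at roughly 0.32 in recent scikit-learn versions, comparable to the random forest and above the kNN score.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Ratings of the 31 example tracks; the baseline never looks at the features
ratings = np.array([10, 9, 9, 10, 8, 6, 8, 4, 3, 5, 7, 5, 5, 8, 8, 7,
                    8, 8, 10, 8, 10, 8, 4, 4, 4, 10, 10, 9, 8, 8, 4])
X_placeholder = np.zeros((len(ratings), 1))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Always predicts the most frequent rating in the training folds
baseline = DummyClassifier(strategy="most_frequent")
score = cross_val_score(baseline, X_placeholder, ratings, cv=skf).mean()
print(f"baseline accuracy: {score:.3f}")
```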

Building the Test Set

If you have made it this far, well done! Now let us enjoy Spotipy’s recommendations.

  • For the test set, we will first get a number of recommendations for each track in our training set using Spotipy’s recommendations function. Set the recommendation limit to whatever you like. I chose (source playlist length)/2 recommendations per track, so the test set dataframe has at most (source playlist length)²/2 tracks.
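Collecting the candidates can be sketched as below, assuming Spotipy’s `sp.recommendations(seed_tracks=..., limit=...)`, which returns a dict with a `tracks` list. `collect_recommendations` is a hypothetical helper. It uses each source track as a single seed and, as an extra assumption not stated in the text, drops duplicates and tracks already in the source playlist.

```python
def collect_recommendations(sp, seed_ids, per_track=None):
    """Gather recommended tracks, using each source track as a single seed.

    Returns (ids, names) in the order first seen. Duplicates and tracks that are
    already in the source playlist are dropped, so there are at most
    len(seed_ids) * per_track results.
    """
    if per_track is None:
        per_track = max(1, len(seed_ids) // 2)  # (source playlist length)/2, as in the text
    per_track = min(per_track, 100)  # the recommendations endpoint allows at most 100
    seen = set(seed_ids)
    rec_ids, rec_names = [], []
    for seed in seed_ids:
        result = sp.recommendations(seed_tracks=[seed], limit=per_track)
        for track in result["tracks"]:
            if track["id"] not in seen:
                seen.add(track["id"])
                rec_ids.append(track["id"])
                rec_names.append(track["name"])
    return rec_ids, rec_names
```

A call such as `rec_ids, rec_track_names = collect_recommendations(sp, playlist_df['id'].tolist())` would give the IDs to pass to `sp.audio_features` (in batches of at most 100) and the names for the tf-idf step; the rows of the resulting dataframe must stay in the same order as `rec_track_names`.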

Keep only the same features:

rec_playlist_df = rec_playlist_df[["acousticness", "danceability", "duration_ms", "energy", "instrumentalness",  "key", "liveness", "loudness", "mode", "speechiness", "tempo", "valence"]]
  • Make predictions with your best model:
# Make predictions
from sklearn.preprocessing import StandardScaler

tree_grid.best_estimator_.fit(X_train_last, y_train)
# Scale the test set with the training features' statistics, not the test set's own
rec_playlist_df_scaled = StandardScaler().fit(X_train).transform(rec_playlist_df)
X_test_pca = pca.transform(rec_playlist_df_scaled)
X_test_names = v.transform(rec_track_names)  # names of the recommended tracks, in row order
X_test_last = csr_matrix(hstack([X_test_pca, X_test_names]))
y_pred_class = tree_grid.best_estimator_.predict(X_test_last)
  • Get the tracks with the highest predicted ratings and add them to a new playlist. (You may also add them to the original source playlist and repeat from the first step infinitely many times :D)
rec_playlist_df['ratings']=y_pred_class
rec_playlist_df = rec_playlist_df.sort_values('ratings', ascending = False)
rec_playlist_df = rec_playlist_df.reset_index()

# Pick the top-rated tracks to add to your new playlist; a threshold of 9 or 10 works
recs_to_add = rec_playlist_df[rec_playlist_df['ratings']>=9]['index'].values.tolist()

Create a new, empty playlist for these tracks:

# Create a new playlist for tracks to add
playlist_recs = sp.user_playlist_create(username,
name='PCA + tf-idf + DT - Recommended Songs for Playlist - {}'.format(sourcePlaylist['name']))

In my example, this creates a new playlist named ‘PCA + tf-idf + DT - Recommended Songs for Playlist - Bluest of Blues’. The API lets you add at most 100 tracks to a playlist in a single request, so you may need to split your recs_to_add list into chunks of 100 if it has more than 100 tracks.

# Add tracks to the new playlist
sp.user_playlist_add_tracks(username, playlist_recs['id'], recs_to_add);
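For playlists with more than 100 recommendations, a small generator can split the list into request-sized batches. `chunks` is a hypothetical helper; the commented usage reuses the names from the text and needs an authorized `sp`.

```python
def chunks(items, size=100):
    """Yield consecutive slices of at most `size` items (100 tracks per add request)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage with the objects from the text (requires an authorized `sp`):
# for batch in chunks(recs_to_add):
#     sp.user_playlist_add_tracks(username, playlist_recs['id'], batch)
```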

And that’s it! You have a new playlist containing the best recommendations according to your ratings of the tracks in your source playlist.

So, this is my own way of getting more relevant recommendations for a playlist, delivered as a new playlist. Feel free to ask questions and leave comments.

On the topic of building playlists automatically, the author of the Spotipy library has an app with a nice user interface for creating smarter playlists. You may want to check it out here.

If you liked my story, you may clap as many times as you like, or follow my GitHub repo, where you will find more examples :)

Here are my LinkedIn and Twitter accounts, if you would like to get in touch!

Happy Spotipying!


PhD in Mathematics. Algebraic Coding Theorist. A mom of twins. Interested in #coding, #music, #datascience, #AI