How I use AI/ML to curate my music on Spotify

Alankar Naik
Published in Analytics Vidhya
5 min read · Jul 11, 2020


“Without music, life would be a mistake.” — Friedrich Nietzsche

Every Monday morning I wait for my 30 new songs, recommended by Spotify’s awesome algorithm. Spotify recommends music so well because it uses machine learning to understand your taste and suggest the songs you are most likely to listen to.

Music has always been my passion, and someday I aspire to be a DJ/music producer, so I decided to create something similar: using the data from this URL for my playlists, I would build a model that identifies songs matching my taste and rates them. This would let me make my playlist, which was recently featured on an online music curation website, even better.

Does AI/ML really work in music curation?

Before I even began to create a model that would work for me, I had to prove to myself that machine learning could indeed be used to programmatically suggest songs for my playlist. To test this hypothesis, I first built a genre classification model to check whether I could predict the genre of a song from the following attributes collected for each song.

Beats Per Minute (BPM) - The tempo of the song
Energy - The higher the value, the more energetic the song
Danceability - The higher the value, the easier it is to dance to the song
Loudness - The higher the value, the louder the song
Valence - The higher the value, the more positive the mood of the song
Length - The duration of the song
Acoustic - The higher the value, the more acoustic the song
Popularity - The higher the value, the more popular the song

Using KNIME, I tested out the following models to create a genre classification engine:

As you can see, I tried multiple algorithms:

  1. Naive Bayes Classifier
  2. Gradient Boosted Trees
  3. Decision Trees
  4. Random Forest

Gradient Boosted Trees worked best for me, with an accuracy of ~70%.

Below is the confusion matrix, which shows decent results, with some overlap between genres whose attributes are similar.
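
To make the accuracy figure and the confusion matrix concrete, here is a minimal Python sketch of how both are computed from actual and predicted labels. It is not the KNIME workflow used in this article: the genre names and the labels below are made up purely for illustration, and the function names are my own.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of songs whose predicted genre matches the actual genre."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for ten songs (illustrative only, not the article's data).
actual = ["house", "house", "techno", "techno", "pop",
          "pop", "pop", "house", "techno", "pop"]
predicted = ["house", "techno", "techno", "techno", "pop",
             "pop", "house", "house", "house", "pop"]

matrix = confusion_matrix(actual, predicted)
genres = sorted(set(actual) | set(predicted))

# One row per actual genre, one column per predicted genre.
for a in genres:
    print(a, [matrix[(a, p)] for p in genres])
print("accuracy:", accuracy(actual, predicted))
```

The off-diagonal counts are the "overlap": songs of one genre that the model mistook for another.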

I have attached the datasets and KNIME workflows at the end of the article for users to play with the data and explore it further.

How do I create a model that identifies similarities in my playlists and suggests similar songs?

What model to choose?

After speaking with some ML/AI engineers from the industry, I decided to use cosine similarity to build a model that would help me identify songs based on my personal taste. Cosine similarity measures the similarity between two vectors of an inner product space. It is the cosine of the angle between the two vectors, so it tells us whether they are pointing in roughly the same direction.
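
As a rough illustration, this definition fits in a few lines of plain Python: the dot product of the two vectors divided by the product of their lengths. The feature values below are hypothetical, and this sketch is not part of the KNIME workflow.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors: [BPM, energy, danceability, valence]
song_a = [120, 80, 70, 60]
song_b = [60, 40, 35, 30]   # same proportions as song_a, half the magnitude

# Two vectors with nothing in common (perpendicular directions).
only_x = [1, 0]
only_y = [0, 1]

print(cosine_similarity(song_a, song_b))   # very close to 1: same direction
print(cosine_similarity(only_x, only_y))   # 0.0: perpendicular
```

Note that only the direction matters, not the magnitude: song_b has half the values of song_a but still counts as (almost exactly) identical.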

How does cosine similarity help us to find similar music?

I have categorized my playlist into three like categories:

  • Average
  • Great
  • Super Like

I do this to find the set of songs closest to each like level, so that I can cherry-pick songs to add to my playlist. Using the similarity search node with cosine similarity, I find the closest 3 neighbors to each of my individual songs among the target playlist songs. This allows me to listen to and evaluate the songs selectively before adding them to my main playlist.

The output of this similarity search has the following important columns, which help us select songs:
- Similarity: 1 is most similar, 0 is most dissimilar
- Most Similar Song Index: 0, 1, 2
- Most Similar Track Title
- Like Level: the like level from the original playlist data

This is the configuration I used in my similarity search node to find the 3 most similar songs to every individual song in my playlist.

As you can see, I include the columns highlighted in green to compute the cosine distances to my input songs and find the closest 3 neighbors.

I also filter by range, i.e. I add a threshold on the similarity coefficient to filter out any values lower than 0.70.

My output contains up to 3 similar songs for each input song, provided they meet the threshold requirement of a similarity between 0.70 and 1.0.
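
The search described above (score every candidate, drop anything under 0.70, keep the closest 3) can be sketched in plain Python. This is only an approximation of what the similarity search node does: the function name most_similar, the track titles, and the feature values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(song, candidates, k=3, threshold=0.70):
    """Return up to k (title, similarity) pairs at or above the threshold,
    ordered from most to least similar."""
    scored = [(title, cosine_similarity(song, features))
              for title, features in candidates.items()]
    kept = [(title, score) for title, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical features: [energy, danceability, valence], scaled to 0..1.
my_song = [0.8, 0.7, 0.6]
candidates = {
    "Track A": [0.8, 0.7, 0.6],
    "Track B": [0.7, 0.7, 0.5],
    "Track C": [0.9, 0.1, 0.1],
    "Track D": [0.1, 0.1, 0.9],
    "Track E": [0.6, 0.8, 0.6],
}

for title, score in most_similar(my_song, candidates):
    print(f"{title}: {score:.2f}")
```

With these made-up values, Track D falls below the 0.70 threshold and is dropped, while Track C passes the threshold but is only the fourth closest, so it does not make the top 3. One thing to keep in mind with real data: features on very different scales (BPM versus a 0 to 100 score, for example) can dominate the result, so scaling them first may be worth considering.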

This is the workflow image of the similarity search, with an output for each like level.

Here is a sample output for the Super Like category:

For every song in my playlist I now have up to 3 similar songs, which I can quickly listen to and add to my playlist.

Important Links

All Data Set, KNIME Workflow

Conclusion

This model is a starting point, and there is plenty of room to grow its complexity, but it’s my humble effort to amalgamate my two biggest interests in life. It shows the power of using data effectively and intelligently to optimize even the simplest of tasks, like creating a playlist.
