Automating Finding Music Samples on Spotify with WhoSampled

Christopher Pease
Nov 8, 2018


There are tons of ways to find new music on Spotify, which is one reason the Swedish company leads the pack in music-streaming user experience. The most exciting of these features rely on machine learning, such as recommendation systems built on collaborative filtering. However, I have found that they miss some low-hanging fruit: recommending songs based on the samples they use. This is part one of a series working toward that goal. It starts by locating all the samples used by the songs in a playlist and delivering them to you instantly as a new playlist.

Crate-digging is a huge part of creating and appreciating original hip hop instrumentals

I love hip hop and sample-based electronic music, and I always want to know where my favorite snippet of a song originated. I often find that the sample is well worth listening to on its own, whether it comes from jazz, soul, classical, or some experimental sub-genre. Listening to samples is a way to better appreciate your favorite producers and to understand their stylistic influences. It also lets you appreciate the artist’s technique in distorting and chopping up audio. And finally, there are few feelings as satisfying as hearing a new popular song and recognizing where different parts of the instrumental came from!

Of course, samples can be very hard to identify, so thank god for the awesome guys and gals over at WhoSampled.

WhoSampled is a fantastic website where users post the samples they have identified on a song’s page. You can simply look up a song on the site and see its samples and where they occur in the song. The site covers a wide variety of sample types, including drums, vocals, and sound effects, all stored and labeled, which makes it very easy to get lost in the connections between songs. The goal of this post is a short program that uses WhoSampled to automate the creation of a sample playlist on Spotify: the input is a Spotify playlist, and the output is a new playlist containing all the samples used in the original. Let’s get started.

All of the code for this project is available on my GitHub.

First, we need to work with the Spotify API. To use it, we register an app and get API credentials (a client ID and a client secret), which takes no time at all. The API is RESTful and easy to use, and it has a pretty sweet Python library dedicated to accessing it: Spotipy, which makes authentication easy and has tons of pre-made methods to get us off the ground quickly.
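Spotipy hides the token exchange, but it helps to see what it involves. Below is a minimal, standard-library-only sketch of Spotify’s client-credentials flow: the client ID and secret are sent as HTTP Basic auth to the token endpoint, which answers with an access token. It is not the post’s call_api, and build_token_request and fetch_token are hypothetical helper names. Note that a client-credentials token can read public data such as playlists but cannot create playlists on a user’s account; that needs a user-authorized token with a playlist-modify scope, which Spotipy can also obtain for you.

```python
import base64
import json
import urllib.parse
import urllib.request

# Spotify's token endpoint for its OAuth 2.0 flows.
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    # The ID and secret travel as HTTP Basic auth: base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    auth_header = "Basic " + base64.b64encode(credentials).decode("ascii")
    # The form-encoded body names the grant type being requested.
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the access token (needs network access)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]
```

The returned token is what goes into the Authorization: Bearer header of later API calls; Spotipy does this bookkeeping automatically once it has a token.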

The call_api function handles authentication: it uses the client ID and secret provided by Spotify to return a token. read_playlist identifies the playlist by its Spotify URI (which you can find by clicking “Share” on a playlist). The URI also includes the username, and the function builds a dictionary containing the track and artist names for each song. Now we will go to WhoSampled and search for these songs.
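A read_playlist along these lines could look like the sketch below. It assumes the older URI format the post describes, spotify:user:&lt;username&gt;:playlist:&lt;id&gt; (newer Share links drop the username, so they would need different parsing), and it maps each track name to its first listed artist. It relies on Spotipy’s user_playlist_tracks and next methods via the sp object; newer Spotipy releases also offer playlist_items. parse_playlist_uri is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def parse_playlist_uri(uri: str) -> Tuple[str, str]:
    """Split spotify:user:<username>:playlist:<playlist_id> into (username, playlist_id)."""
    parts = uri.split(":")
    if len(parts) != 5 or parts[0] != "spotify" or parts[1] != "user" or parts[3] != "playlist":
        raise ValueError(f"unrecognised playlist URI: {uri!r}")
    return parts[2], parts[4]


def read_playlist(sp: Any, uri: str) -> Dict[str, str]:
    """Return {track name: first artist name} for every track in the playlist."""
    username, playlist_id = parse_playlist_uri(uri)
    tracks: Dict[str, str] = {}
    results = sp.user_playlist_tracks(username, playlist_id)
    while results:
        for item in results["items"]:
            track = item.get("track")
            if track is None:  # removed or unavailable tracks can come back empty
                continue
            tracks[track["name"]] = track["artists"][0]["name"]
        # Results are paged; sp.next() fetches the following page.
        results = sp.next(results) if results.get("next") else None
    return tracks
```

Keying the dictionary by track name keeps it simple, but two different songs with the same title would overwrite each other; a list of (track, artist) pairs avoids that.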

After doing a bit of poking around with the WhoSampled search bar, I figured the best way to locate the desired song would be through the track search.
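The track search can be driven directly by URL. This is a minimal sketch, assuming the track search lives at https://www.whosampled.com/search/tracks/ and takes the query in a q parameter (check this against the site’s own search bar before relying on it); track_search_url and fetch_html are hypothetical helpers.

```python
import urllib.parse
import urllib.request

# Assumed location of WhoSampled's track search; verify in a browser first.
WHOSAMPLED_TRACK_SEARCH = "https://www.whosampled.com/search/tracks/"


def track_search_url(track: str, artist: str) -> str:
    """Build a track-search URL from an artist and track name."""
    # urlencode escapes spaces as '+' and special characters such as '&'.
    query = urllib.parse.urlencode({"q": f"{artist} {track}"})
    return f"{WHOSAMPLED_TRACK_SEARCH}?{query}"


def fetch_html(url: str) -> str:
    """Download a page as text (needs network access)."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Searching on "artist track" rather than the title alone narrows the results, which makes it more likely that the first hit is the song we want.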

The search results show some samples, but never more than five. So we use this first request only to retrieve the link to the song page, and then make a second request to get all of the sample data WhoSampled has for the song. If you are unfamiliar with web scraping in Python, BeautifulSoup is a must-have for this kind of project; here we use it to locate the hyperlink (the href attribute of the <a> tag) to the song page. The second request locates the list elements (<li>) on the song page and stores the samples they contain for later use. Again, the full code is on GitHub if any step of the process is unclear.
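The post does this with BeautifulSoup. To keep the example dependency-free, the sketch below swaps in the standard library’s html.parser to show the same two-step idea: pull links out of the search page, then pull the text of each <li> from the song page. The rule that a song page looks like /Artist/Track/ is an assumption, not WhoSampled’s documented markup, so a real scraper would need to target the site’s actual structure (with BeautifulSoup, find_all("a", href=True) and find_all("li") are the equivalent calls). LinkAndListParser, first_song_link, and sample_entries are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

WHOSAMPLED_BASE = "https://www.whosampled.com"


class LinkAndListParser(HTMLParser):
    """Collect every <a href> value and the text inside each top-level <li>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.list_items: List[str] = []
        self._li_depth = 0  # greater than 0 while inside an <li>
        self._li_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "li":
            if self._li_depth == 0:
                self._li_text = []  # start collecting a new entry
            self._li_depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth > 0:
            self._li_depth -= 1
            if self._li_depth == 0:
                # Collapse runs of whitespace and keep only non-empty entries.
                text = " ".join("".join(self._li_text).split())
                if text:
                    self.list_items.append(text)

    def handle_data(self, data):
        if self._li_depth > 0:
            self._li_text.append(data)


def _parse(html: str) -> LinkAndListParser:
    parser = LinkAndListParser()
    parser.feed(html)
    parser.close()
    return parser


def first_song_link(search_html: str) -> Optional[str]:
    """Step 1: return the first search-result link that looks like a song page."""
    for href in _parse(search_html).links:
        parts = [p for p in href.split("/") if p]
        # Assumed shape of a song page URL: /Artist-Name/Track-Name/
        if href.startswith("/") and len(parts) == 2 and parts[0] != "search":
            return urljoin(WHOSAMPLED_BASE, href)
    return None


def sample_entries(song_html: str) -> List[str]:
    """Step 2: return the text of every top-level <li> on the song page."""
    return _parse(song_html).list_items
```

In practice the song page has other lists too (navigation, related songs), so the second step would need to restrict itself to the section that holds the samples.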

Now that we have finished swimming in WhoSampled’s HTML, we have the samples that we want to add to our new Spotify playlist. Finally, we are ready to look for this beautiful mix of jazz, oldies, and obscure Romanian folk songs on Spotify (fingers crossed; you’d be surprised what wonderfully weird stuff they have available).

The get_spotify_ids function looks up all the great samples we’ve found on Spotify. To make sure we get the right song, we search by track name and verify that the artists match. Easy, right? We also keep track of the success rate of locating samples, to see how we did. Once we have the track IDs, we can add them all to a new playlist. The sp object passed into both get_spotify_ids and call_api is our Spotipy client, which is instantiated in our main function.
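Here is one way the lookup and the playlist step could look. It is a sketch under assumptions: samples arrive as (track, artist) pairs, artists are compared case-insensitively, and add_to_new_playlist is a hypothetical helper. It calls Spotipy’s search, user_playlist_create, and user_playlist_add_tracks methods through the sp object, which could be created with something like spotipy.Spotify(auth=token) using a user-authorized token. Tracks are added in chunks of 100 because Spotify’s add-tracks endpoint accepts at most 100 items per request.

```python
from typing import Any, List, Tuple


def get_spotify_ids(sp: Any, samples: List[Tuple[str, str]]) -> List[str]:
    """Look up (track, artist) pairs on Spotify and return the matching track IDs.

    A search hit only counts if one of its artists matches the expected artist
    (case-insensitive). The success rate is printed at the end.
    """
    ids: List[str] = []
    for track, artist in samples:
        # Spotify's search syntax: "track:" restricts matching to the track name.
        results = sp.search(q=f"track:{track}", type="track", limit=10)
        for hit in results["tracks"]["items"]:
            hit_artists = {a["name"].lower() for a in hit["artists"]}
            if artist.lower() in hit_artists:
                ids.append(hit["id"])
                break  # take the first hit whose artist matches
    if samples:
        print(f"Found {len(ids)}/{len(samples)} samples ({len(ids) / len(samples):.0%})")
    return ids


def add_to_new_playlist(sp: Any, username: str, name: str, track_ids: List[str]) -> str:
    """Create a playlist, add the tracks in batches of 100, and return its ID."""
    unique_ids = list(dict.fromkeys(track_ids))  # drop duplicates, keep order
    playlist = sp.user_playlist_create(username, name, public=True)
    for start in range(0, len(unique_ids), 100):
        sp.user_playlist_add_tracks(username, playlist["id"], unique_ids[start:start + 100])
    return playlist["id"]
```

Deduplicating matters here because two songs in the original playlist can share a sample, and without it the same track would appear twice in the output.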

I have left out a few of the intermediate functions, but the process is quite simple, and the resulting tool saves me the time I would otherwise spend looking up each song I like on WhoSampled and then scouring Spotify for its samples. The match rate for tracks and samples could definitely be improved for songs whose names differ slightly between WhoSampled and Spotify. I might also want to keep all samples in the same playlist, with a playlist description that records which samples belong to each song.
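One possible way to handle names that differ slightly is to normalize titles before comparing them and fall back to a similarity score. The sketch below uses re and difflib.SequenceMatcher from the standard library; normalize_title and titles_match are hypothetical helpers, and the clean-up rules and the 0.85 threshold are illustrative guesses that would need tuning against real WhoSampled and Spotify titles.

```python
import re
from difflib import SequenceMatcher

# Hypothetical clean-up rules: drop bracketed asides such as "(feat. X)" or
# "[Remastered]", anything after " - " (e.g. "- 2011 Remaster"), and punctuation.
_BRACKETED = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")
_DASH_SUFFIX = re.compile(r"\s+-\s+.*$")
_NON_WORD = re.compile(r"[^\w\s]")


def normalize_title(title: str) -> str:
    """Lower-case a title and strip the decorations listed above."""
    title = title.lower()
    title = _BRACKETED.sub("", title)
    title = _DASH_SUFFIX.sub("", title)
    title = _NON_WORD.sub("", title)
    return " ".join(title.split())


def titles_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two titles as the same song if their normalized forms are close enough."""
    na, nb = normalize_title(a), normalize_title(b)
    return na == nb or SequenceMatcher(None, na, nb).ratio() >= threshold
```

Cutting everything after " - " handles suffixes like "- 2011 Remaster", but it also shortens titles that genuinely contain a dash, so it trades some precision for recall; checking the artist as well, as get_spotify_ids does, keeps false matches in check.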

This was a fun way to spend a few hours, and it got me really excited about working with the Spotify API. What I am really interested in is creating a recommendation system that finds new music for me based on similarity between samples. More on that soon!


Christopher Pease

Exploring the world through the lens of data science. Former physics researcher with a passion for machine learning and statistics.