Using Python to Create a Spotify Playlist of the Samples on an Album

Trade in for-loops and random samples for drum loops and soul samples

Eric Hochberger
7 min read · Mar 4, 2020

Introduction

As an avid hip-hop fan, I have always been extremely interested in sampling. Sampling in music, for the uninitiated, is the act of repurposing a piece of an existing song as part of a new song. A great example is Kanye West’s “My Way Home” off of his 2005 album Late Registration, which samples Gil Scott-Heron’s “Home Is Where The Hatred Is” from his 1971 album, Pieces of a Man.

Late Registration is filled with great samples, from the aforementioned Scott-Heron track to Shirley Bassey’s classic “Diamonds Are Forever,” which served as the theme song to the Bond movie of the same name. Having listened through Late Registration countless times, I wanted to dive deeper into the music and explore the influences that West and co-producer Jon Brion rendered literal via sampling. I began by searching for the songs on Genius, finding the names of the samples, and manually adding them to a Spotify playlist. This, predictably, grew cumbersome after a while, so I wrote a script to automate the process. Below, I will demonstrate how the script works using Late Registration as an example, but if you want to jump straight into the code, you can find it on my GitHub.

Web Scraping

The first step was generating a list of the samples from Late Registration. As mentioned before, Genius has great crowdsourced data for the samples on each song, so it was just a matter of gathering the links for the pages of each song on the album, looping through them, and scraping the sample information from each:

Genius Track Info for “My Way Home”
Uses Python Modules BeautifulSoup4 and Requests

Or so I thought. After running the scraper with this strategy and seeing that the resulting dataframe contained only 12 results (two of which were duplicates), I suspected the scraper was missing information, so I manually inspected the Genius song pages to figure out why.

Genius song page for “Late” by Kanye West

As it turns out, the sample information on some Genius song pages is found in the introductory production annotation and not in the song’s info box. This was a frustrating realization, but it challenged me to engineer a workaround.

Getting Sample Data From Annotations

Genius, like Spotify, has a free API that lets users pull information about specific annotations, so once I had the annotation isolated, I just needed to extract the sample information. Because annotations are not standardized, parsing the text to determine the name of the sample would not have been straightforward. However, I noticed that any time Genius users refer to samples, they provide an accompanying YouTube link. How considerate. Using this information, I developed a supplementary scraper that isolates a Genius production annotation and outputs sample information by scraping the title of the YouTube video within the annotation.
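
The link-finding step of that idea can be sketched with nothing but the standard library. This is a minimal illustration rather than the scraper itself: it assumes the annotation body is already available as an HTML string, and the class name, function name, and example markup are all hypothetical. Fetching each video page to read its title is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class YouTubeLinkParser(HTMLParser):
    """Collect href values that point at YouTube from a chunk of HTML."""

    YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).netloc.lower() in self.YOUTUBE_HOSTS:
            self.links.append(href)


def youtube_links(annotation_html):
    """Return every YouTube URL linked from the given HTML string."""
    parser = YouTubeLinkParser()
    parser.feed(annotation_html)
    return parser.links


# Hypothetical annotation body, made up for illustration.
example_html = (
    '<p>The beat samples <a href="https://www.youtube.com/watch?v=abc123">'
    'this record</a>. More on <a href="https://genius.com/artists/Example">'
    'the artist</a>.</p>'
)
print(youtube_links(example_html))
```

Filtering on the link's host rather than on the surrounding text sidesteps the lack of standardization in how annotations are written.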

Now equipped with this more robust scraper, we can take a look at the data output from the Genius URL for Late Registration:

Sample Data from Info Boxes (Screenshot by Author)
Sample Data from Annotations (Screenshot by Author)

I’ll refer to these two dataframes as sample_data and titles, respectively, going forward. Though it may appear that the scraper made some errors given that there are duplicate tracks in the dataframes, this is actually evidence that the scraper is working. The first two tracks on Late Registration, “Wake Up Mr. West” and “Heard ’Em Say,” are brilliantly linked together by samples of a gorgeous Natalie Cole track: the former samples the introductory piano part as is, and the latter loops a section of the same piano part as the basis for its beat. “Diamonds Are Forever” features twice because the album contains both the original version of “Diamonds From Sierra Leone,” which samples the Bassey track and was the lead single from the album, and a remix that features Kanye’s mentor, Jay-Z.

Creating a Spotify Playlist with Spotipy

The next step was to automate the creation of an appropriately named Spotify playlist, which is easily done with the Spotipy module. Setting up Spotipy is a short process that boils down to registering with Spotify as a developer to obtain a client ID and a client secret, which, in combination with an active Spotify username, let you perform myriad operations on your Spotify account through Python via Spotify’s free API. I use Spotipy’s “Authorization Code Flow” to authorize usage of my account and then create a Spotipy object:

import spotipy
import spotipy.util as util
token = util.prompt_for_user_token(username, scope, client_id=client_id, client_secret=client_secret, redirect_uri='http://localhost/')
sp = spotipy.Spotify(auth=token)

Now we can create the playlist:

playlist_name = f"Samples in {album_title} by {album_artist}"
sp.user_playlist_create(username, name=playlist_name)
Playlist on Spotify

Pretty straightforward. After executing this code, we have a descriptively titled playlist on Spotify, ready for us to add songs to.

Adding Tracks to the Playlist

Unfortunately, adding tracks to a playlist is not as easy as just submitting a list of tracks that you desire — but we don’t do data science because it is easy. We do it because it is hard, because the challenge of creating Spotify playlists of the music that inspired our favorite albums will serve to organize and measure the best of our energies and skills, because that challenge is one that we will accept, one we are unwilling to postpone, and one which we intend to win! But, I digress.

Here’s some code I wrote to get a list of Spotify track ids from the sample_data dataframe and the list of YouTube video titles. I use Spotipy and fuzzywuzzy, a fuzzy string matching module whose purpose I will discuss in a bit:

We’ll break this code down in some more detail since I take some steps that are not intuitive. I discovered through experimentation that the first result from the Spotify search was not always the desired track due to idiosyncrasies in Spotify’s search algorithm that I won’t pretend to understand, and it was therefore necessary to retrieve multiple results:

results = sp.search(q=f"{sample_data['title'][i]} {sample_data['artist'][i]} ", limit=5, type='track')

I then had to verify which of the five responses was the track I wanted in the first place. Admittedly, my first attempt at doing this was a complete failure. I was young and optimistic and thought that the Spotify results’ titles would exactly match web-scraped titles from Genius, a crowdsourced website. They didn’t. Now a seasoned veteran, I do this by fuzzy (approximate) string matching both the artist and track title of the Spotify results to the artist and track title listed in the sample_data dataframe. Here’s a simplified version of the code:

if fuzz.partial_ratio(Spotify_artist_name, sample_data_artist_name) > 90 and fuzz.partial_ratio(Spotify_track_title, sample_data_track_title) > 90:
    track_ids.append(Spotify_track_id)  # append track id
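
To make the selection step concrete, here is a self-contained sketch of picking one track out of several search hits. It is an illustration under stated assumptions, not the original function: the helper names are hypothetical, the search response is a made-up dictionary shaped like a Spotify track search result, and the standard library's difflib stands in for fuzzywuzzy so the example runs without extra installs. difflib compares whole strings, so its scores will not match partial_ratio() exactly.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_track_id(results, title, artist, threshold=90):
    """Return the id of the first search hit whose title and artist both
    score above the threshold, or None if nothing qualifies."""
    for item in results["tracks"]["items"]:
        hit_title = item["name"]
        hit_artist = item["artists"][0]["name"]
        if (similarity(hit_artist, artist) > threshold
                and similarity(hit_title, title) > threshold):
            return item["id"]
    return None


# Made-up search response with placeholder ids, for illustration only.
fake_results = {"tracks": {"items": [
    {"id": "id_1", "name": "Heavenly Dream",
     "artists": [{"name": "Some Other Band"}]},
    {"id": "id_2", "name": "Heavenly Dream",
     "artists": [{"name": "The Kay Gees"}]},
]}}
print(pick_track_id(fake_results, "Heavenly Dream", "The Kay-Gees"))
```

Here the first hit is rejected because its artist does not match, and the second is accepted despite the hyphen, which is the behavior the threshold is meant to produce.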

Since it is nearly impossible to distinguish between the artist and track name from a YouTube video’s title, we do not have the luxury of verifying the Spotify results for the annotation workflow. However, this workflow yields many additional samples we would not have found otherwise, so I find the benefits outweigh the potential costs.

A Quick Aside On Fuzzy String Matching

Fuzzy string matching encompasses a group of methodologies that quantify how close two strings are to being the same. The way they do this is by counting the number of alterations (think replacing a letter, deleting a space) needed to make one string match the other exactly. I’ll illustrate why we need this technique with an example:

If you examine the sample_data dataframe above, you will see “Heavenly Dream” by “The Kay-Gees.” On Spotify, the name of this band is “The Kay Gees.” Tough. Clearly, if we did precise matching on artist name we would not end up with this, the correct sample, in our playlist. But, by using the fuzzywuzzy function partial_ratio(), which is an implementation of a fuzzy string matching algorithm, we get:

fuzz.partial_ratio("The Kay-Gees", "The Kay Gees")
100

Since some discrepancies are more dramatic than this, I have set the threshold for what I consider a match at 90 for both artist and track title strings, but feel free to experiment with your own!
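
The "number of alterations" idea described in this aside is commonly known as Levenshtein edit distance. The sketch below is a plain-Python illustration of that count, assuming single-character insertions, deletions, and substitutions each cost one. The function name is hypothetical, and this is not necessarily how fuzzywuzzy computes its scores, which are normalized to a 0 to 100 scale.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("The Kay-Gees", "The Kay Gees"))  # 1: "-" becomes " "
print(edit_distance("kitten", "sitting"))             # 3
```

A single alteration separates the two spellings of the band's name, which is why an approximate comparison accepts the pair while an exact comparison does not.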

Adding the Tracks to the Playlist

Fortunately, adding tracks to a playlist is as easy as submitting a list of track ids for the tracks that you desire, and that is precisely the output of the GetTrackIDs() function above. The only additional information we need is the playlist_id of our previously created playlist:
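
One way to find that id is to search the user's playlists by name. The following is a minimal sketch under assumptions: the helper name is hypothetical, and the input is a made-up dictionary shaped like one page of a playlist listing, with an "items" list whose entries carry "name" and "id" fields.

```python
def find_playlist_id(playlists_page, playlist_name):
    """Return the id of the first playlist with the given name, or None."""
    for playlist in playlists_page.get("items", []):
        if playlist.get("name") == playlist_name:
            return playlist.get("id")
    return None


# Made-up response page with placeholder ids, for illustration only.
fake_page = {"items": [
    {"name": "Road Trip", "id": "playlist_1"},
    {"name": "Samples in Late Registration by Kanye West", "id": "playlist_2"},
]}
playlist_id = find_playlist_id(
    fake_page, "Samples in Late Registration by Kanye West"
)
print(playlist_id)
```

With a live client, the page would presumably come from sp.user_playlists(username), which returns one page of results at a time, so an account with many playlists might need more than one call. The playlist object returned by sp.user_playlist_create() should also include an id field, which would avoid the lookup altogether.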

And now we add the tracks to our playlist:

sp.user_playlist_add_tracks(username, playlist_id, track_ids)

Voilà! We have a pretty extensive playlist of songs sampled on Kanye West’s Late Registration. The only obvious error is an orchestral version of “Diamonds Are Forever,” which, in my opinion, is a small price to pay in order to ensure we include The New York Community Choir’s 1977 “Since You Came in My Life,” which provides the horns central to the melody of the West classic, “Crack Music.”

I hope you enjoyed reading this walk-through half as much as I enjoyed making it, because I had a ball. I have more music-related projects on my GitHub.

Published in Towards Data Science
