Scrape, clean, sort, jam

Xristos Katsaros
Aug 15 · 5 min read
Photo by Fimpli on Unsplash

Literally anyone can make music these days on a budget of around zero dollars; all you need is a laptop and an imagination. Well, it’s a little more complicated than that, but producing music is way more accessible than it was 10–15 years ago. As a result, there seems to be an endless stream of new music being released all over the world. It’s hard to navigate that stream, and most people don’t bother exploring it at all; they default to whatever is easiest to find. Typically, the easiest music to find is backed by huge budgets and/or record labels. Spotify has changed that over the past decade. They definitely bring more obscure artists into the spotlight by featuring them on their own featured playlists, or on algorithm-curated playlists and radio stations based on the user’s listening history and saved songs/albums. However, I personally don’t get much out of those; they’re short, repetitive, and based on what other people have listened to as well. Just because I listen to the same artist(s) as someone else doesn’t mean we have similar tastes.

I got tired of this and wanted to make playlists based on the blogs I follow; they are the most consistent at recommending new music that I actually like. That is why I started working on the curation station. When I first started working on it, the idea was simple: scrape a blog for new releases and make a playlist for that blog. Once I was able to do that, I thought “What if I want to make playlists based on genres as well?” I never really liked the idea of searching for anything by genre, because there’s never a movie or piece of music that falls under just one, and I don’t know what the hell “techno, funk, disco, experimental” is gonna sound like. So I tried using a Markov chain to predict the nearest and furthest “genre neighbors” to create more defined playlists.

The first step is to scrape the blog and create a dictionary for each release.

Builds a dictionary from scraped data with the help of other functions
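
As a minimal sketch of this step, here is one way a page could be turned into one dictionary per release. The markup (an `article` with class `release`, plus `artist`, `album`, and `genre` classes), the `ReleaseParser` class, and the `build_release_dicts` function are all hypothetical; a real blog will have its own HTML structure, and the idea that genres are scraped alongside each release is an assumption. It uses only Python’s built-in `html.parser` so it runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded blog page.
SAMPLE_HTML = """
<article class="release">
  <h2 class="artist">Artist A</h2>
  <h3 class="album">First Album</h3>
  <span class="genre">techno</span>
  <span class="genre">disco</span>
</article>
<article class="release">
  <h2 class="artist">Artist B</h2>
  <h3 class="album">Second Album</h3>
  <span class="genre">funk</span>
</article>
"""


class ReleaseParser(HTMLParser):
    """Collects one dictionary per <article class="release"> block."""

    def __init__(self):
        super().__init__()
        self.releases = []
        self._current = None  # release currently being filled in
        self._field = None    # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "article" and css_class == "release":
            self._current = {"artist": "", "album": "", "genres": []}
        elif self._current is not None and css_class in ("artist", "album", "genre"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or self._field is None or not text:
            return
        if self._field == "genre":
            self._current["genres"].append(text)
        else:
            self._current[self._field] = text

    def handle_endtag(self, tag):
        if tag == "article" and self._current is not None:
            self.releases.append(self._current)
            self._current = None
        self._field = None


def build_release_dicts(html):
    """Return a list of release dictionaries parsed from an HTML string."""
    parser = ReleaseParser()
    parser.feed(html)
    return parser.releases


releases = build_release_dicts(SAMPLE_HTML)
```

In practice the HTML string would come from an HTTP request to the blog rather than a constant, and only the class names in `handle_starttag` would need to change.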

Next, I save the scraped info into a JSON file so I can use it in the Spotify script.
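
A small sketch of that hand-off, using Python’s standard `json` module. The function names `save_releases` and `load_releases` and the shape of the release dictionaries are assumptions made for illustration.

```python
import json


def save_releases(releases, path):
    """Write the scraped releases to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented artist names readable in the file
        json.dump(releases, f, ensure_ascii=False, indent=2)


def load_releases(path):
    """Read the releases back in, e.g. from the Spotify script."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keeping the scrape and the Spotify work in separate scripts joined by a file means the blog only has to be scraped once while the playlist logic is being tweaked.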

Now I can start writing the main script, using Spotipy to work with the Spotify API.

After loading the data from the JSON file I created earlier, I need to use it to get the genre information for each release. I build a dictionary of genres where each genre is a key and its value is another dictionary of its neighboring genres and how often they are paired together.
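
The nested dictionary described above could be built roughly like this. It assumes each release carries a `"genres"` list (a hypothetical field name), and `build_genre_neighbors` is an illustrative name rather than the function from the original script.

```python
def build_genre_neighbors(releases):
    """Map each genre to {neighboring genre: number of releases they share}."""
    neighbors = {}
    for release in releases:
        # A set avoids double counting if a genre is listed twice on one release.
        genres = set(release.get("genres", []))
        for genre in genres:
            counts = neighbors.setdefault(genre, {})
            for other in genres - {genre}:
                counts[other] = counts.get(other, 0) + 1
    return neighbors


sample_releases = [
    {"album": "One", "genres": ["techno", "disco"]},
    {"album": "Two", "genres": ["techno", "disco", "funk"]},
    {"album": "Three", "genres": ["funk"]},
]
genre_neighbors = build_genre_neighbors(sample_releases)
```

With the sample data, `genre_neighbors["techno"]` ends up as `{"disco": 2, "funk": 1}`: techno shows up with disco on two releases and with funk on one.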

Now that I have the genre dictionary, I can fine-tune a genre of my choice by returning its most frequently paired genre and its least frequently paired genre. This way I can make a new dictionary of releases based on the genre criteria I have set up.
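
Here is a rough sketch of that lookup. It is a plain frequency ranking over the pairing counts, not a full Markov chain, and the names `nearest_and_furthest` and `filter_releases`, the `"genres"` field, and the use of a list of releases are assumptions for illustration.

```python
def nearest_and_furthest(neighbors, genre):
    """Return (most frequently paired genre, least frequently paired genre)."""
    counts = neighbors.get(genre, {})
    if not counts:
        return None, None
    # Highest count first; ties are broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[0][0], ranked[-1][0]


def filter_releases(releases, genre, companion):
    """Keep only the releases tagged with both genres."""
    return [
        release
        for release in releases
        if genre in release.get("genres", []) and companion in release.get("genres", [])
    ]


neighbors = {"techno": {"disco": 2, "funk": 1}}
releases = [
    {"album": "One", "genres": ["techno", "disco"]},
    {"album": "Two", "genres": ["techno", "funk"]},
]
nearest, furthest = nearest_and_furthest(neighbors, "techno")
close_matches = filter_releases(releases, "techno", nearest)
```

Filtering on the nearest neighbor gives the “safe” version of a genre, while filtering on the furthest one surfaces the odd pairings.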

In order to create a playlist on your account, you have to get an authorization token from Spotify to let them know you are giving permission to modify your playlists. Spotipy makes this really easy with its util module. After calling it, your browser will open to ask for permission (or to log in, if you aren’t already), and you should copy the URL you are redirected to and paste it into the prompt.
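
A hedged sketch of that authorization step is below. `prompt_for_user_token` lives in `spotipy.util`, though newer Spotipy releases steer people toward `SpotifyOAuth` instead, so treat this as matching the approach described here rather than current best practice. The helper names `load_credentials` and `get_spotify_client` are hypothetical, and the `playlist-modify-public` scope assumes the playlist is public. The three environment variable names are the ones Spotipy itself documents.

```python
import os

# Assumption: a public playlist; private ones need "playlist-modify-private".
SCOPE = "playlist-modify-public"


def load_credentials(env=os.environ):
    """Read the Spotify app credentials from the environment."""
    required = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}


def get_spotify_client(username):
    """Open the browser prompt and return an authorized Spotipy client."""
    # Imported here so the rest of the script loads even without Spotipy installed.
    import spotipy
    import spotipy.util as util

    creds = load_credentials()
    token = util.prompt_for_user_token(
        username,
        scope=SCOPE,
        client_id=creds["SPOTIPY_CLIENT_ID"],
        client_secret=creds["SPOTIPY_CLIENT_SECRET"],
        redirect_uri=creds["SPOTIPY_REDIRECT_URI"],
    )
    if not token:
        raise RuntimeError("Could not get a token for " + username)
    return spotipy.Spotify(auth=token)
```

Reading the credentials from the environment keeps the client secret out of the script, which matters if the code ends up on GitHub.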

Finally, I’m ready to make the playlist creator!

This will be a function that takes in the dictionary of releases, the username, and the name of the new playlist. The first thing the function does is create the new playlist. Since I don’t know which of these releases are available on Spotify, I used exception handling in case a release doesn’t exist there. The function tries to find the album, and if it can’t, it removes the album from the dictionary. If the album exists, it saves the results. The next thing the function does is check that the album name on Spotify matches the album name in the dictionary, and if it does, it appends the album ID to a list of IDs. The function then loops through the album ID list to retrieve the track ID of the first song on each album, and finally adds all of the tracks to the new playlist. Spotify only allows 100 tracks per request, which is why I have it adding only 100. I tried to use a sleep function to get around this, but it wasn’t working, and it will take more time to figure out a way around this restriction.
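
The steps above can be sketched as follows. This is an illustration and not the original function: it takes the Spotipy client as an `sp` argument, treats the releases as a list of dictionaries with `"artist"` and `"album"` keys, and skips missing albums instead of deleting them, all of which are assumptions. The calls `user_playlist_create`, `search`, `album_tracks`, and `user_playlist_add_tracks` are Spotipy client methods as I understand them. The `chunked` helper is one possible way around the 100-track limit: send several requests of up to 100 tracks each.

```python
def chunked(items, size=100):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def create_playlist(sp, releases, username, playlist_name):
    """Create a playlist holding the first track of every release found."""
    playlist = sp.user_playlist_create(username, playlist_name)
    track_ids = []
    for release in releases:
        query = "album:{} artist:{}".format(release["album"], release["artist"])
        results = sp.search(q=query, type="album", limit=1)
        albums = results["albums"]["items"]
        if not albums:
            continue  # release is not on Spotify
        album = albums[0]
        # Guard against loose search matches by comparing album names.
        if album["name"].lower() != release["album"].lower():
            continue
        tracks = sp.album_tracks(album["id"])["items"]
        if tracks:
            track_ids.append(tracks[0]["id"])
    # One request per batch keeps each call within the 100-track limit.
    for batch in chunked(track_ids):
        sp.user_playlist_add_tracks(username, playlist["id"], batch)
    return playlist["id"], track_ids
```

Passing the client in as an argument also makes the function easy to try out with a stand-in object before pointing it at a real account.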

I’d like to improve these scripts, specifically the functions that sort through the genres. It would be great to input a list of genres and get back a more finely tuned genre arrangement. The main scripts I used are on GitHub for anyone to work on, so feel free to play around with the code yourself!
