Photo by Fimpli on Unsplash

Finding Genre-Specific Music Recommendations for Your Favorite Song

An informed and methodical way to explore new styles of music based on a single track and machine learning algorithms.

Nanette Wu
7 min read · Aug 29, 2019


TL;DR: We made a web app that introduces listeners to new music genres, ranging from classical to jazz, given a song they already love.

“Classical music is boring.”

Sure, you might agree with that — especially if you didn’t dedicate your childhood to immersing yourself in the world of Mozart and Beethoven.

Hear me out though. Classical music is actually kinda cool.

Believe it or not, a lot of the dance-y, more “pleasurable” songs have their roots in the compositions of a bunch of dead dudes.

This might be hard to believe at first glance. But allow me to show you a common thread between the boppin’ bangers of today and grandpa’s bedtime music.

I’m Nanette, and I think music is pretty neat. Whether it’s today’s hits, tropical house, or classical concertos — you name it, I’m down to give it a listen.

Me and my clarinet, Clarinanette 🎵

I’m also a senior at MIT pursuing a dual degree in computer science & music. The fusion of music and technology fascinates me tremendously, particularly the rapid growth of the music streaming industry (think Spotify, Pandora).

During one of my endless spurts of googling “cool music tech”, I stumbled upon Chartmetric, a music analytics startup in the Bay Area. With a desire to be home in Cupertino and work for a company that perfectly aligned with my interests, I pursued a summer internship with them. Luckily, they brought me onboard as a backend/data engineering intern!

The Chartmetric app. (Photo from Chartmetric’s homepage)

What surprised me the most about the product was that it has so. much. data. 13.6 million songs, 3.9 million albums, 1.7 million artists across Spotify, Apple Music, YouTube, SoundCloud, TikTok, and more 😱

Predictably, I was startled by the daunting amount of data analytics. How was I supposed to know where to start, what to do, and how to make sense of the mountains of numbers and statistics?

In the face of so much possibility, I thought back to my interests in streamed popular music and my background in classical music.

As I came to Chartmetric and played around with their data, I found commonalities in sonic features of pop and classical songs: tempo, loudness, energy level…wait.

Can this be used to “bridge” these different worlds of music? Can we somehow “identify” a song by its acoustical characteristics and compare them to songs in a different genre?

You bet.

Genrecommender: Genre(-Specific) Recommending

I teamed up with a web-hack-loving, Vim shortcut aficionado/co-intern Ethan Houston to build a genre-specific music recommendation website. The project is broken into three parts: the web app (Ethan’s focus), recommender, and data processing (both my focus), as shown below:

Visual representation of genre-specific song recommender system.

What does Genrecommender do?

Given any (your favorite!) song, we’ll send back 5 (out of ~750K possible) recommendations from a genre. You can take your pick from classical, country, jazz, R&B, or pop.

Current music recommenders find songs in a genre you’ve already listened to; given what you like, you’ll be kept within a comfortable box of stuff you’re accustomed to. What makes Genrecommender different, however, is how it helps you branch out to new types of music — which is difficult with unfamiliar genres.

What makes it so great?

Party in the front, business in the back. The simple interface understates a lot of the behind-the-scenes work.

All you need to do is input a song name (or Spotify track link) and select a target genre on the frontend. Within seconds, you have five brand new songs you can jam to directly on our website.

In the backend, however, an army of data manipulations, Scikit-Learn Pipelines, Dockerized containers, and Google Cloud Platform web service integrations work together to tackle your recommendation. Sounds like a lot, but the entire recommendation flow takes less than half a minute end-to-end.

What do I get to see?

The songs are displayed in a force graph, so the nodes & edges are interactive. The center node represents your input song, and each recommendation node is linked with an edge: the closer the node, the better the match, and the bigger the node, the more popular it is:

Sample recommendations for Billie Eilish’s “bad guy”.

By using Chartmetric data, we also include hard-to-find analytics for each recommendation, including:

  • Spotify Playlist Count: total number of playlists the track is on
  • Spotify Reach: total followers those playlists have
  • Popularity: Spotify’s algorithmically calculated popularity

But wait, there’s more! There are a couple more links to give you even more info about each match: 1) the Chartmetric page for the recommendation if you’re a numbers kind of person, or 2) the top Spotify playlist the song is on.

If you’re interested in the technical development of the project, read on. Otherwise, you can play around with the app here!

Development Process

Nah, it’s a bit more than that.🤓

Step 0) What I Need (to Know): Background

  • Context: We had 10 weeks to create, design, and implement an original project using Chartmetric data.
  • Programming Languages: Python, SQL (Data Science); JavaScript (Web)
  • Key Concepts: ingesting data, SQL queries, cleaning/filtering data, APIs, shortest distance algorithm, training pipelines, Flask, React, web apps

Step 1) Data, Data, and More Data: Data Ingestion

To recommend songs, we need to collect songs to work with.

By querying Chartmetric’s database, we gathered genre-specific data for ~750K songs: a song’s genre (from iTunes), artist (from Spotify), and acoustical features (from Spotify/Echo Nest).
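For a rough idea of what pulling one genre's pool could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `tracks`, its columns, and the `fetch_genre_pool` helper are all hypothetical; the post does not describe Chartmetric's actual schema or database engine, and the rows are made up.

```python
import sqlite3

# Hypothetical schema: the real Chartmetric tables and column names are not
# described in this post, so everything below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tracks (
        spotify_id   TEXT PRIMARY KEY,
        name         TEXT,
        artist       TEXT,
        genre        TEXT,
        danceability REAL,
        energy       REAL,
        tempo        REAL,
        valence      REAL
    )
    """
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("id1", "Song A", "Artist A", "classical", 0.21, 0.10, 72.0, 0.15),
        ("id2", "Song B", "Artist B", "jazz", 0.55, 0.40, 118.0, 0.62),
        ("id3", "Song C", "Artist C", "classical", 0.30, 0.25, 96.0, 0.40),
    ],
)


def fetch_genre_pool(connection, genre):
    """Return every track row for one genre as a list of dicts."""
    connection.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = connection.execute(
        "SELECT * FROM tracks WHERE genre = ? ORDER BY spotify_id", (genre,)
    ).fetchall()
    return [dict(row) for row in rows]


classical_pool = fetch_genre_pool(conn, "classical")
```

Passing the genre as a `?` parameter keeps the query reusable for each of the five target genres.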

Step 1.5) Gotta Clean it Up: Data Processing

Not all data comes perfect. Some songs we collected didn’t exist any more. Others were missing certain acoustical features. Regardless, we don’t wanna deal with bad data.

Clean-up duty was done in a Jupyter notebook, which simplified the filtering process with data visualizations. Each pool of potential matches in a genre was stored in a pandas dataframe, which became my best friend for reformatting, organizing, and prepping the data for further manipulation.
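A minimal sketch of that kind of filtering in pandas, assuming a missing Spotify ID stands in for a song that no longer exists. The column names, the `clean_pool` helper, and the sample rows are made up for illustration; the four feature columns are the ones this post settles on in Step 2.

```python
import pandas as pd

FEATURES = ["danceability", "energy", "tempo", "valence"]

# Made-up rows: id2 is missing a feature, and one row has no ID at all.
raw = pd.DataFrame(
    {
        "spotify_id": ["id1", "id2", None, "id4"],
        "name": ["Song A", "Song B", "Song C", "Song D"],
        "danceability": [0.70, 0.55, 0.40, 0.30],
        "energy": [0.43, None, 0.35, 0.25],
        "tempo": [135.0, 118.0, 100.0, 96.0],
        "valence": [0.56, 0.62, 0.50, 0.40],
    }
)


def clean_pool(df):
    """Drop rows that lack an ID or any acoustic feature, then renumber."""
    cleaned = df.dropna(subset=["spotify_id"] + FEATURES)
    return cleaned.reset_index(drop=True)


pool = clean_pool(raw)
```

Only the rows for `id1` and `id4` survive here, since they are the only ones with an ID and all four features.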

Step 2) Machine Learning Central: Shortest Distance Algorithm

With data on hand, the next step was to determine how to compare the songs.

To do so, we had to decide what features to use to vectorize each song. Though this seemed straightforward, picking the features was a particularly agonizing process: having too few properties produced subpar recommendations, while too many features resulted in overfitting to a particular set of songs in one genre (i.e., it was always the same five classical songs that were most “pop”).

Ultimately, we chose four Echo Nest features: danceability (beat consistency), energy (excitement), tempo (speed), and valence (happy vs. sad). These seemed to generally cover a song’s “dimensionalities” across the five genres we tested.

To make the comparison from song to song, we checked how close the two songs were using shortest distance algorithms. After experimenting with Jaccard, Minkowski, and Euclidean distances, we settled on Euclidean: it performed at least as well as the others in trial assessments, and it was the simplest of the three.
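To make the idea concrete: for two songs with feature vectors $a$ and $b$, the Euclidean distance is $d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$, summed over the four features. Below is a minimal sketch using only the standard library (`math.dist` needs Python 3.8+). The song values are made up and assumed to be already rescaled to a 0 to 1 range, since raw tempo is in BPM while the other three Spotify features run from 0 to 1; the post does not say how that difference in scale was handled. The helper names are hypothetical.

```python
import math

FEATURES = ("danceability", "energy", "tempo", "valence")


def to_vector(song):
    """Pull the four features out of a song dict, in a fixed order."""
    return [song[name] for name in FEATURES]


def rank_by_distance(query, pool, k=5):
    """Return the k songs in pool closest to query as (distance, name) pairs."""
    q = to_vector(query)
    scored = [(math.dist(q, to_vector(song)), song["name"]) for song in pool]
    return sorted(scored)[:k]


# Made-up, pre-scaled values for illustration only.
query = {"name": "input", "danceability": 0.70, "energy": 0.43, "tempo": 0.54, "valence": 0.56}
pool = [
    {"name": "A", "danceability": 0.68, "energy": 0.40, "tempo": 0.50, "valence": 0.60},
    {"name": "B", "danceability": 0.20, "energy": 0.10, "tempo": 0.30, "valence": 0.15},
    {"name": "C", "danceability": 0.75, "energy": 0.50, "tempo": 0.60, "valence": 0.50},
]

ranked = rank_by_distance(query, pool, k=3)
```

With these numbers, song A comes out closest, then C, then B. Euclidean distance is also the special case of Minkowski distance with p = 2, which is part of why the two behave similarly.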

After the songs were vectorized, we used Scikit-Learn (sklearn) pipelines to perform recommendations. Once a pipeline was trained, it would take a user’s input song, run it through our model, and output recommendations sorted by how “close” in distance they were to the input.
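Here is one way such a pipeline could be put together, as a sketch rather than the project's actual code. It assumes a `StandardScaler` step (so tempo in BPM doesn't swamp the 0 to 1 features) followed by scikit-learn's `NearestNeighbors` with a Euclidean metric. Both of those choices, the `recommend` helper, and the tiny made-up pool are assumptions; the post only says that sklearn pipelines and Euclidean distance were used.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Columns: danceability, energy, tempo (BPM), valence. Values are made up.
pool_names = ["Track A", "Track B", "Track C", "Track D"]
pool_features = np.array(
    [
        [0.70, 0.43, 135.0, 0.56],
        [0.20, 0.10, 70.0, 0.15],
        [0.65, 0.50, 128.0, 0.60],
        [0.35, 0.30, 90.0, 0.30],
    ]
)

pipeline = Pipeline(
    [
        ("scale", StandardScaler()),  # assumption: put features on one scale
        ("neighbors", NearestNeighbors(n_neighbors=2, metric="euclidean")),
    ]
)
pipeline.fit(pool_features)  # "training" here just indexes one genre's pool


def recommend(query_features, k=2):
    """Return the k nearest pool songs as (name, distance), closest first."""
    scaled = pipeline.named_steps["scale"].transform([query_features])
    distances, indices = pipeline.named_steps["neighbors"].kneighbors(
        scaled, n_neighbors=k
    )
    return [(pool_names[i], float(d)) for i, d in zip(indices[0], distances[0])]


results = recommend([0.70, 0.43, 135.0, 0.56])
```

`kneighbors` returns its matches sorted by increasing distance, so the list already comes back in "closest first" order. In this toy pool the query is identical to Track A, so Track A comes back at distance 0, followed by Track C.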

Step 3) Bridging the Backend and Frontend: Making an API

Now that the data science backend was done, we needed a way to trigger our algorithm and interact with the data. We built an API using Flask & Gunicorn to allow the website to make HTTP calls to our endpoints.
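A bare-bones sketch of what one such endpoint might look like in Flask. The `/recommend` route, its `track` and `genre` query parameters, the response shape, and the stubbed-out `recommend` function are all hypothetical; the post does not document the real endpoints.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# The five target genres named earlier in the post.
GENRES = {"classical", "country", "jazz", "r&b", "pop"}


def recommend(track, genre):
    """Placeholder for the real recommender; returns canned data."""
    return [{"name": f"{genre} match {i}", "distance": i / 10} for i in range(1, 6)]


@app.route("/recommend")
def recommend_endpoint():
    track = request.args.get("track")
    genre = request.args.get("genre", "").lower()
    if not track or genre not in GENRES:
        return jsonify(error="expected a track and a supported genre"), 400
    return jsonify(input=track, genre=genre, recommendations=recommend(track, genre))
```

If this lived in a file called `app.py` (again, an assumption), Gunicorn could serve it with `gunicorn app:app`, and the frontend would call something like `/recommend?track=bad%20guy&genre=jazz`.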

To keep recommendations lightweight, we used the Pickle library to package up Python objects and preserve them across requests. This allowed us to constantly reuse our genre-specific dataframes and trained pipeline.
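The pattern looks roughly like this: write the prepared objects to disk once, then load them a single time per server process and hand the same copy to every request. This is a sketch with made-up data and hypothetical names (`genre_pools.pkl`, `load_pools`), using plain dicts in place of the real dataframes and pipeline.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the genre-specific dataframes and the trained pipeline.
genre_pools = {
    "jazz": [{"name": "Track A", "features": [0.55, 0.40, 118.0, 0.62]}],
    "classical": [{"name": "Track B", "features": [0.21, 0.10, 72.0, 0.15]}],
}

# One-time step (for example, at the end of the notebook): save to disk.
artifact_path = Path(tempfile.mkdtemp()) / "genre_pools.pkl"
with artifact_path.open("wb") as handle:
    pickle.dump(genre_pools, handle)

_cache = {}


def load_pools(path):
    """Unpickle the pools on first use, then serve the cached copy."""
    key = str(path)
    if key not in _cache:
        # Only unpickle files you created yourself; pickle can run arbitrary code.
        with open(path, "rb") as handle:
            _cache[key] = pickle.load(handle)
    return _cache[key]


pools = load_pools(artifact_path)
```

Every later call to `load_pools` returns the already-loaded object, so a request never pays the cost of rebuilding the pools or refitting the pipeline.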

Step 4) Take it to the Web: Application Development

This was all Ethan’s hard work. We used React to build our application and hosted it on GCP, so Genrecommender could float in its own serverless and containerized world, allowing simple and flexible resource management.

With a little help from Cloud Build and Docker, we deployed our custom production environment to Cloud Run, which can be found at genrecommend.me! 😊

Moving forward, the Chartmetric team plans on continuing to use this tool as a fundamental way to compare song similarity. There are some exciting projects in the works that build on acoustic feature analysis to include social media info, song metadata, brands, and more!

Genrecommender is just one of many possible applications of music analytics. Music streaming is a booming industry with an ever-growing amount of data, which opens the door to so many options for what to do with it and what kinds of music you’ll stumble across.

And maybe you’ll find that classical music isn’t so boring after all.

Huge kudos to Dr. Josh Hayes for his incredible mentorship and for instilling confidence in my work, Tomi Kalmi for overhauling the frontend, and Komala Prabhu for her support in advising the backend development.

Super thankful that I got to spend my summer with the amazing Chartmetric team. Check ’em out here!

If you have any questions about the project, feel free to reach us at nanette@mit.edu / ethan.houston@utexas.edu / hi@chartmetric.com.
