Building a Recommender System for Songwriters

Jon Johnson
4 min read · Oct 9, 2018


The Tallest Man on Earth by linspiration01 (Licensed under CC BY)

The world of digital music is filled with recommender systems built to help users discover new music based on their listening preferences. This has made it easy for fans of one artist to discover similar artists, along with tracks by those artists that they might also like. However, I think this concept could be adapted to help listeners discover the often-overlooked collaborators necessary to make a song: songwriters.

Given this, I’d like to build a recommender which could help listeners discover the songwriters behind their favorite hits, along with songwriters who write in a similar fashion, and works of theirs that might appeal to the listener.

Over the coming weeks, I’ll be publishing a series of pieces on building this model, including all the data collection, engineering, EDA, modeling, hair-pulling, and interpretation that happens along the way.

Of course, before all of that fun, we’ll need to begin with a problem statement that describes what I’m trying to do:

Develop a model that, given a recording written by a particular songwriter or songwriters, can recommend similar-sounding recordings written by different songwriters and provide background information on those songwriters to interested listeners.

For the remainder of this piece, I’ll give a high-level overview of several different topics pertaining to this project:

  • Why I’m building the model
  • Some proposed ML models to build the recommender
  • Gathering data to model with
  • Risks associated with a project of this kind
  • Next steps in the process

Purpose of the Model

While performing artists may enjoy household name recognition, and earnings to boot, the same cannot be said for the songwriters who pen many of their hit songs. And, while fame may not be something that a songwriter looks for in their career, I believe this lack of recognition has also harmed their pocketbook. Songwriters and music publishers are often the last to negotiate with digital services to license their music, which leaves the majority of the revenue for artists and record labels to collect.

Proposed Models

I will first look to use a recommender system, perhaps implementing a neighborhood algorithm that recommends new song titles based on the cosine similarity between their audio features and those of the initial song. I have not finalized this, however, and may look to use different models based on what I’m able to see during the EDA process.
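To make the neighborhood idea concrete, here is a minimal sketch of a cosine-similarity lookup. It is an illustration under stated assumptions and not the finished model: the track ids, writer names, and three-number feature vectors are made up, and the rule of skipping any track that shares a songwriter with the seed is my reading of the problem statement.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def recommend(seed_id, tracks, k=2):
    """Return up to k track ids closest to the seed, written by other songwriters."""
    seed = tracks[seed_id]
    seed_writers = set(seed["writers"])
    scored = []
    for track_id, track in tracks.items():
        if track_id == seed_id:
            continue
        # Assumption: a recommendation should share no songwriter with the seed.
        if seed_writers & set(track["writers"]):
            continue
        score = cosine_similarity(seed["features"], track["features"])
        scored.append((score, track_id))
    scored.sort(reverse=True)  # highest similarity first
    return [track_id for _, track_id in scored[:k]]


# Hypothetical data: ids, writers, and feature values are invented.
tracks = {
    "song_a": {"writers": ["Writer 1"], "features": [0.9, 0.1, 0.3]},
    "song_b": {"writers": ["Writer 2"], "features": [0.8, 0.2, 0.3]},
    "song_c": {"writers": ["Writer 1", "Writer 3"], "features": [0.9, 0.1, 0.3]},
    "song_d": {"writers": ["Writer 4"], "features": [0.1, 0.9, 0.7]},
}

print(recommend("song_a", tracks, k=2))
```

Here song_c is skipped even though its features are identical to the seed's, because it shares a writer with song_a. In practice the features would likely need scaling before comparison, since cosine similarity is sensitive to features measured on very different ranges.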

Gathering Data

In order to build a recommender system, we’ll need some data! I’ve primarily pulled from two different APIs:

  • Spotify: For recording metadata, along with a number of audio features garnered through “Analyze”, an open-source tool for describing the underlying features of a piece of audio, originally developed by the Echo Nest.
  • Genius: For the oh-so-important songwriter credits to link to the Spotify recording data.

I’ve narrowed the initial scope of this model to the 10 most popular songs from every artist that has received an RIAA Gold, Platinum, or Diamond certification at some point in their career. Depending upon the success of the model, I’ll seek to add more songs.
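The scoping rule above can be sketched as a simple group-and-rank step. This is only an illustration: the field names (`artist`, `title`, `popularity`) and the sample rows are assumptions about how the pulled records might be shaped, not the actual data set.

```python
from collections import defaultdict


def top_n_per_artist(tracks, n=10):
    """Keep each artist's n most popular tracks, most popular first."""
    by_artist = defaultdict(list)
    for track in tracks:
        by_artist[track["artist"]].append(track)
    return {
        artist: sorted(items, key=lambda t: t["popularity"], reverse=True)[:n]
        for artist, items in by_artist.items()
    }


# Hypothetical records: names and popularity scores are invented.
catalog = [
    {"artist": "Artist X", "title": "Track 1", "popularity": 55},
    {"artist": "Artist X", "title": "Track 2", "popularity": 80},
    {"artist": "Artist X", "title": "Track 3", "popularity": 67},
    {"artist": "Artist Y", "title": "Track 4", "popularity": 42},
]

# n=2 here only to keep the example small; the article's scope uses n=10.
selected = top_n_per_artist(catalog, n=2)
for artist, items in selected.items():
    print(artist, [t["title"] for t in items])
```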

Risks

As someone who has worked in the music publishing industry for nearly 10 years, I feel confident in stating that the general state of accurate songwriter credits applied to recordings is poor (which bleeds into an article I wrote a few years back about Spotify’s music licensing endeavors). Given this, there are a couple of large risks associated with gathering this information:

  • Genius may not have all of the songwriter credits needed to build out this recommender. If, say, I’m only able to retrieve half of the songwriter credits for the data set of Spotify titles that I’m using, that could severely hinder the performance of the model. This would be especially problematic if I were only able to retrieve songwriter details for a certain subset of music (e.g., only Pop or Rock songs).
  • Despite the fact that Genius & Spotify do a great job of providing artist and title information with every song, matching the recording and songwriter data programmatically could result in a fairly large number of erroneous matches, making modeling and prediction unreliable.
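One way to keep an eye on both risks is to match conservatively and count what fails to match. The sketch below uses only Python's standard library and is an assumption on my part, not the matching approach used in this project: the record fields, the 0.9 threshold, and the sample rows are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents, drop bracketed suffixes and punctuation."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    text = re.sub(r"\(.*?\)|\[.*?\]", " ", text)  # e.g. "(Remastered)", "[Live]"
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())


def match_score(spotify_track, genius_song):
    """Score a pair by its weaker field, so title and artist must both agree."""
    title = SequenceMatcher(
        None, normalize(spotify_track["title"]), normalize(genius_song["title"])
    ).ratio()
    artist = SequenceMatcher(
        None, normalize(spotify_track["artist"]), normalize(genius_song["artist"])
    ).ratio()
    return min(title, artist)


def link_credits(spotify_tracks, genius_songs, threshold=0.9):
    """Attach writers to each track whose best candidate clears the threshold."""
    matched, unmatched = {}, []
    for track in spotify_tracks:
        best = max(genius_songs, key=lambda song: match_score(track, song), default=None)
        if best is not None and match_score(track, best) >= threshold:
            matched[track["id"]] = best["writers"]
        else:
            unmatched.append(track["id"])
    return matched, unmatched


# Hypothetical records: every id, title, artist, and writer is invented.
spotify_tracks = [
    {"id": "t1", "title": "Song One (Remastered)", "artist": "Beyoncé"},
    {"id": "t2", "title": "Another Tune", "artist": "Some Band"},
]
genius_songs = [
    {"title": "Song One", "artist": "Beyonce", "writers": ["Writer A"]},
    {"title": "Completely Different", "artist": "Other Act", "writers": ["Writer B"]},
]

matched, unmatched = link_credits(spotify_tracks, genius_songs)
coverage = len(matched) / len(spotify_tracks)
print(matched, unmatched, coverage)
```

A high threshold trades coverage for fewer erroneous matches. Reporting the unmatched share, ideally broken out by genre, would show whether the missing credits are concentrated in a particular subset of music.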

Next Steps

In the coming weeks, I’ll share some insights into how the process of building this model is coming along, hopefully finishing with a successful(!) model that provides accurate songwriter recommendations for a given song.

Stay tuned for my next piece, which will focus on the data gathering & engineering process from the Spotify and Genius APIs.
