Introduction to Music Recommendation and Machine Learning

Brian Srebrenik
Dec 4, 2018

One of Spotify’s most popular features is Discover Weekly, a playlist that is generated each week based on a user’s listening habits. As a Spotify user, I have found these playlists to be extremely accurate some of the time, but at other times I have come away completely unsatisfied with my personalized selection. I wanted to take a closer look at how Spotify and other music companies make these recommendations.

As online music streaming becomes the dominant way for people to listen to their favorite songs, streaming services are able to collect large amounts of data on the listening habits of their customers. Services like Spotify, Apple Music, and Pandora use this data to provide recommendations to their listeners. These music recommendation systems are part of a broader class of recommender systems, which filter information to predict a user’s preference for a given item. Think Netflix movie recommendations or Pandora radio. The Wikipedia article on recommender systems splits them into two classes, which also apply to music-specific recommender systems. These two classes, or approaches, are Collaborative Filtering and Content-Based Filtering.

Collaborative Filtering

The collaborative filtering approach to recommendation algorithms involves collecting a “large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users”. A key point about this method is that the item being recommended, and its features, are not analyzed at all. Rather, the method assumes that users who agreed in the past will agree in the future: if User A and User B both liked Movie A, they are assumed to have similar interests, so other items that User B enjoyed become candidates to recommend to User A. Data collection under this approach includes both explicit data collection, like asking a user to rate an item, and implicit data collection, like keeping records of how often and for how long a user views an item. One popular machine learning technique used in this sort of recommender system is the k-nearest neighbor approach. One of the major issues with collaborative filtering is the so-called “cold start problem”: the system needs a large amount of data to make accurate recommendations, so it struggles with new users and new items that have little or no history.

Example:

Last.fm creates a “station” of recommended songs by observing what bands and individual tracks the user has listened to on a regular basis and comparing those against the listening behavior of other users. Last.fm will play tracks that do not appear in the user’s library, but are often played by other users with similar interests. As this approach leverages the behavior of users, it is an example of a collaborative filtering technique.
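To make the idea concrete, here is a minimal sketch of user-based k-nearest neighbor collaborative filtering on implicit play counts. It is not Last.fm’s actual algorithm; the user names, track names, play counts, and the `recommend` function are all made up for illustration. Note that the code never looks at anything about the tracks themselves, only at who played them.

```python
import math
from collections import defaultdict

# Hypothetical implicit feedback: user -> {track: number of plays}
plays = {
    "ana":  {"track_a": 12, "track_b": 5, "track_c": 1},
    "ben":  {"track_a": 10, "track_b": 4, "track_d": 7},
    "cara": {"track_a": 9, "track_b": 6, "track_d": 3, "track_e": 2},
    "dev":  {"track_f": 15, "track_g": 8},
}

def cosine(u, v):
    """Cosine similarity between two sparse play-count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, plays, k=2, n=3):
    """Suggest up to n tracks the user has not played, using the k most similar users."""
    target = plays[user]
    neighbors = sorted(
        ((cosine(target, other), name) for name, other in plays.items() if name != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, name in neighbors:
        for track, count in plays[name].items():
            if track not in target:  # only tracks outside the user's library
                scores[track] += similarity * count
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [track for track, _ in ranked[:n]]

print(recommend("ana", plays))  # ['track_d', 'track_e']
```

A brand-new user with an empty play history would have zero similarity to everyone, which is the cold start problem in miniature.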

Content-Based Filtering

The content-based filtering approach differs from the collaborative filtering approach in that it filters based on an analysis of both the item being recommended and the user. Content-based filtering closely examines the actual item to determine which features are most important in making recommendations and how those features interact with the user’s preferences. Data collection can be much more complicated in content-based filtering, as it is very difficult to decide which features of an item will matter in a predictive model (we will see that this is a major hurdle when it comes to music recommendation systems). Machine learning techniques such as naive Bayesian classifiers and cluster analysis are used to determine which features of an item can be used to classify it.
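As a rough illustration, the sketch below builds a user profile by averaging the feature vectors of songs the user liked, then ranks the remaining songs by cosine similarity to that profile. This is a simpler stand-in for the Bayesian classifiers and clustering mentioned above, and the song names and the three features (energy, acousticness, danceability) are invented values, not real measurements.

```python
import math

# Hypothetical hand-made features per song, each scaled to [0, 1]:
# (energy, acousticness, danceability)
songs = {
    "song_a": (0.9, 0.1, 0.8),
    "song_b": (0.8, 0.2, 0.9),
    "song_c": (0.2, 0.9, 0.3),
    "song_d": (0.85, 0.15, 0.7),
    "song_e": (0.1, 0.95, 0.2),
}

def cosine(u, v):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_profile(liked, songs):
    """Average the feature vectors of the songs the user liked."""
    vectors = [songs[name] for name in liked]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))

def recommend(liked, songs, n=2):
    """Rank songs the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked, songs)
    candidates = [name for name in songs if name not in liked]
    candidates.sort(key=lambda name: cosine(profile, songs[name]), reverse=True)
    return candidates[:n]

print(recommend(["song_a", "song_b"], songs))  # ['song_d', 'song_c']
```

Unlike collaborative filtering, this works for a song nobody has played yet, as long as its features are known. The hard part, as noted above, is choosing features that actually predict taste.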

Example:

Pandora uses the properties of a song or artist (a subset of the 400 attributes provided by the Music Genome Project) to seed a “station” that plays music with similar properties. User feedback is used to refine the station’s results, deemphasizing certain attributes when a user “dislikes” a particular song and emphasizing other attributes when a user “likes” a song. This is an example of a content-based approach.
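The feedback loop described here can be sketched as a set of attribute weights that move up on a “like” and down on a “dislike”. This is only a toy model of the idea, not Pandora’s implementation: the `Station` class, the attribute names, and the fixed step size are all assumptions for illustration.

```python
class Station:
    """Toy station: a weight per attribute, seeded from one song's attributes."""

    def __init__(self, seed_attributes, step=0.5):
        self.weights = {attr: 1.0 for attr in seed_attributes}
        self.step = step

    def score(self, attributes):
        """Sum the weights of the attributes a song has."""
        return sum(self.weights.get(attr, 0.0) for attr in attributes)

    def feedback(self, attributes, liked):
        """Emphasize a liked song's attributes, de-emphasize a disliked song's."""
        delta = self.step if liked else -self.step
        for attr in attributes:
            self.weights[attr] = max(0.0, self.weights.get(attr, 0.0) + delta)

    def rank(self, catalog):
        """Order song names from best to worst match for the station."""
        return sorted(catalog, key=lambda name: self.score(catalog[name]), reverse=True)

# Hypothetical catalog: song -> set of descriptive attributes
catalog = {
    "song_x": {"acoustic_guitar", "female_vocals", "minor_key"},
    "song_y": {"acoustic_guitar", "male_vocals", "major_key"},
    "song_z": {"electric_guitar", "female_vocals", "minor_key"},
}

station = Station({"acoustic_guitar", "female_vocals"})
print(station.rank(catalog))   # song_x first; song_y and song_z tie
station.feedback(catalog["song_y"], liked=False)
print(station.rank(catalog))   # ['song_x', 'song_z', 'song_y']
```

After the dislike, the weight on `acoustic_guitar` drops, so the song that shares only that attribute with the seed falls to the bottom.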

Hybrid Systems

Hybrid recommender systems simply refer to systems that use a combination of both collaborative filtering and content-based filtering.
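One simple way to combine the two, sketched below, is a weighted average of a collaborative score and a content-based score. This is just one of several possible hybridization strategies; the `alpha` parameter, the candidate songs, and their scores are hypothetical, and both scores are assumed to be already scaled to the range 0 to 1.

```python
def hybrid_score(collab_score, content_score, alpha=0.5):
    """Blend two scores in [0, 1]; alpha is the weight on the collaborative part."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * collab_score + (1.0 - alpha) * content_score

# Hypothetical per-song scores from two separate recommenders
candidates = {
    "song_a": {"collab": 0.9, "content": 0.2},
    "song_b": {"collab": 0.4, "content": 0.8},
    "song_c": {"collab": 0.0, "content": 0.7},  # new song with no listening history
}

def rank(candidates, alpha=0.5):
    """Order song names by blended score, best first."""
    return sorted(
        candidates,
        key=lambda name: hybrid_score(
            candidates[name]["collab"], candidates[name]["content"], alpha
        ),
        reverse=True,
    )

print(rank(candidates))             # ['song_b', 'song_a', 'song_c']
print(rank(candidates, alpha=0.1))  # ['song_b', 'song_c', 'song_a']
```

Lowering `alpha` leans on content features, which lets the brand-new `song_c` surface even though no one has listened to it yet.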

Music Recommendation Models

Some of the best research in the area of music recommender systems can be found in the Recommender Systems Handbook by Francesco Ricci, Lior Rokach, and Bracha Shapira, specifically in Chapter 13, “Music Recommender Systems”, by Markus Schedl, Peter Knees, Brian McFee, Dmitry Bogdanov, and Marius Kaminskas. I highly suggest reading this chapter if you are interested in the subject. I will summarize some of their findings below.

Content-Based Filtering

A content-based approach is often the technique most relied upon in music recommender systems, as researchers have found that explicit ratings data is much harder to come by in music. In the content-based filtering approach, a model considers music data about a particular song when making a recommendation. Below I talk about the types of music data used in this approach.

Music Data?

Metadata vs. Audio Content

Two main categories of data are used in a content-based filtering model: metadata and actual audio content. Metadata refers to the annotations and tags people have placed on pieces of music. For instance, metadata could be the genre that users have ascribed to a song, or keywords used in reviews of music on popular websites. Audio content refers to the actual acoustic and musical features of a song. This can be anything from the rhythm and beats per minute of a song to the keys and chords played in it.

Music Information Retrieval

Finding and determining what audio content will be important in a music recommender system is so difficult that an entire field is devoted to it. Music information retrieval refers to the scientific field of extracting information about music and its audio content through audio signal processing, and using that information to draw meaningful conclusions about a piece of music and how it relates to other music.
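As a tiny taste of what such an extracted feature looks like, the sketch below estimates beats per minute from a list of beat timestamps. It assumes the hard part, detecting the beats in the raw audio signal, has already been done by some other tool; the `estimate_bpm` function and the timestamps are made up for illustration.

```python
from statistics import median

def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps given in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat times")
    # Time between each pair of consecutive beats
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    typical = median(intervals)  # median is robust to an occasional missed beat
    if typical <= 0:
        raise ValueError("beat times must be strictly increasing")
    return 60.0 / typical

# Hypothetical beats detected every half second
beat_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
print(estimate_bpm(beat_times))  # 120.0
```

Real systems work from the waveform itself and extract many such features at once, but each one ends up as a number like this that a recommender can compare across songs.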

Contextual Recommendations

When you open up your music streaming service of choice and go to select a song to play, which song you choose may depend on who else is in the room, whether you are about to go on a long run, or whether you just had a bad day. One of the main issues with both the content-based and collaborative filtering methods is that neither takes into account the listener’s “context”. Here, context can refer to a variety of factors, including a listener’s mood, the time of day, and what the weather is like outside.

Hybrid Models

One of the more popular trends in music recommendation is the hybrid model. Hybrid music recommendation models try to combine the insight gained from a content-based or collaborative filtering approach with data about a user’s context. While these hybrid models are probably necessary to produce more accurate results, a major challenge in building them is the limited availability of data that combines music ratings with audio content and contextual information.
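One way to picture this is as a re-ranking step: take the scores from any base recommender and boost songs whose tags match the listener’s current context. The sketch below is a minimal version of that idea, not a method taken from the handbook chapter; the `Context` fields, the tags, the base scores, and the `boost` value are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    activity: str     # e.g. "running" or "studying"
    time_of_day: str  # e.g. "morning" or "night"

# Hypothetical scores from a content-based or collaborative recommender
base_scores = {"song_a": 0.70, "song_b": 0.65, "song_c": 0.60}

# Hypothetical tags describing the situations each song suits
context_tags = {
    "song_a": {"studying", "night"},
    "song_b": {"running", "morning"},
    "song_c": {"running"},
}

def contextual_rerank(base_scores, context_tags, context, boost=0.2):
    """Add a fixed boost per matching context tag, then sort best first."""
    active = {context.activity, context.time_of_day}
    adjusted = {
        song: score + boost * len(active & context_tags.get(song, set()))
        for song, score in base_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)

print(contextual_rerank(base_scores, context_tags, Context("running", "morning")))
# ['song_b', 'song_c', 'song_a']
print(contextual_rerank(base_scores, context_tags, Context("studying", "night")))
# ['song_a', 'song_b', 'song_c']
```

The same three songs come out in a different order depending on what the listener is doing, which is exactly the signal the data availability problem makes hard to learn from real users.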

That’s it for this simple introduction to music recommender systems. As I mentioned before, if you would like to dig deeper into this topic, I highly suggest the research found in Music Recommender Systems.
