Playlist Optimisation as a Traveling Salesperson Problem
For somebody who takes a lot of pride in their playlists, mine are pretty sub-par. Sure, I may like the songs, but the order of playback leaves a lot to be desired — Pitbull and The Veronicas certainly belong in the same playlist, but you might not want their top hits back-to-back! This is my over-engineered solution to this minor issue.
Motivation
Spotify is a fantastic service which has drastically changed how we find, listen and share music. Their tech forward approach has also allowed new features to be released that further elevate their user’s ability to interact with their music, such as the discover weekly playlist or the recently released blend tool that merges two people’s music tastes into a single playlist.
At the heart of the Spotify user experience is the playlist, which allows the user a way to catalogue and organise songs in a manner reminiscent of the cassette mixtapes of the 1980s. Without playlists the listening experience on Spotify would be a transient one, with users needing to search through the endless library of tracks to find “their” music. As such playlists provide the user with some level of ownership and control over what is “their” music in a system with no ownership and a universe of music and genres.
Given the importance of playlists and the mechanism by which new songs are added to a playlist (just slap them on the end), it is unsurprising that personal playlists can quickly become incoherent, a consequence of evolving tastes and the level of effort required to reorder and reorganise songs.
To avoid this fate for my own playlists, I have started to build solutions to this issue through using Spotify’s developer tools to create my own methods to interact with the Spotify music universe.
This is part two in a series, with links to the other parts at the bottom of this article.
Playlist Optimisation
“DJing is a complex analytical as well as creative task, creating an automatic DJ has proved to be highly nontrivial” Len Vande Veire and Tijl De Bie, EURASIP Journal on Audio, Speech, and Music Processing (2018)
User’s Spotify playlists tend to be chronologically organised by the date when a song is added, which does not typically lead to a well ordered playback. When seeking to improve this order, it helps to reframe the problem as “what order should we play the songs such that every subsequent song is acceptably similar to the previous song”.
One shortcut to determine how similar two songs is to use the Spotify audio analysis data, which includes measurements of tempo, key, as well as more subjective measures such as “danceability” and “acousticness” which capture the sound and mood of the song. If a set of these measures are stored as vectors for each song, the similarity between songs can be quantified as the distance between these vectors, with various dimensions weighted by relative importance. Two songs that are nearby as measure by their vectors should have a similar BPM, key and general mood, while songs that are further apart may be very different and so unsuitable to play back-to-back.
The task of ordering the songs of the playlist to minimise the sum of the distance between sequential songs is version of the traveling salesman problem or TSP, which seeks find a route between cities that minimise the total distance traveled. The only major difference is that instead of a two dimensional real-world space, we have an arbitrary number of dimensions in the song vectors.
Finding an exact solution to the TSP is a famously NP-hard problem, which requires an infeasible amount of computational power for larger numbers of data points. The complexity of the TSP is O(n!), which becomes impractical beyond roughly 20 data points, and even optimised versions are impractical beyond 50 data points. As we don’t know how long a user’s playlist may be and want the algorithm to run in a standard web browser, seeking an exact solution is unacceptable.
Instead an approximate solution is used, in this case a simple nearest neighbours approach is used which has a complexity of O(n squared). The algorithm can be summarised as the following steps.
- Mark all points as unseen.
- Select a point from the set, in this case the array of playlist vectors. Append the point to the output array and mark it as seen.
- While there are unseen points, find the nearest point by looping over all unseen points and measuring the distance to the previously seen point.
- Append the nearest point to the output array and mark it as seen.
- Return the output array
This greedy approach is far from optimal, but is significantly faster and simpler than competing approaches, and is well suited to this application where speed is preferred over an exact solution. An additional perk of this method is that every starting point yields a slightly different approximate solution, which allows the user to shuffle between approximately optimal solutions by simply recomputing the solution, given that the starting point is randomly chosen. When this is combined with a visualisation of the solution, it allows the user to pick the solution they like the look of best. This solution could be further improved by selecting a starting point with a desirable characteristics for the start of a playlist, such as low BPM compared to the other songs.
The tool
For those that are interested the playlist optimiser tool is available at https://www.grantholtes.com/smartshuffle