Data Science for DJing

James Camagong · The Startup · Sep 19, 2020

LNY TNZ at EDC Las Vegas (Photo by Matty Adame)

Introduction

Electronic Dance Music (EDM) is a unique genre in that live performances rarely involve instruments or singing. Instead, DJs excite the crowd through their mixing techniques and song selection. Thanks to 1001tracklists.com and the Spotify Web API, we have access to data on which songs DJs have played during their sets and what audio characteristics those songs have.

In this article, I will explain how I:

  1. Scraped tracklist (list of songs played in a certain order) data from 1001tracklists.com
  2. Passed those songs through the Spotify API to receive the audio features for each track
  3. Cleaned, transformed, and analyzed the combined data to gain insights about how audio features, like tempo and key, change throughout a DJ set

Scraping the DJ set song data

1001tracklists.com is a cool website that crowdsources the sequential list of songs that DJs play at their shows. Fans go on the website, construct tracklists, and provide links to the songs and to a recording of the set itself.

Screenshot of a tracklist page from 1001tracklists.com

As an EDM fan / DJ hobbyist, I stumbled across the site and saw the potential for this project. I picked 10 famous DJs and for each of them, scraped the data of their 10 latest tracklists. With the song order and song choice data, I knew I could use the Spotify API (which I had previously heard of) to analyze the DJs’ song choice technique across a set of quantitative metrics.

Note: For details on how I scraped the data, please check out my Jupyter Notebook.

Retrieving song characteristics from the Spotify API

For every song on Spotify, the platform pre-calculates what it calls audio features, such as tempo, key, and danceability (we will dig deeper into these in a later section). The main goal of the web scraping exercise above was to collect the Spotify IDs of the songs played by each DJ so that I could run them through the API and receive a response like this for each song:

Example of the audio features calculated by Spotify for a specific song (screenshot from the Spotify API docs)

With these responses collected, it was easy enough to make a Pandas dataframe containing the ordered list of songs played in each DJ set.
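Assembling those responses into a dataframe can be sketched like this. The field values below are illustrative stand-ins in the shape of the audio-features response, and the IDs are placeholders, not real track IDs:

```python
import pandas as pd

# Two example responses in the shape returned by the Spotify
# audio-features endpoint (fields abridged; values illustrative).
responses = [
    {"id": "id_001", "tempo": 118.2, "key": 7,
     "danceability": 0.585, "energy": 0.842},
    {"id": "id_002", "tempo": 128.0, "key": 0,
     "danceability": 0.72, "energy": 0.93},
]

# One row per song, kept in the order the DJ played them.
features = pd.DataFrame(responses)
features["position"] = range(1, len(features) + 1)
print(features[["position", "tempo", "key", "danceability"]])
```

From here, concatenating one such dataframe per tracklist gives the combined dataset used in the rest of the article.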

Data Cleaning and Preparation

As is true with all data projects, I had to do some data cleaning and preparation to get the data in a nice and analyzable format.

How the data looked at the beginning of the project

In the picture above, each row represents one song played by a DJ in their set. Position represents the numerical order in which the DJ played the song in that tracklist (so position = 1 is the first song played in the set). I quickly realized that I needed a way to compare all the DJ sets on equal footing even though each had a different number of songs. Some songs were also missing because they were not available on Spotify.

My idea was to measure the progression of each set as percent completion relative to song position. I divided the position of each track by the number of tracks in the tracklist to get the percent completion for each row, then reindexed each tracklist onto a common 0–100% completion grid and forward filled the song data (details on how I did this can be seen in my Jupyter Notebook).
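The idea can be sketched in pandas like so. This is a simplified version of what's in the notebook, and it maps the first song to 0% completion, which is one reasonable way to anchor the grid:

```python
import pandas as pd

# Hypothetical mini-tracklist: 4 songs with their tempos, in play order.
df = pd.DataFrame({
    "position": [1, 2, 3, 4],
    "tempo": [124.0, 126.0, 128.0, 122.0],
})

# Percent completion: where each song falls in the set (first song = 0%).
df["pct_complete"] = ((df["position"] - 1) / (len(df) - 1) * 100).round().astype(int)

# Reindex onto a common 0-100% grid and forward fill, so every
# tracklist ends up with the same number of rows.
grid = pd.RangeIndex(0, 101)
aligned = df.set_index("pct_complete").reindex(grid).ffill()
print(aligned.loc[50, "tempo"])  # tempo of whichever song was playing at 50%
```

With every set on the same 101-row grid, features from sets of different lengths can be averaged and plotted against each other directly.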

Final dataframe used for analysis

With the data in this format, I could now plot and compare how the values for these music features changed over the course of each DJ set.

Diving into the DJ Data

After plotting the data across all the dimensions, tempo showed some interesting trends. Tempo is measured here in beats per minute (BPM), the same unit we use for heart rate, and it describes the speed or pace of a song. Let’s take a look at how these DJs manage tempo throughout a set.

Notice how for most DJs, the tempo hovers around 128 BPM. This is a common tempo for EDM. One proposed reason is that this tempo is easy to dance to, and we can look at Spotify’s danceability feature to see if there is any truth to that.

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

Spotify API Documentation

Below we can see the average danceability for each DJ over a set:

We see that the DJs who stayed near 128 BPM (like Kaskade, Diplo, Alesso, and Zedd) also had relatively high average danceability (above 0.6 is on the higher end of Spotify’s calculated distribution for this metric). The main takeaway for an aspiring DJ: stay close to 128 BPM to keep the crowd dancing.
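The per-DJ average behind a chart like this is a one-line groupby. The numbers below are made up for illustration, not the actual values from the dataset:

```python
import pandas as pd

# Toy data: one row per song, tagged with the DJ who played it.
df = pd.DataFrame({
    "dj": ["Zedd", "Zedd", "Diplo", "Diplo"],
    "danceability": [0.70, 0.66, 0.62, 0.68],
})

# Average danceability per DJ, highest first.
avg_dance = df.groupby("dj")["danceability"].mean().sort_values(ascending=False)
print(avg_dance)
```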

How a DJ picks the next song

So we know what tempo most DJs aim for throughout a set, but how do they pick their next song? Let’s start by understanding the software that a DJ uses.

Screenshot from djay Pro 2 (the DJing software I use)

There is a lot going on in the picture above but to put it simply, the pink box is where a DJ mixes songs together and the yellow box is where the DJ picks the next song. Notice how BPM and key are prominent in both the mixing and song choice. That’s because it’s hard to mix two songs that have extremely different tempos or keys without disrupting the flow of the music. We can use our data to find out how much the DJs in our sample change these features from song-to-song. Let’s look at tempo first:

Median change in tempo (BPM) from song-to-song

The box plots above show the distribution of song-to-song BPM changes the DJs made in their sets. For example, if Zedd played a song at 126 BPM and followed it with one at 128 BPM, we would record a change in tempo of 2 BPM. As you can see, most of the DJs have a median tempo change below 6 BPM and rarely change the tempo by more than 10 BPM between songs. Advice to aspiring DJs: make only small changes to BPM from song to song (0–10 BPM).
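Computing those song-to-song changes is a per-tracklist `diff`. A minimal sketch with made-up tempos:

```python
import pandas as pd

# Toy data: one tracklist's tempos, in play order.
df = pd.DataFrame({
    "tracklist_id": [1, 1, 1, 1],
    "tempo": [126.0, 128.0, 128.0, 122.0],
})

# Absolute BPM change between consecutive songs within each tracklist.
# The first song of each tracklist has no predecessor, so its change is NaN.
df["bpm_change"] = df.groupby("tracklist_id")["tempo"].diff().abs()
print(df["bpm_change"].median())
```

Grouping by tracklist before taking the diff matters: without it, the last song of one set would be compared against the first song of the next.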

In music theory, the key of a piece is the group of pitches, or scale, that forms the basis of a music composition in classical, Western art, and Western pop music.

Wikipedia entry for key

Key is another key consideration when a DJ picks the next song. To quantify key, Spotify uses pitch class notation, which labels each key with an integer (C = 0, C♯/D♭ = 1, D = 2, and so on). Each song in our data has an integer from 0–11 denoting its key. In theory, mixing two songs that are close together in key should sound more harmonious than mixing two songs that are relatively far apart in key. We can see whether our DJs follow this rule by finding the differences in pitch class integers between consecutive songs.

Median change in key (difference in pitch class integers)

We see that the median key change our DJs make between songs ranges from 3 to 5 pitch classes, and that they rarely make huge jumps in key from one song to the next. Mixing in key is more complicated than I’m presenting here: many DJ programs split the key integers into smaller groups, distinguishing between major and minor keys. Nevertheless, a good rule of thumb for DJs looking at this data is to keep the songs you play in succession as close in key as possible.
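One subtlety worth noting: pitch classes wrap around, so a circular distance is arguably a better measure than the raw integer difference used above (C = 0 and B = 11 are only one semitone apart, not eleven). A minimal helper:

```python
def key_distance(a: int, b: int) -> int:
    """Distance between two pitch classes (0-11), wrapping around the octave."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

# C (0) to G (7): 7 semitones up, but only 5 down.
print(key_distance(0, 7))  # 5
```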

Aspects of DJing not addressed by the data

Here I want to address an aspect of DJing that cannot be seen in the data presented. DJs have the ability to change the tempo and key of songs that are playing so that they mix better with the next song. For example, if I’m currently playing a song at 123 BPM in the key of C (pitch class integer of 0) and want to mix in a song that is at 131 BPM in the key of G (pitch class integer of 7), I can use my DJ controller to gradually increase the tempo of the currently playing song and also alter the key of the song so that it’s closer to the key of the next song. This is all executed through the DJ software and controller. However, trying to change the tempo and key of a song by too much can be heard and disrupt the flow of the music.
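The size of the adjustment in that example is easy to work out as a back-of-the-envelope calculation:

```python
# Values from the example above: mixing a 123 BPM song into a 131 BPM song.
current_bpm, next_bpm = 123.0, 131.0

# Relative tempo increase needed to match the incoming song.
tempo_change_pct = (next_bpm - current_bpm) / current_bpm * 100
print(f"tempo increase needed: {tempo_change_pct:.1f}%")  # ~6.5%

# Key shift: C (0) to G (7) is 7 semitones up, or only 5 semitones down,
# so DJ software would typically pick the smaller shift.
```

A 6–7% tempo stretch is large enough to be audible, which is exactly the disruption the paragraph above warns about.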

There are also many effects and edits that DJs add to the underlying songs that make transitions smoother which cannot be seen in the data. So if you were wondering how some DJs we looked at got away with huge shifts in tempo and key (*cough* Porter Robinson), this is how.

Conclusion

I hope you enjoyed this look into DJing using data science! If you’re interested in the code, please check out my GitHub repo for this project. If I had more time and data, I would love to examine the trends of DJs across thousands of their sets and create an algorithm that emulates a DJ’s style given a list of songs to play. Also, someday we might have data on DJ transitions and effects through audio analysis. Until then, thanks for taking the time to read my article!
