Finding the next Billboard #1 — Spotify API Exploratory Analysis

Sudharshan
Published in Analytics Vidhya
7 min read · Feb 23, 2020

Our goal

Analytics and data science are changing the music industry in ways never seen before, across music discovery, music generation, and recommendation engines. Streaming platforms like Spotify collect a wide array of information on listening behavior and synthesize information about songs and their properties.

Given a good deal of information, can we predict what the next big song looks like?

In Part 1, I’ll dive into collecting the data we require, and perform some exploratory analysis about what the decade looked like in terms of music.

Or let’s maybe call it, The Decade Wrapped

You can find the code for this on my website https://sudharshan-ashok.github.io

Collecting the Data we need

The Billboard #1 Track

The Billboard Hot 100 is the music industry standard record chart in the United States for songs, published weekly by Billboard magazine.

Factors that are used to calculate the Billboard Top Tracks

  1. Sales (Physical and Digital)
  2. Radio Play
  3. Online Streaming Data

I scraped this data from Wikipedia’s list of Billboard #1 Tracks from 2010–2019 https://en.wikipedia.org/wiki/Billboard_Hot_100

Billboard #1 Tracks across the last decade
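
The scraping step can be sketched with nothing but the standard library. This is a minimal illustration, not the code used for this post: the class name `TableRows`, the helper `parse_number_ones`, and the sample markup with its placeholder values are all assumptions, and real Wikipedia tables (which use merged cells and footnotes) would need extra handling.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_number_ones(html):
    """Return (header, rows), treating the first row found as the header."""
    parser = TableRows()
    parser.feed(html)
    header, *rows = parser.rows
    return header, rows


# Placeholder markup standing in for a downloaded page (hypothetical values).
sample_html = """
<table>
  <tr><th>Issue date</th><th>Song</th><th>Artist(s)</th></tr>
  <tr><td>January 9</td><td>"<a href="#">Song A</a>"</td><td>Artist 1</td></tr>
</table>
"""
header, rows = parse_number_ones(sample_html)
```

In practice the HTML would come from an HTTP request to the relevant Wikipedia page, and a library such as pandas or BeautifulSoup would likely be a more robust choice than a hand-rolled parser.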

The Spotify Magic

Spotify has been a leader in enabling discovery of new music. The company uses audio analysis models to extract features of each song, such as how danceable or how energetic it is. These audio features power its products and help predict which songs a person is more likely to love.

Lucky for us, Spotify gives access to their API here https://developer.spotify.com/

API Calls

Spotipy is a sweet Python package that makes it easy to connect to the Spotify API and access their rich information.

pip install spotipy

We’ll primarily use two Spotify API endpoints:

  1. Search: takes in a search string and outputs matching songs
  2. Audio Features: takes in a song URL and outputs the audio features of the track
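
The two calls above might be wired together roughly as follows. This is a sketch under assumptions, not the exact code behind this post: the helper names `make_client`, `find_track_id`, and `fetch_audio_features` are hypothetical, credentials are assumed to live in the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables, and access to the Audio Features endpoint may depend on your Spotify app.

```python
def make_client():
    """Build a Spotipy client using the client-credentials flow."""
    # Imported here so the helpers below can be exercised without Spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    return spotipy.Spotify(auth_manager=SpotifyClientCredentials())


def find_track_id(sp, title, artist):
    """Search for a track and return the ID of the best match, or None."""
    result = sp.search(q=f"track:{title} artist:{artist}", type="track", limit=1)
    items = result["tracks"]["items"]
    return items[0]["id"] if items else None


def fetch_audio_features(sp, track_ids):
    """Return {track_id: features dict}, skipping tracks without features."""
    features = {}
    # The endpoint accepts at most 100 IDs per request, so work in batches.
    for start in range(0, len(track_ids), 100):
        batch = track_ids[start:start + 100]
        for row in sp.audio_features(tracks=batch):
            if row is not None:   # None marks a track with no features
                features[row["id"]] = row
    return features
```

A typical run would call `sp = make_client()`, look up each Billboard title with `find_track_id(sp, title, artist)`, and pass the IDs that were found to `fetch_audio_features`. Titles for which Search returns nothing come back as `None`, which is where the null values removed later come from.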

Audio Features of Billboard #1 Tracks
Description of a few Audio Features

Data Munging

Once we collected the Billboard #1 data for 2010–2019, we merged it with the Audio Features information from the Spotify API and cleaned it up:

  • Merging Billboard data and Spotify API data
  • Removing Null Values (Songs that Spotify API Search could not find)
  • Engineering Time Series Features (year, season, month, etc)
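
The three steps above could look something like this in pandas. The column names (`title`, `artist`, `issue_date`), the placeholder rows, and the month-to-season mapping (Northern Hemisphere, meteorological seasons) are assumptions for illustration, not the actual schema used here.

```python
import pandas as pd

# Hypothetical miniature inputs; real column names may differ.
billboard = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "artist": ["Artist 1", "Artist 2", "Artist 3"],
    "issue_date": ["2010-01-09", "2014-07-19", "2019-12-21"],
})
features = pd.DataFrame({
    "title": ["Song A", "Song C"],       # "Song B" was not found by Search
    "artist": ["Artist 1", "Artist 3"],
    "valence": [0.81, 0.35],
    "energy": [0.90, 0.52],
})

# 1. Merge: a left join keeps every Billboard row, matched or not.
df = billboard.merge(features, on=["title", "artist"], how="left")

# 2. Drop songs that have no audio features.
df = df.dropna(subset=["valence", "energy"]).reset_index(drop=True)

# 3. Time features derived from the chart date.
df["issue_date"] = pd.to_datetime(df["issue_date"])
df["year"] = df["issue_date"].dt.year
df["month"] = df["issue_date"].dt.month
season_of_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
df["season"] = df["month"].map(season_of_month)
```

Using a left join followed by an explicit `dropna` makes it easy to inspect which songs were lost before discarding them.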

Exploratory Analysis

The data we now have is super rich with information. We can do some interesting analyses with these features, to understand better what makes up a #1 Song.

We have 116 songs to analyze to begin with. We can see 2010 and 2019 offered us a lot of diversity as a high number of tracks competed for the #1 slot.

Yearly — No. of #1 Tracks

Artist Collaborations

We can see that most of the Billboard #1 Tracks were Solo songs. A significant number of Duets made it to the top of the chart, but very few #1 tracks credit more than two artists.
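
One way to get the number of artists per track is to split the artist credit on common joiners. The sketch below assumes credits are written with "featuring", "feat.", "and", "&" or commas; the function name `count_artists` and the sample credits are hypothetical, and band names that contain "and" or "&" would be over-counted by this rule.

```python
import re
from collections import Counter

# Joiners assumed to separate performers in a credit string.
JOINERS = re.compile(r"\s+(?:featuring|feat\.|and|&)\s+|\s*,\s*", re.IGNORECASE)


def count_artists(credit):
    """Count the performers named in a credit string."""
    parts = JOINERS.split(credit)
    return len([part for part in parts if part.strip()])


# Placeholder credits, not rows from the real dataset.
credits = [
    "Artist A",
    "Artist A featuring Artist B",
    "Artist A, Artist B and Artist C",
    "Artist D",
]
collab_sizes = Counter(count_artists(credit) for credit in credits)
```

`collab_sizes` maps the number of credited artists to the number of tracks, which is the shape of data a bar chart of Solo, Duet, and larger collaborations needs.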

Top Artists on the Billboard #1

We can see here Katy Perry had a whopping 7 tracks on Billboard #1, followed by a host of other artists (Maroon 5, Justin Bieber, Bruno Mars, Adele, etc) with 4 tracks.

Katy Perry — really the Queen of the decade?

We can see here that Katy Perry had 7 tracks in a short span between 2010 and 2013, but hasn’t landed the #1 in the rest of the decade. More like Queen of 2010–2013.

You could make the claim that Katy Perry hasn’t stayed relevant from the perspective of #1 Songs.

Artists — Staying Relevant

Let’s find out over what span of years each artist delivered #1 songs, in addition to the number of #1 hits.

timeline = df.groupby('artist')['year'].max() - df.groupby('artist')['year'].min()

We can see here Katy Perry has the highest number of Top tracks, but only over three years in the Billboard charts.

Lady Gaga has the longest span here, with two Billboard #1 songs across a span of 8 years. But Maroon 5 and Bruno Mars have significantly more Top Songs (4 each) spread over 7 years. We could claim that Bruno Mars and Maroon 5 have been the most reliable at delivering chart-topping songs.

Newer artists like Post Malone and Camila Cabello have naturally spent fewer years on the Billboard #1. But in a short span they’ve secured 3 top tracks, meaning they’re on the rise.
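
To compare hit count and span side by side, both can be computed in a single grouped aggregation. This is a sketch that assumes one row per #1 track with `artist` and `year` columns; the rows below are placeholders, not the real data.

```python
import pandas as pd

# Placeholder rows: one row per #1 track.
df = pd.DataFrame({
    "artist": ["A", "A", "A", "B", "B", "C"],
    "year": [2010, 2011, 2013, 2011, 2018, 2019],
})

# Number of hits plus first and last year on top, per artist.
summary = df.groupby("artist")["year"].agg(
    hits="count", first_year="min", last_year="max"
)
summary["span_years"] = summary["last_year"] - summary["first_year"]

# Longest span first; ties broken by number of hits.
summary = summary.sort_values(["span_years", "hits"], ascending=False)
```

Sorting by span and then by hits separates artists with a long but sparse presence from those who pack many hits into a few years.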

Songs and Presidential Elections

Valence measures how happy the song is on a scale of 0 to 1.

Presidential Election cycles always yield interesting results when analyzing time series. Here we can see a sharp drop in valence around the 2016 Presidential Election, when the #1 tracks were sadder than at any other point in the decade. I’ll leave the interpretation of the results to you.

People listen to more happy tracks during Winter

The data seems to point to the idea that people like listening to happier songs during Winter. This seems counter-intuitive, since from a psychological standpoint people are more likely to be sad during Winter.

People listen to Happy Songs more in Winter

Music can often function as an antidote — a happy song on a bad day can really elevate the mood. I have been looping through Feels by Calvin Harris to pump myself up and avoid the perils of the Minnesotan Winter.

More intuitively, people love listening to more ‘danceable’ songs in Summer
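
Seasonal comparisons like these come down to a grouped mean. The sketch below assumes a `season` column has already been engineered; the numbers are placeholders, not the values behind the charts.

```python
import pandas as pd

# Placeholder values, not the article's data.
df = pd.DataFrame({
    "season": ["Winter", "Winter", "Summer", "Summer"],
    "valence": [0.70, 0.60, 0.50, 0.40],
    "danceability": [0.60, 0.62, 0.75, 0.77],
})

# Average valence and danceability of #1 tracks per season.
seasonal = df.groupby("season")[["valence", "danceability"]].mean()
```

Grouping by `year` instead of `season` gives the yearly valence curve in the same way.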

Track Loudness

There’s been a strong movement back towards quieter music, with average loudness dropping by about 1.5 decibels over the decade. In previous decades, artists competed with each other on loudness so as to stand out on radio, a trend commonly referred to as the Loudness Wars. Songs that are mastered to be loud often have poor dynamics and are not great for enjoying on good audio gear.

In addition, songs are now mastered to be played on YouTube, Apple Music and Spotify — all of them impose loudness thresholds to optimize audio quality. (Read more here https://productionadvice.co.uk/youtube-loudness/)

Thanks to this disincentive, and to consumers’ increasing preference for audio quality, tracks are becoming less loud.

Drop in Energy

Spotify describes Energy as a perceptual measure of intensity and activity: energetic tracks feel fast, loud, and noisy. Songs with a lot of high frequencies (think of blaring synths, or simply singers shrieking, yikes!) land pretty high on the energy spectrum.

We can see that Energy has dropped steadily over the decade, by over 25%, and has become less of a priority for Music Producers. In parallel, acousticness has seen a 100% lift across the decade, meaning it has doubled. People definitely like gentler music now.
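
For reference, percentages like these are relative changes between the start and the end of the decade. The yearly means below are placeholders chosen only to reproduce changes of that size; they are not the measured values.

```python
def percent_change(start, end):
    """Relative change from `start` to `end`, in percent."""
    return (end - start) / start * 100.0


# Placeholder yearly means (hypothetical, on Spotify's 0 to 1 scale).
energy_2010, energy_2019 = 0.80, 0.60
acoustic_2010, acoustic_2019 = 0.10, 0.20

energy_change = percent_change(energy_2010, energy_2019)        # about -25
acoustic_change = percent_change(acoustic_2010, acoustic_2019)  # about +100
```

A 100% lift therefore means the average has doubled, which can still be a small absolute number if the starting value was low.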

Aren’t danceable songs supposed to be energetic?

Danceability has seen a significant upward trend across the decade, despite Energy being on the decline. People like to dance to tracks that are less energetic? How do we reconcile the difference?

Let’s look at tracks on both ends of the spectrum:

2010–11: Teenage Dream by Katy Perry and Rude Boy by Rihanna ruled the charts. These are high-energy tracks, with bright frequencies dominating.

2018–19: Old Town Road by Lil Nas X, Sucker by Jonas Brothers, Without Me by Halsey are all very warm and dark tracks (if one could visualize music) — but they are utterly groovy.

Energetic tracks used to be associated with being more danceable, but this decade has prioritized groovy and funky over track energy.

Part 2 Coming Soon

In the next part, we will get into the forecasting aspect of the problem: what will the next Billboard #1 song look like? Stay tuned to this blog for more updates!

You can find the code for this on my website https://sudharshan-ashok.github.io
