In the Mix

What’s the secret formula behind the hottest tracks?

Jason Kibozi-Yocka
The Startup
11 min read · Oct 16, 2019

Over the years I’ve heard over and over that “all [insert genre] music sounds the same” or that “popular music is so formulaic.” But no one has ever really shown me how this is the case. I understand how easy it is to jump to the conclusion that “everything on the radio sounds the same nowadays” when intuitively we can hear similarities between song X and song Y, but it’s often a lot harder to pinpoint what exactly it is about song X that is so reminiscent of song Y. These are some of the questions I sought to answer in this investigation, and here is how I went about trying to answer them.

Now, I’m no music theorist but I am an information scientist. So, I decided to approach these questions from a data-driven standpoint. Using Python 3 and Jupyter Notebook, I generated my own dataset by accessing various online APIs. The data sources I used were: the Billboard Top 100, Spotify for Developers, and the Genius API.

You can access my full Jupyter notebook here.

TL;DR? No problem, just jump down to the Conclusion section.

Data Generation

API #1: Billboard Top 100

In an effort to stay contemporary and base my findings in ‘the hottest songs’ of the present moment, I decided that the best source for building a list of songs to analyze was the Billboard Top 100 (officially, the Billboard Hot 100 chart). Because Billboard has no official, publicly accessible API, I decided to use billboard.py by GitHub user Allen Guo (guoguo12) to access Billboard’s song listings.

After installing the billboard.py library using: pip install billboard.py, I imported the library into my Jupyter Notebook. Next I used billboard.py’s ChartData class to grab all the songs in the Billboard Top 100.

# libraries for working with billboard api
import billboard as bb
# retrieve billboard hot 100 data
hot100 = bb.ChartData('hot-100')

This gave me a list of the Billboard Top 100 tracks in the form of a billboard.ChartData object.

Because I wanted my Billboard data to speak with data from my other APIs, I needed it to be in the form of a dataframe. Unfortunately, pandas (the Python library which manages dataframes) has no method for converting from billboard.ChartData. So, using basic looping and list functions I turned my ChartData object into a list which I then converted to a dataframe using the pandas.DataFrame() function.

# library for working with dataframes
import pandas as pd
# pandas can't convert this datatype to a dataframe so...
# let's turn it into a list of lists
billList = []
for i in range(100):
    song = hot100[i]
    entry = [song.title,song.artist,song.weeks]
    billList.append(entry)
# let's turn our list into a dataframe
billDF = pd.DataFrame(data = billList)
billDF.columns = ['Song Title','Artists','Weeks on Billboard']
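
As a hedged alternative to the index-based loop above, the same conversion can be sketched with a list of dictionaries, so that the column names travel with the data. The `chart_to_rows` helper is a hypothetical name, and the `SimpleNamespace` objects merely stand in for chart entries (only the `title`, `artist`, and `weeks` attributes used above are assumed).

```python
from types import SimpleNamespace


def chart_to_rows(chart):
    """Turn an iterable of chart entries into a list of plain dicts."""
    return [
        {
            "Song Title": entry.title,
            "Artists": entry.artist,
            "Weeks on Billboard": entry.weeks,
        }
        for entry in chart
    ]


# Stand-in entries with placeholder values, not real chart data.
fake_chart = [
    SimpleNamespace(title="Song A", artist="Artist A", weeks=3),
    SimpleNamespace(title="Song B", artist="Artist B", weeks=12),
]
rows = chart_to_rows(fake_chart)
print(rows[0])
```

With pandas installed, passing `rows` to `pd.DataFrame()` would produce the same three columns without a separate step to rename them.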

API #2: Spotify for Developers

Because I wanted to break down songs to their base qualities in my efforts to analyze their similarities, I decided to use Spotify. Spotify has a publicly accessible API where they’ve done a great job measuring and tracking different song properties like: key, mode, acousticness, danceability, energy, instrumentalness, etc. So, to access this API I used a library called spotipy.

After installing spotipy using: pip install spotipy, I imported the library and used its SpotifyClientCredentials class to securely access the Spotify API.

# Libraries for working with spotify api
import json
import spotipy as sy
from spotipy.oauth2 import SpotifyClientCredentials as sycred
# retrieve API credentials
with open('api_keys.json','r') as file:
    creds = json.load(file)
# create a variable that handles requests
handler = sycred(client_id = creds['spotify']['client_id'],
                 client_secret = creds['spotify']['client_secret'])
sp = sy.Spotify(client_credentials_manager = handler)
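
The snippet above implies a particular layout for api_keys.json. Here is a minimal sketch of that layout, inferred only from the keys the code in this article reads (`spotify.client_id`, `spotify.client_secret`, and `genius.access_token`). The `check_creds` helper and the placeholder values are hypothetical.

```python
import json

# Placeholder contents for api_keys.json; real values come from the
# Spotify and Genius developer dashboards and should stay out of version control.
EXAMPLE_KEYS = """
{
  "spotify": {"client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET"},
  "genius": {"access_token": "YOUR_ACCESS_TOKEN"}
}
"""

REQUIRED = {"spotify": ["client_id", "client_secret"], "genius": ["access_token"]}


def check_creds(creds):
    """Return a list of missing 'section.key' names (empty if all are present)."""
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if key not in creds.get(section, {}):
                missing.append(f"{section}.{key}")
    return missing


creds_example = json.loads(EXAMPLE_KEYS)
print(check_creds(creds_example))  # an empty list means nothing is missing
```

Running a check like this before any API call makes a missing key show up as a readable message rather than a KeyError halfway through the notebook.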

Following this, I created a function that takes a song title from my Billboard Top 100 dataframe, looks up that song’s properties, and writes them into my dataframe. I then called it for every row using DataFrame.iterrows().

# create a function to populate the new columns in the dataframe with the values
# from our Spotify API
def spSong(song_name):
    # take the first (best-matching) search result for the title
    trackID = sp.search(song_name,type='track')['tracks']['items'][0]['uri']
    trackFeat = sp.audio_features(trackID)
    iSong = billDF.index[billDF['Song Title'] == song_name].tolist()[0]
    # Key
    billDF.at[iSong,'Key'] = trackFeat[0]['key']
    # Mode
    billDF.at[iSong,'Mode'] = trackFeat[0]['mode']
    # Acousticness
    billDF.at[iSong,'Acousticness'] = trackFeat[0]['acousticness']
    # Danceability
    billDF.at[iSong,'Danceability'] = trackFeat[0]['danceability']
    # Energy
    billDF.at[iSong,'Energy'] = trackFeat[0]['energy']
    # Instrumentalness
    billDF.at[iSong,'Instrumentalness'] = trackFeat[0]['instrumentalness']
    # Liveness
    billDF.at[iSong,'Liveness'] = trackFeat[0]['liveness']
    # Loudness
    billDF.at[iSong,'Loudness'] = trackFeat[0]['loudness']
    # Speechiness
    billDF.at[iSong,'Speechiness'] = trackFeat[0]['speechiness']
    # Valence
    billDF.at[iSong,'Valence'] = trackFeat[0]['valence']
    # Tempo
    billDF.at[iSong,'Tempo'] = trackFeat[0]['tempo']
# populate my dataframe
for i, song in billDF.iterrows():
    spSong(song['Song Title'])
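
One caveat with the lookup above is that a title-only search can return no items, or a different recording with the same name. A hedged sketch of a more defensive lookup follows. `build_query` and `first_track_uri` are hypothetical helpers, the `track:` and `artist:` field filters come from Spotify's search query syntax, and the response dictionaries are hand-written stand-ins that mirror only the `['tracks']['items'][n]['uri']` path used above.

```python
def build_query(title, artist):
    """Combine title and artist using Spotify's search field filters."""
    return f"track:{title} artist:{artist}"


def first_track_uri(search_response):
    """Return the URI of the first search hit, or None if there are no hits."""
    items = search_response.get("tracks", {}).get("items", [])
    return items[0]["uri"] if items else None


# Hand-written stand-ins for sp.search(...) responses (placeholder URI).
hit = {"tracks": {"items": [{"uri": "spotify:track:PLACEHOLDER"}]}}
miss = {"tracks": {"items": []}}

print(build_query("Song A", "Artist A"))
print(first_track_uri(hit))
print(first_track_uri(miss))
```

Returning None for a miss lets the calling loop skip that row instead of stopping with an IndexError. Billboard artist strings that list featured artists may need trimming before they work well as an `artist:` filter.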

API #3: Genius API

The final data source that I used was the publicly accessible song lyrics API provided by Genius. I decided to access this API because I wanted to generate VADER sentiment analyses for each of the songs in my dataframe.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a great way of assessing emotional qualities in text, so I figured I could try to get the general emotional content of each song by analyzing each song’s lyrics. In order to access the Genius API, I used the lyricsgenius library by GitHub user John W. Miller (johnwmillr).

After installing lyricsgenius using: pip install lyricsgenius, I imported it into my Jupyter Notebook. Following that I created two functions, one for requesting artist information from the Genius API and another for requesting song information from the Genius API.

# Libraries for working with genius api
import lyricsgenius as lg
# implement credentials for accessing API
gpA = lg.Genius(creds['genius']['access_token'])
# create a function for requesting artists
def gpArtist(artist_name,song_amount):
    artist = gpA.search_artist(artist_name, max_songs = song_amount,
                               sort = "title")
    return artist.songs
# create function for requesting songs
def gpSong(song_name,artist_name):
    song = gpA.search_song(song_name,artist_name)
    return song

After this, I iterated through each song in my dataframe, grabbing the artist name and song title, using them to request the song’s lyrics from the Genius API, and running a VADER analysis on each.

# create the VADER analyzer (assuming the vaderSentiment package here;
# NLTK ships an equivalent SentimentIntensityAnalyzer)
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# use genius api to grab the lyrics for each song in the hot100 dataframe
for i, song in billDF.iterrows():
    song_name = song['Song Title']
    song_lyrics = gpSong(song_name,song['Artists']).lyrics
    sentiment = analyzer.polarity_scores(song_lyrics)
    iSong = billDF.index[billDF['Song Title'] == song_name].tolist()[0]
    # Negative Vader Score
    billDF.at[iSong,'vNeg'] = sentiment['neg']
    # Neutral Vader Score
    billDF.at[iSong,'vNeu'] = sentiment['neu']
    # Positive Vader Score
    billDF.at[iSong,'vPos'] = sentiment['pos']
    # Compound Vader Score
    billDF.at[iSong,'vCompound'] = sentiment['compound']
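
The loop above assumes every lyrics request succeeds. As a hedged sketch, the scoring step can be wrapped so that a song with no lyrics yields NaN scores instead of stopping the run. `score_lyrics` is a hypothetical helper, and `StubAnalyzer` is a stand-in that only imitates the `polarity_scores()` method with dummy values so the example runs without the sentiment library.

```python
import math

SCORE_KEYS = ("neg", "neu", "pos", "compound")


def score_lyrics(lyrics, analyzer):
    """Return the four VADER-style scores, or NaNs when lyrics are missing."""
    if not lyrics:
        return {key: math.nan for key in SCORE_KEYS}
    scores = analyzer.polarity_scores(lyrics)
    return {key: scores[key] for key in SCORE_KEYS}


class StubAnalyzer:
    """Stand-in exposing the same polarity_scores() method; values are dummies."""

    def polarity_scores(self, text):
        return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}


found = score_lyrics("placeholder lyrics", StubAnalyzer())
missing = score_lyrics(None, StubAnalyzer())
print(found)
print(missing)
```

NaN is used here because pandas treats it as a missing value, so averages computed later would simply skip those songs.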

After having done all that, my dataframe was complete. From there I exported it as an Excel file using DataFrame.to_excel() for visualization in Tableau.

# export my dataframe to excel for visualization in Tableau
billDF.to_excel("API_Data.xlsx", sheet_name='API_Data')

Data Analysis

How Acoustic are Billboard Top 100 Songs?

The most acoustic song in the Billboard Top 100 is ‘HIGHEST IN THE ROOM’ by Travis Scott and the least acoustic song in the Billboard Top 100 is ‘Southbound’ by Carrie Underwood.

According to the Spotify API, acousticness is “a confidence measure from 0.0 to 1.0 of whether the track is acoustic.” That being the case, we see here in my visualization that the song ‘HIGHEST IN THE ROOM’ by Travis Scott is the most acoustic song in the Billboard Top 100 while ‘Southbound’ by Carrie Underwood is the least acoustic.

My visualization also tells us that songs in the Billboard Top 100 are typically not very acoustic since our average hovers around a score of 0.2.
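
The charts in this section come from Tableau, but the same three summaries (highest, lowest, and average) can be sketched in a few lines of Python. The rows below are made-up placeholders, not values from the dataset, and `summarize` is a hypothetical helper.

```python
from statistics import mean


def summarize(rows, feature):
    """Return (title of highest, title of lowest, average) for one feature."""
    highest = max(rows, key=lambda row: row[feature])
    lowest = min(rows, key=lambda row: row[feature])
    average = mean(row[feature] for row in rows)
    return highest["Song Title"], lowest["Song Title"], average


# Placeholder rows for illustration only.
toy_rows = [
    {"Song Title": "Song A", "Acousticness": 0.10},
    {"Song Title": "Song B", "Acousticness": 0.50},
    {"Song Title": "Song C", "Acousticness": 0.30},
]
top, bottom, avg = summarize(toy_rows, "Acousticness")
print(top, bottom, round(avg, 2))
```

With the dataframe built earlier, `billDF['Acousticness'].idxmax()`, `.idxmin()`, and `.mean()` would locate the same rows and average in pandas.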

How Danceable are Billboard Top 100 Songs?

The most danceable song in the Billboard Top 100 is ‘Bad Bad Bad’ by Young Thug and Lil Baby and the least danceable song in the Billboard Top 100 is ‘Remember You Young’ by Thomas Rhett.

According to the Spotify API, “danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.” That being the case, we see here in my visualization that the song ‘Bad Bad Bad’ by Young Thug and Lil Baby is the most danceable song in the Billboard Top 100 while ‘Remember You Young’ by Thomas Rhett is the least danceable.

My visualization also tells us that songs in the Billboard Top 100 are typically highly danceable since our average hovers around a score of 0.7.

How Energetic are Billboard Top 100 Songs?

The most energetic song in the Billboard Top 100 is ‘Southbound’ by Carrie Underwood and the least energetic song in the Billboard Top 100 is ‘HIGHEST IN THE ROOM’ by Travis Scott.

According to the Spotify API, energy “is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.” That being the case, we see here in my visualization that the song ‘Southbound’ by Carrie Underwood is the most energetic song in the Billboard Top 100 while ‘HIGHEST IN THE ROOM’ by Travis Scott is the least energetic.

My visualization also tells us that songs in the Billboard Top 100 are typically high energy since our average hovers around a score of 0.6.

How Instrumental are Billboard Top 100 Songs?

The most instrumental song in the Billboard Top 100 is ‘Don’t Call Me Angel (Charlie’s Angels)’ by Ariana Grande, Miley Cyrus, and Lana Del Rey.

According to the Spotify API, instrumentalness “predicts whether a track contains no vocals.” That being the case, we see here in my visualization that the song ‘Don’t Call Me Angel (Charlie’s Angels)’ by Ariana Grande, Miley Cyrus, and Lana Del Rey is the most instrumental song (the one scored as having the least vocals) in the Billboard Top 100 while most everything else is not very instrumental.

My visualization also tells us that songs in the Billboard Top 100 are typically not very instrumental since our average hovers around a score of 0.0.

How Lively are Billboard Top 100 Songs?

The most lively song in the Billboard Top 100 is ‘Camelot’ by NLE Choppa and the least lively song in the Billboard Top 100 is ‘One Thing Right’ by Marshmello and Kane Brown.

According to the Spotify API, liveness “detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.” That being the case, we see here in my visualization that the song ‘Camelot’ by NLE Choppa is the most likely song to have been recorded live in the Billboard Top 100 while ‘One Thing Right’ by Marshmello and Kane Brown is the least likely to have been recorded live.

My visualization also tells us that songs in the Billboard Top 100 are typically in-studio recordings rather than live recordings since our average hovers around a score of 0.2.

How Loud are Billboard Top 100 Songs?

The loudest song in the Billboard Top 100 is ‘Truth Hurts’ by Lizzo and the softest song in the Billboard Top 100 is ‘HIGHEST IN THE ROOM’ by Travis Scott.

According to the Spotify API, loudness is “the overall loudness of a track in decibels (dB).” That being the case, we see here in my visualization that the song ‘Truth Hurts’ by Lizzo is the loudest song in the Billboard Top 100 while ‘HIGHEST IN THE ROOM’ by Travis Scott is the softest.

My visualization also tells us that songs in the Billboard Top 100 are typically loud since our average hovers around a score of -5 dB (note that, per Spotify’s documentation, loudness values typically range between -60 and 0 dB).

How Vocal are Billboard Top 100 Songs?

The most vocal song in the Billboard Top 100 is ‘Even Though I’m Leaving’ by Luke Combs and the least vocal song in the Billboard Top 100 is ‘Headache Medication’ by Jon Pardi.

According to the Spotify API, “speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.” That being the case, we see here in my visualization that the song ‘Even Though I’m Leaving’ by Luke Combs is the most vocal song in the Billboard Top 100 while ‘Headache Medication’ by Jon Pardi is the least vocal.

My visualization also tells us that songs in the Billboard Top 100 are typically not very speech-like (as in exclusively spoken) since our average hovers around a score of 0.1.

At What Tempo are Billboard Top 100 Songs Set?

The Billboard Top 100 song with the highest tempo is ‘Liar’ by Camila Cabello and the Billboard Top 100 song with the lowest tempo is ‘Potential’ by Summer Walker.

According to the Spotify API, tempo measures “the overall estimated tempo of a track in beats per minute (BPM).” That being the case, we see here in my visualization that the song ‘Liar’ by Camila Cabello is the highest tempo song in the Billboard Top 100 while ‘Potential’ by Summer Walker is the lowest tempo song.

My visualization also tells us that songs in the Billboard Top 100 are typically high tempo since our average hovers around a score of 120 BPM.

How Positive are Billboard Top 100 Songs?

The most positive song in the Billboard Top 100 is ‘Sucker’ by the Jonas Brothers and the least positive song in the Billboard Top 100 is ‘Liar’ by Camila Cabello.

According to the Spotify API, valence is “a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.” That being the case, we see here in my visualization that the song ‘Sucker’ by the Jonas Brothers is the most positive song in the Billboard Top 100 while ‘Liar’ by Camila Cabello is the least positive.

My visualization also tells us that songs in the Billboard Top 100 aren’t particularly positive or negative since the average hovers around a score of 0.5, although there does seem to be the slightest lean towards positivity given that the average is actually a little over 0.5.

This is backed up by my VADER sentiment analysis, which shows that the average positive and negative word scores were almost 1-to-1, with a slight lean towards the positive.

The song with the most positive words in the Billboard Top 100 is ‘Every Little Thing’ by Russell Dickerson and the song with the most negative words in the Billboard Top 100 is ‘Liar’ by Camila Cabello.

It is especially clear in the following visualization, which shows that the average compound sentiment value (VADER’s single normalized score combining positive, negative, and neutral words) is around 0.2, meaning that songs in the Billboard Top 100 lean more towards the use of positive words.

The most positive song overall in the Billboard Top 100 is ‘Truth Hurts’ by Lizzo and the most negative song overall in the Billboard Top 100 is ‘Hot Girl Bummer’ by blackbear.
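
For readers curious about where a compound value comes from: in the reference VADER implementation, the word-level valence scores are summed and then squashed into the range -1 to 1 with x / sqrt(x² + α), where α defaults to 15. Below is a minimal sketch of that normalization step only. The rule-based adjustments VADER applies before summing are left out, and the input sums are arbitrary examples rather than values from my data.

```python
import math


def normalize(score_sum, alpha=15):
    """Map an unbounded valence sum onto the interval (-1, 1)."""
    return score_sum / math.sqrt(score_sum * score_sum + alpha)


# Arbitrary example sums to show the shape of the curve.
for total in (-4.0, 0.0, 1.0, 4.0):
    print(total, round(normalize(total), 3))
```

Because the curve flattens out as the sum grows, a long lyric full of positive words approaches 1 without ever exceeding it, which is why compound scores stay comparable across songs of different lengths.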

Conclusion

So, what is the formula behind the “most popular tracks” of the present? Well, it would seem that if you want to get into the Billboard Top 100 you should aim for low acousticness, high danceability, high energy, low instrumentalness, high volume, low speechiness, and high tempo, and you shouldn’t record live. Whether your song is positive or negative won’t really matter, but if you want a leg up then your song should use more positive words.

So, do the “hottest songs” all sound the same? I’d say no. While my data shows that there are some similar trends between songs in the Billboard Top 100, there is also a large degree of variation between tracks. In conclusion, I think we should be less worried about our favorite songs becoming increasingly formulaic, because even though a lot of the top songs follow similar conventions, there is still a large degree of creative differences that make each song unique. So relax, put some headphones on, and keep enjoying your favorite songs. Ciao!

Afterword

Before closing out this piece, I would like to point out some limitations in my analysis. Firstly, because my dataset draws primarily from the Billboard Top 100, it unfortunately inherits any biases already present in Billboard’s listing. Also, given that I only looked at 100 different tracks, the generalizability of my analysis is somewhat limited as well. Finally, because it is not transparent how Spotify measures song properties, we cannot fully attest to the accuracy of their scores, nor are we aware of any biases therein. Regardless, my hope with this article is to question the discourse around how popular music is becoming ‘overly formulaic’ and to encourage people not to discount the degree of creative and artistic differences in said music. Thanks for reading.
