Analyzing Most Popular Spotify Artists Using Data Science

Using Pandas to analyze Spotify Top 200 lists over the last 5 years

Nathan Lang
The Startup
8 min read · May 16, 2020


Overview

Unfortunately, COVID-19 has halted many of the activities that go hand in hand with listening to music, such as exercising or commuting to work. No more windows-down cruisin’ with the music blasting, and no more concerts or festivals. That doesn’t mean we can’t appreciate the success of the industry’s top artists. In this article, I determine the most popular and most consistent artists, and identify the rising stars and the let downs.

As someone with a passion for both music and data science, I figured there is no better time to ask who has been at the top of the charts most consistently over the last five years. I’m interested in what determines popularity and consistency. The obvious measures are total streams or total number of charting songs, but in this article I propose using the h-index to rank the top artists.

The h-index is usually used to express the productivity and citation impact of a scientist’s publications. For my question, an artist’s h-index is the largest number h such that the artist has h songs with at least h million streams each. In simple terms, an artist with an h-index of 10 has 10 songs with at least 10 million streams each.
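To make the definition concrete, here is a minimal sketch of the h-index calculation in plain Python. The function name h_index and the stream counts are made up for illustration; they are not taken from the dataset.

```python
def h_index(streams_in_millions):
    """Largest h such that h songs have at least h million streams each."""
    # Sort from most-streamed to least-streamed so position equals rank.
    ranked = sorted(streams_in_millions, reverse=True)
    h = 0
    for rank, streams in enumerate(ranked, start=1):
        if streams >= rank:
            h = rank  # the top `rank` songs all have at least `rank` million streams
        else:
            break  # later songs have even fewer streams, so h cannot grow
    return h


# Hypothetical artist with five charting songs (streams in millions).
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

In this invented example, four songs have at least 4 million streams, but there are not five songs with at least 5 million, so the h-index is 4.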

Side note: if you are not interested in the data science and only want the results, feel free to skip ahead.

Data Collection

Spotify releases daily and aggregated weekly Top 200 charts at spotifycharts.com/regional. Unfortunately, this only goes back to the start of 2017. This is where kworb.net comes in. I have spoken to Kworb on Twitter, and they have been scraping the data since August 10, 2014. So, the dataset I used is the aggregated U.S. Daily Top 200 charts from that date through May 14, 2020. It is important to note that these totals only count streams a song earned while on the chart, not the time it spent off the chart.

Pandas provides a nice, easy way to read HTML tables.
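As a rough sketch of what that looks like, the snippet below parses a small inline HTML table with pd.read_html. The table, its column names ("Artist and Title", "Days", "Total"), and the helper load_chart are assumptions made up for this example, not the actual kworb.net page. Note that read_html needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the chart page: column names and rows are invented.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Artist and Title</th><th>Days</th><th>Total</th></tr>
  </thead>
  <tbody>
    <tr><td>Artist A - Song One</td><td>120</td><td>250,000,000</td></tr>
    <tr><td>Artist B - Song Two</td><td>45</td><td>80,500,000</td></tr>
  </tbody>
</table>
"""


def load_chart(source):
    """Return the first HTML table found in `source` as a DataFrame.

    `source` can be a URL or a file-like object. read_html returns a list
    with one DataFrame per <table>, and by default treats "," as a
    thousands separator when parsing numbers.
    """
    tables = pd.read_html(source)
    return tables[0]
```

With a parser installed, load_chart(StringIO(SAMPLE_HTML)).head() shows the first rows of the table. Passing a page URL instead of the StringIO object works the same way.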

Head of Dataframe

Data Cleaning and Processing

As any data scientist will tell you, data cleaning is the most important (yet tedious) process of any project, regardless of project size. Luckily, pandas interpreted the dataframe rather well, limiting the amount of cleaning that was needed.

While there are many interesting questions that can be explored with this dataset, the only parts I am interested in are the artist, the number of songs that were on the Top 200 list during this timeframe, and the total number of streams while on the Top 200 list.

I now create a new pandas dataframe, artist_stats, that contains only the artist, song title, and streams. Since I will be working with streams in millions, I divide the total streams by one million and round the result.
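A minimal sketch of this step is shown below. It assumes the raw table has a combined "Artist and Title" column separated by " - " and a "Total" column of raw stream counts. Those column names, the output column names, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical raw rows; the real table has more columns and far more songs.
raw = pd.DataFrame(
    {
        "Artist and Title": [
            "Artist A - Song One",
            "Artist A - Song Two",
            "Artist B - Song Three - Remix",
        ],
        "Days": [120, 14, 45],
        "Total": [250_400_000, 12_600_000, 80_700_000],
    }
)

# Split only on the first " - " so titles that contain a dash stay intact.
parts = raw["Artist and Title"].str.split(" - ", n=1, expand=True)

artist_stats = pd.DataFrame(
    {
        "artist": parts[0].str.strip(),
        "title": parts[1].str.strip(),
        # Convert raw stream counts to whole millions.
        "streams_m": (raw["Total"] / 1_000_000).round().astype(int),
    }
)

print(artist_stats)
```

Splitting with n=1 is a design choice for this sketch: it keeps a title like "Song Three - Remix" in one piece, but it would mis-split an artist name that itself contains " - ".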

With simple pandas manipulations, I am able to get a clean dataset consisting of only the data I want.

Head of Cleaned Dataframe

Given this new dataframe, I am now able to calculate the h-index by using a groupby transform and a lambda function. This single line of code is very powerful. First, it groups the songs by artist. Then, within each artist, it ranks the songs by streams in descending order and counts how many songs have at least as many streams (in millions) as their rank.
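One way to write such a groupby transform is sketched below. This is an illustration under assumptions, not necessarily the exact line used in the project: the column names, the helper h_index_of, and the data are invented.

```python
import pandas as pd

# Hypothetical data: one row per song, streams already in millions.
artist_stats = pd.DataFrame(
    {
        "artist": ["Artist A"] * 5 + ["Artist B"] * 5,
        "title": [f"Song {i}" for i in range(1, 11)],
        "streams_m": [10, 8, 5, 4, 3, 25, 8, 5, 3, 3],
    }
)


def h_index_of(streams):
    """Count songs whose streams (in millions) are at least their rank."""
    # Rank 1 is the most-streamed song; method="first" gives ties distinct ranks.
    ranks = streams.rank(ascending=False, method="first")
    return (streams >= ranks).sum()


# transform broadcasts each artist's h-index back onto every one of their rows.
artist_stats["h_index"] = (
    artist_stats.groupby("artist")["streams_m"]
    .transform(lambda s: h_index_of(s))
    .astype(int)
)

print(artist_stats)
```

Because streams fall as rank rises, the comparison is true for exactly the first h songs of each artist, so counting the true values gives the h-index.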

Finally, I use pandas’ awesome groupby-aggregate function to get the final dataframe that I will analyze. It gives me the total streams, song count, h-index, and h-index average for each artist. The h-index average is the share of an artist’s charting songs that count toward the h-index, that is, the h-index divided by the song count.
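The aggregation could look roughly like the sketch below, which uses pandas named aggregation. The column names and the data are hypothetical, and the h_index column is assumed to have been computed per song row already.

```python
import pandas as pd

# Hypothetical per-song rows with the artist's h-index repeated on each row.
artist_stats = pd.DataFrame(
    {
        "artist": ["Artist A"] * 5 + ["Artist B"] * 5,
        "title": [f"Song {i}" for i in range(1, 11)],
        "streams_m": [10, 8, 5, 4, 3, 25, 8, 5, 3, 3],
        "h_index": [4] * 5 + [3] * 5,
    }
)

summary = artist_stats.groupby("artist").agg(
    total_streams_m=("streams_m", "sum"),  # total streams in millions
    song_count=("title", "count"),         # number of charting songs
    h_index=("h_index", "first"),          # same value on every row of an artist
)

# Share of an artist's charting songs that count toward the h-index.
summary["h_index_avg"] = summary["h_index"] / summary["song_count"]

# Highest h-index first, as in the Top 15 table.
summary = summary.sort_values("h_index", ascending=False)

print(summary.head(15))
```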

Top 15 Sorted By the h-index

All of the code I used for this project can be viewed on my GitHub here.

The Fun Part (Results)

Now that I have a clean dataset, let’s do some analyzing. Before diving in, I want to clarify that this dataset only consists of songs and their streams while on Spotify’s Top 200 Daily U.S. chart. Also, this list does not take features into account: only the artist who released the song is credited with its statistics.

Most Popular

It is clear that Drake and Post Malone are huge front runners in the popularity contest. It is no surprise to see Drake up top, as he entered this timeframe already very popular. Drake also debuted 152 songs on the Top 200 in the last five years, while Post Malone had only about a third as many at 50 songs. I am actually more impressed with Post Malone because his first album, Stoney, was only released in December 2016. That means all the other artists, including Drake, had a head start of over two years to produce popular music. Talk about execution! Post Malone did top the charts with the most popular song, “Rockstar”, and the third most popular song, “Sunflower”. “Rockstar” was #1 on the charts for a whopping 124 days straight, the longest of any song during this time!

Drake and Post Malone both had over 6 billion streams, while the next highest was Ariana Grande at 2.4 billion, which is only about 40% of either total. Drake is clearly the most consistent artist of our time: his first studio album, Thank Me Later, was released in 2010, four years before this dataset begins. Drake has undoubtedly been dominant over the last ten years. With that said, I am going to give the most popular award to Post Malone.

Post Malone and Drake: source

Rising Stars

The most impressive rising star for me is Billie Eilish. While she is only 18 years old, she has produced plenty of popular songs. She has an h-index of 20 with only 24 songs that broke into the Top 200. For those who may have skimmed past the h-index definition, that means Billie has 20 songs with at least 20 million streams each. This is nothing less than amazing, as it means only four of her songs that made it to the Top 200 have not blown up. Billie’s most popular song is “bad guy”.

Billie’s second most popular song is “lovely”, which actually features my next rising star: Khalid. Similar to Billie, Khalid is young at 22 years old. He has had 36 songs on the Top 200, with an h-index of 17. Khalid is currently best known for his two most popular songs, “Location” and “Young, Dumb & Broke”.

Billie Eilish and Khalid: source

My last rising star may be a little controversial to fit into this category: Travis Scott. This is a tough call given how popular and established he already is, but I am putting him here strictly because I think he has the potential to be the next Drake. Yes, that is a bold statement. But he has only had 32 songs on the Top 200 in the last five years, the second-lowest count after Billie. Out of these 32, 53% have at least 17 million streams, with “SICKO MODE” having 475 million and “goosebumps” having 472 million. The only other artists with two songs over 450 million streams while on the Top 200 are Drake and Post Malone. It is important to note, though, that Drake is featured on “SICKO MODE”.

Travis Scott: source

Let Downs

There are only two people besides Drake with over 75 songs that appeared on the Top 200. These two are also the only artists with an h-index of over 10 and an h-index average of under 15%. They are Future and Logic.
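These criteria translate directly into a dataframe filter. The sketch below applies them to a small table built only from the song counts and h-index values quoted in this article for four artists. It is not the full dataset, and the column names are assumptions.

```python
import pandas as pd

# Song counts and h-index values as quoted in this article (four artists only).
summary = pd.DataFrame(
    {
        "song_count": [24, 36, 136, 87],
        "h_index": [20, 17, 16, 11],
    },
    index=["Billie Eilish", "Khalid", "Future", "Logic"],
)

# h-index average: share of charting songs that count toward the h-index.
summary["h_index_avg"] = summary["h_index"] / summary["song_count"]

# Many charting songs, a decent h-index, but a low share of hits.
let_downs = summary[
    (summary["song_count"] > 75)
    & (summary["h_index"] > 10)
    & (summary["h_index_avg"] < 0.15)
]

print(let_downs)
```

On these four rows the filter keeps Future (16 of 136 songs, about 12%) and Logic (11 of 87 songs, about 13%).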

Future has debuted 136 songs that made it onto this list, but only 16 of them had at least 16 million streams. Future is also a let down because four of his five most-streamed songs had a featured artist, and on two of them that artist was Drake.

Future: source

Similarly, Logic had 87 songs on this list and only 11 had 11 million streams or more. Logic had only one song with over 75 million streams: “1–800–273–8255”, which had 320 million. He is as close to a popular one-hit wonder as anyone.

Logic: source

Honorable Mentions (R.I.P.)

Juice WRLD and XXXTENTACION were two of my personal favorites. Both had bright futures in the music industry and unfortunately passed away at young ages. Both Juice WRLD (21) and XXX (20) were rising stars that took the music industry by storm with their new genre, emo rap. Juice WRLD’s “Lucid Dreams” had 547 million streams, and he had 7 other songs with over 100 million streams. Similarly, XXX had “SAD!” with 518 million streams and 6 other songs with over 100 million streams (plus one at 99 million). Without a doubt, both Juice WRLD and XXXTENTACION would have had continued success in the industry.

XXXTENTACION and Juice WRLD: source

Thank you all for reading! Feel free to let me know in the comments if anything surprised you or if there are any other questions about this dataset that interest you. :)


Data Scientist. Computer Engineer. Passionate about Machine Learning.