Data Analysis on Billboard’s Music Hot 100 Chart of 2000

Christopher Gray
May 8, 2017

This past week I was given the opportunity to analyze the songs that hit Billboard’s Hot 100 chart each week in the year 2000. Interestingly, this dataset followed each song until it fell off the Hot 100 chart, which meant some songs stayed on the list for over a year! My goal in analyzing this dataset was to figure out what made a top song and which factors contributed to it. The dataset comprised 316 songs and spanned 76 weeks, although no song actually lasted that long. After cleaning up the data and deleting the unneeded columns, I found that one song, “Higher” by Creed, lasted 65 weeks. I then considered different metrics for deciding which songs would count as “the best,” although I later found several issues with the data itself, which we will get into later.
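
To make the “weeks on the chart” idea concrete, here is a minimal sketch. It assumes, hypothetically, that each song’s weekly ranks are stored as a list with `None` for weeks the song was not on the chart; the song names and ranks are placeholders, not values from the Billboard data, and the function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical placeholder data: weekly ranks per song, None = not on the chart that week.
weekly_ranks: Dict[str, List[Optional[int]]] = {
    "Song A": [87, 70, 55, 40, 38, None, None],
    "Song B": [99, 95, None, None, None, None, None],
    "Song C": [60, 42, 30, 22, 18, 25, 41],
}

def weeks_on_chart(ranks: List[Optional[int]]) -> int:
    """Count the weeks in which the song held a chart position."""
    return sum(rank is not None for rank in ranks)

def longest_charting(data: Dict[str, List[Optional[int]]]) -> Tuple[str, int]:
    """Return the song with the most weeks on the chart and its week count."""
    song = max(data, key=lambda title: weeks_on_chart(data[title]))
    return song, weeks_on_chart(data[song])

print(longest_charting(weekly_ranks))  # ('Song C', 7)
```

Counting non-missing weeks, rather than reading the last filled column, would also handle a song that drops off and later re-enters the chart.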

I first decided to divide all the songs into genres and produced the following pie chart:

As you can see, the top genre was Rock, followed by Country, Rap, and so on. This led me to believe the top songs belong to these three genres, with Rock accounting for around 40% of the songs.
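
The percentages behind a pie chart like this one come from a simple count of genre labels. The sketch below uses made-up placeholder labels, not the real 2000 data, and assumes one genre label per song.

```python
from collections import Counter

# Hypothetical placeholder genre labels, one per song (not the real 2000 data).
genres = ["Rock", "Rock", "Country", "Rap", "Rock", "Country", "Jazz", "Rock", "Rap", "Pop"]

def genre_shares(labels):
    """Return (genre, percentage of songs) pairs, most common genre first."""
    counts = Counter(labels)
    total = len(labels)
    return [(genre, 100 * n / total) for genre, n in counts.most_common()]

for genre, pct in genre_shares(genres):
    print(f"{genre:<8} {pct:5.1f}%")
```

Note that these shares are only as accurate as the genre labels themselves.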

Next, I wanted to see whether any artists consistently released songs that hit the Hot 100 chart. I made a bar chart of every artist who had at least 3 songs make the Billboard Hot 100 in 2000, which produced these results:

This bar chart shows that 21 artists released at least 3 songs that made it to the Hot 100 in 2000. However, as you can see, many of these artists are ones most people would consider Pop, and barely any Rock artists are represented. This reveals the first issue with the dataset: many of the songs labeled Rock are by artists most would consider Pop (e.g., Whitney Houston, Christina Aguilera, Destiny’s Child, etc.). This observation makes me believe Pop was significantly more represented than the pie chart shows, and Rock considerably less.
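
A minimal sketch of the “at least 3 songs” filter, plus a per-artist listing of genre labels that makes mislabeled genres easier to spot. The (artist, genre) pairs are hypothetical placeholders, and `min_songs` is an illustrative parameter name.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder data: one (artist, genre label) pair per charting song.
songs = [
    ("Artist X", "Rock"), ("Artist X", "Rock"), ("Artist X", "Rock"),
    ("Artist Y", "Country"), ("Artist Y", "Country"),
    ("Artist Z", "Rap"), ("Artist Z", "Rap"), ("Artist Z", "Rap"), ("Artist Z", "Rap"),
]

def frequent_artists(rows, min_songs=3):
    """Return {artist: number of charting songs} for artists with at least min_songs."""
    counts = Counter(artist for artist, _ in rows)
    return {artist: n for artist, n in counts.items() if n >= min_songs}

def genres_by_artist(rows):
    """Collect the distinct genre labels the data assigns to each artist."""
    labels = defaultdict(set)
    for artist, genre in rows:
        labels[artist].add(genre)
    return dict(labels)

labels = genres_by_artist(songs)
for artist, n in sorted(frequent_artists(songs).items(), key=lambda item: -item[1]):
    # Printing each artist's labels next to the count makes a questionable genre easy to spot.
    print(artist, n, sorted(labels[artist]))
```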

Lastly, I considered the length of the songs themselves. I had a hunch that there was a “sweet spot,” so to speak, for song length. Sorting the songs by length showed a range from 2 minutes 36 seconds to a whopping 7 minutes 50 seconds (the one Jazz song). I then tallied how many songs had each length and kept the lengths shared by at least 4 songs, which produced the following list:

Length (m:ss)   Songs
3:50            9
3:55            7
3:54            7
3:52            6
4:06            6
3:46            6
3:51            6
4:02            6
4:00            6
3:30            6
4:12            6
4:17            6
3:45            6
3:56            5
4:10            5
4:07            5
4:18            5
3:19            5
3:48            5
4:16            5
3:23            5
4:23            5
3:40            4
3:43            4
3:44            4
4:04            4
4:30            4
4:05            4
4:13            4
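
The tally above can be reproduced with a frequency count over the length strings. This sketch uses hypothetical placeholder lengths rather than the real dataset. It converts “m:ss” to seconds before taking the minimum and maximum, because comparing the raw strings would misorder lengths such as “10:00” and “9:59”.

```python
from collections import Counter

def to_seconds(length: str) -> int:
    """Convert an 'm:ss' string such as '3:50' to a number of seconds."""
    minutes, seconds = length.split(":")
    return int(minutes) * 60 + int(seconds)

def length_tally(lengths, min_count=4):
    """Return (length, count) pairs shared by at least min_count songs, most common first."""
    counts = Counter(lengths)
    return [(length, n) for length, n in counts.most_common() if n >= min_count]

# Hypothetical placeholder lengths, not the real dataset.
lengths = ["3:50"] * 5 + ["3:55"] * 4 + ["4:30"] * 4 + ["3:19"] * 4 + ["2:36", "7:50"]

tally = length_tally(lengths)
shortest = min(tally, key=lambda pair: to_seconds(pair[0]))[0]
longest = max(tally, key=lambda pair: to_seconds(pair[0]))[0]
print(tally)
print(f"sweet spot: {shortest} to {longest}")
```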

Looking at these results, we can see a “sweet spot” ranging from 3 minutes 19 seconds to 4 minutes 30 seconds, and a super “sweet spot” at 3 minutes 50 seconds. This would suggest the best songs are around 3 minutes 50 seconds long; however, this leads to the second issue I found with the dataset. When I researched the songs, every length I checked was off by several seconds, either too short or too long.
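
A check like the one described here can be automated by comparing the dataset’s lengths against lengths looked up elsewhere. This is a sketch only: the song names and both sets of lengths are hypothetical placeholders, and the 2-second `tolerance` is an arbitrary illustrative choice.

```python
def to_seconds(length: str) -> int:
    """Convert an 'm:ss' string such as '3:50' to a number of seconds."""
    minutes, seconds = length.split(":")
    return int(minutes) * 60 + int(seconds)

def length_discrepancies(recorded, reference, tolerance=2):
    """Return {song: recorded minus reference, in seconds} where |difference| > tolerance."""
    diffs = {}
    for song, ref in reference.items():
        if song in recorded:
            delta = to_seconds(recorded[song]) - to_seconds(ref)
            if abs(delta) > tolerance:
                diffs[song] = delta
    return diffs

# Hypothetical placeholders: lengths in the dataset vs. lengths looked up elsewhere.
recorded = {"Song A": "3:50", "Song B": "4:06", "Song C": "3:30"}
reference = {"Song A": "3:58", "Song B": "4:07", "Song C": "3:24"}
print(length_discrepancies(recorded, reference))  # {'Song A': -8, 'Song C': 6}
```

A negative value means the dataset lists the song as shorter than the reference length.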

Taking the dataset at face value, a top song would be in the Rock, Country, or Rap genre, released by one of the 21 artists above, and about 3 minutes 50 seconds long. Unfortunately, as I saw, the dataset contains many errors, so we can’t draw a reliable conclusion about what makes the best songs. Given more time, I would have liked to correct the dataset itself: changing genres, correcting lengths, and making sure each song was ranked properly each week. I would also have looked at the average number of weeks it took songs to peak on the chart, and the average time before songs started falling off. The Billboard dataset taught me an important lesson that I will take into my future as a Data Scientist: your analysis is only as good as the data you’re provided with.
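
The peak-timing analysis mentioned above could start from something like this sketch. It assumes, hypothetically, the same layout of weekly ranks with `None` for weeks off the chart, and it interprets “falling off” as the last week a song appears; the data are placeholders.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical placeholder data: weekly ranks per song, None = not on the chart that week.
weekly_ranks: Dict[str, List[Optional[int]]] = {
    "Song A": [87, 70, 55, 40, 38, None, None],
    "Song B": [99, 95, None, None, None, None, None],
    "Song C": [60, 42, 30, 22, 18, 25, 41],
}

def peak_week(ranks: List[Optional[int]]) -> int:
    """Return the 1-based week of the best (lowest-numbered) rank; ties go to the earliest week."""
    charted = [(rank, week) for week, rank in enumerate(ranks, start=1) if rank is not None]
    _, week = min(charted)
    return week

def last_week(ranks: List[Optional[int]]) -> int:
    """Return the 1-based week of the song's final appearance on the chart."""
    return max(week for week, rank in enumerate(ranks, start=1) if rank is not None)

avg_peak = mean(peak_week(r) for r in weekly_ranks.values())
avg_last = mean(last_week(r) for r in weekly_ranks.values())
print(f"average peak week: {avg_peak:.2f}, average last week on chart: {avg_last:.2f}")
```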
