Top Energetic Songs

Pengtong Yang
INST414: Data Science Techniques
May 12, 2022

I have researched the top songs of the past 10 years before, and I wondered which of them are the most energetic and would bring the most energy to a party. My goal is to find the top 10 most energetic songs in a dataset of the top 100 Spotify songs from 2010–2019.

The dataset is from Kaggle: https://www.kaggle.com/datasets/muhmores/spotify-top-100-songs-of-20152019

To compare the songs, the metric I used was to find the songs with the highest beats per minute (bpm), the highest energy level (nrgy), and the longest duration in seconds (dur).
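Each of the three rankings can be expressed as the same parameterized SQL query with a different column and threshold. This is a minimal sketch, not the exact queries from the assignment: the table name `songs`, the `title` and `artist` columns, and the helper name `top_n_query` are assumptions; only `bpm`, `nrgy` and `dur` come from the dataset, and the thresholds 100, 80 and 200 are the ones used in queries 1 to 3.

```python
# Hypothetical helper: builds a "top N by one column" query.
# Assumes a table named "songs" with "title" and "artist" columns.
ALLOWED_COLUMNS = {"bpm", "nrgy", "dur"}


def top_n_query(column: str, threshold: float, n: int = 10) -> tuple:
    """Return (sql, params) selecting the n rows with the highest `column`
    value among rows where `column` is greater than `threshold`."""
    # Column names cannot be bound as SQL parameters, so check them
    # against a whitelist before formatting them into the query text.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column!r}")
    sql = (
        f"SELECT title, artist, {column} FROM songs "
        f"WHERE {column} > ? "
        f"ORDER BY {column} DESC "
        f"LIMIT ?"
    )
    # The threshold and the row limit are passed as bound parameters.
    return sql, (threshold, n)


for column, threshold in [("bpm", 100), ("nrgy", 80), ("dur", 200)]:
    sql, params = top_n_query(column, threshold)
    print(sql, params)
```

Keeping the column whitelist separate from the bound parameters means the same function can serve all three queries without building SQL from arbitrary strings.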

The result of query 1: the top 10 songs with more than 100 beats per minute (bpm) are “Honey Bee”, “FourFiveSeconds”, “BIG BANK”, “The Motto”, “Pure Water”, “Simple”, “Corazón”, “You Know You Like It”, “You Know You Like It”, and “True Love”. The songs with the most beats per minute are “Honey Bee” and “FourFiveSeconds”.

The result of query 2: the top 10 songs with an energy level (nrgy) above 80 are “Riverside”, “Get Up”, “Written in the Stars”, “BURN IT DOWN”, “Bangarang”, “Bad”, “Bounce”, “Hot Right Now”, “Don’t Stop the Party”, and “Timber”. The songs with the highest energy level are “Riverside” and “Get Up”.

The result of query 3: the top 10 songs with a duration (dur) longer than 200 seconds are “Not a Bad Thing”, “Mirrors”, “Te Boté - Remix”, “Lose Yourself to Dance”, “Bohemian Rhapsody”, “m.A.A.d city”, “Somebody Else”, “Bad and Boujee”, “Holy Grail”, and “Holocene”. The longest song is “Not a Bad Thing”.

The Python libraries I used are sqlite3 and pandas. I used sqlite3 to run SQL queries from Python, since I am more familiar with SQL. The query results were then turned into pandas DataFrames for a cleaner display.
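A minimal sketch of that sqlite3-to-pandas round trip, assuming the spreadsheet has already been loaded into a DataFrame. The table name `songs`, the `title` and `artist` columns, and all rows below are made up for illustration; they are not values from the Kaggle dataset.

```python
import sqlite3

import pandas as pd

# Illustrative rows only; these are NOT values from the Kaggle dataset.
df = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "artist": ["Artist 1", "Artist 2", "Artist 3"],
    "bpm": [98, 150, 128],
    "nrgy": [85, 60, 90],
    "dur": [210, 180, 240],
})

# Copy the DataFrame into an in-memory SQLite database as table "songs".
conn = sqlite3.connect(":memory:")
df.to_sql("songs", conn, index=False)

# Run an SQL query and get the result back as a pandas DataFrame.
query = (
    "SELECT title, artist, bpm FROM songs "
    "WHERE bpm > ? ORDER BY bpm DESC LIMIT 10"
)
result = pd.read_sql_query(query, conn, params=(100,))
print(result)

conn.close()
```

`pd.read_sql_query` accepts a plain `sqlite3` connection, so no extra database driver is needed for this workflow.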

Some of the bugs and problems I encountered were:

1. I had imported pandas as pd without realizing it, and kept calling it by its full name (pandas) in the DataFrame functions, which caused an error. Replacing pandas with pd in those calls solved the problem.

2. The .xlsx file cannot be read as a CSV file; reading it with read_excel instead solved the problem.

3. Most variable names aren’t readable, so I renamed them to titles that a human can read. For example, I renamed bpm to BeatPerMinute, nrgy to EnergyLevel, and dur to SongDuration(sec).
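The three fixes above can be combined into one loading step. This is a sketch under assumptions: the helper names `load_songs` and `readable_columns` are hypothetical, the file path is left as a parameter because the original file name is not given, and reading .xlsx files requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd  # pandas is referenced through the alias "pd" from here on

# Human-readable names for the dataset's abbreviated columns.
RENAMES = {
    "bpm": "BeatPerMinute",
    "nrgy": "EnergyLevel",
    "dur": "SongDuration(sec)",
}


def readable_columns(df: pd.DataFrame) -> pd.DataFrame:
    # rename() returns a new DataFrame; columns not in RENAMES are unchanged.
    return df.rename(columns=RENAMES)


def load_songs(path: str) -> pd.DataFrame:
    """Read the .xlsx workbook and rename the abbreviated columns."""
    # read_csv cannot parse an Excel workbook; read_excel can.
    df = pd.read_excel(path)
    return readable_columns(df)


# Usage with a tiny illustrative frame (not real data):
sample = pd.DataFrame({"title": ["Song A"], "bpm": [120], "nrgy": [85], "dur": [210]})
print(readable_columns(sample).columns.tolist())
# ['title', 'BeatPerMinute', 'EnergyLevel', 'SongDuration(sec)']
```

Renaming after the SQL queries, rather than before, would keep the short names (bpm, nrgy, dur) usable inside the SQL text.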

One limitation of this assignment is that I was not able to remove the duplicate songs from the dataset, since several songs were among the most popular songs in more than one year. The dataset is also not large enough to cover all of Spotify’s top songs accurately, as it was not collected directly from Spotify.
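One possible way to remove such repeats, sketched with pandas under an assumption: that the dataset has `title` and `artist` columns, and that a song charting in several years appears once per year with the same title and artist. The helper name and the rows below are illustrative only.

```python
import pandas as pd


def drop_repeat_songs(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per (title, artist) pair (hypothetical helper).

    Assumes "title" and "artist" columns exist; the first occurrence
    of each pair is kept and later repeats are dropped.
    """
    return df.drop_duplicates(subset=["title", "artist"], keep="first")


# Illustrative rows only: the same song listed for two different years.
songs = pd.DataFrame({
    "title": ["Song A", "Song A", "Song B"],
    "artist": ["Artist 1", "Artist 1", "Artist 2"],
    "year": [2016, 2017, 2016],
})
print(drop_repeat_songs(songs))
```

Matching on both title and artist, rather than title alone, avoids merging different songs that happen to share a name.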

My main takeaway is that I needed to test the code several times to get accurate results. For instance, when I first retrieved the top 10 songs with beats per minute (bpm) over 100, energy level (nrgy) over 80, or duration (dur) over 200 seconds, the results were not ranked from highest to lowest. Adding ORDER BY (variable name) DESC to each query fixed this.
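The effect of that fix can be checked on a throwaway table. This is an illustrative sketch with made-up rows and a hypothetical two-column `songs` table. Without ORDER BY, SQL makes no promise about row order, so `LIMIT 10` returns some 10 matching rows, which are not necessarily the 10 highest; putting `ORDER BY ... DESC` before `LIMIT` is what turns the result into a true top 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT, nrgy INTEGER)")
# Illustrative rows only, inserted in no particular order.
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("Song A", 82), ("Song B", 97), ("Song C", 88)],
)

# Filtering alone: the row order is not guaranteed by SQL.
unordered = conn.execute(
    "SELECT title, nrgy FROM songs WHERE nrgy > 80 LIMIT 10"
).fetchall()

# With ORDER BY ... DESC the highest-energy songs come first, so LIMIT 10
# keeps the top 10 rather than an arbitrary 10 matching rows.
ranked = conn.execute(
    "SELECT title, nrgy FROM songs WHERE nrgy > 80 ORDER BY nrgy DESC LIMIT 10"
).fetchall()
print(ranked)  # [('Song B', 97), ('Song C', 88), ('Song A', 82)]

conn.close()
```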
