Sentiment of Charting Songs: What Music Should You Listen to Based on Mood?

John Adjani-Aldrin
INST414: Data Science Techniques
Dec 8, 2023

Introduction

In the article “Uncovering Trends and Connections in Lyricism of Trending Songs Over the Years”, I covered the similarity between songs, artists, and years based on lyrical content. In that study I exported metrics such as “Degree Centrality”, “Betweenness Centrality”, “Closeness Centrality”, “Clustering Coefficient”, and a “Community” value. Of these, “Degree Centrality” was used most heavily, since it pointed to the most common words, the most important artists, and which songs were similar. The same data can support other insights as well, such as the sentiment of these charting songs. This analysis applies data-driven techniques to examine the relationship between the sentiment of song lyrics and their network centrality measures. The goal is to help listeners choose songs that suit their mood, so by the end of this article you will hopefully know what to listen to depending on your emotional state. The analysis relied primarily on pandas, math, Matplotlib, Seaborn, Plotly, and TextBlob for reading and writing files, visualizing the data, and generating the sentiment values.

Data Collection

The data in this analysis covers songs from 2010 to 2022 and is stored in a CSV file. The lyric and song data were gathered using the Spotify and Genius APIs and written to the “all_songs_data_metrics” csv, so the data would not have to be re-fetched if the notebook failed to save it. The data collection for this article was therefore relatively simple: since the metrics were already available in the CSV file, all that was needed was a successful read of the file contents.
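The caching idea described above might look something like the sketch below. The helper name load_or_fetch and the fetch_songs callable are hypothetical stand-ins for the Spotify and Genius calls, which are not shown in this article, so treat this as an illustration of the pattern rather than the notebook's actual code.

```python
from pathlib import Path

import pandas as pd


def load_or_fetch(csv_path, fetch_songs):
    """Return cached song data if the CSV exists; otherwise fetch and cache it.

    `fetch_songs` is a hypothetical zero-argument callable that returns a
    DataFrame (for example, a wrapper around the API calls).
    """
    path = Path(csv_path)
    if path.exists():
        # Cache hit: skip the slow API round trips entirely.
        return pd.read_csv(path)
    df = fetch_songs()
    # Cache miss: persist the result so later runs can reuse it.
    df.to_csv(path, index=False)
    return df
```

The first call pays the cost of fetching; every later call only reads the file.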

The file “all_songs_data_with_metrics.csv” offers a complete view of the various metrics retrieved from the previous analysis.

import pandas as pd

df = pd.read_csv('all_songs_data_with_metrics.csv')
print(df.columns)  # Inspect the available metric columns

The “analyze_sentiment” function below was used to score the lyrical sentiment of each song. Sentiment is the emotional tone or attitude a text expresses, which the polarity score captures as a positive, negative, or neutral value. Polarity ranges from -1 (most negative) through 0 (neutral) to 1 (most positive). Subjectivity is also measured, on a scale from 0 to 1 that shows how subjective or objective a statement is, with 0 being most objective and 1 being highly subjective.

from textblob import TextBlob

def analyze_sentiment(lyrics):
    # Returns a named tuple with polarity and subjectivity fields
    return TextBlob(lyrics).sentiment

# Apply sentiment analysis to the processed lyrics
df['sentiment'] = df['processed_lyrics'].apply(analyze_sentiment)
df['polarity'] = df['sentiment'].apply(lambda x: x.polarity)
df['subjectivity'] = df['sentiment'].apply(lambda x: x.subjectivity)
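To make the polarity scale easier to read, a score can be mapped to a coarse label. The sketch below is illustrative only: the label_polarity helper and its 0.05 cut-off are assumptions of mine, not part of TextBlob or of the original notebook.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The 0.05 cut-off is an illustrative assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Example scores taken from the ranges discussed in this article.
for score in (1.0, 0.0, -0.7):
    print(score, label_polarity(score))
```

A label column could then be added with df['polarity'].apply(label_polarity) if a categorical view is useful.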

Visualizing the Data

The first main visual was a scatterplot of polarity by degree centrality. The plot gave a good view of the distribution of most important songs and their sentiment but could only give a broad view of the data.
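A scatterplot like the one described could be produced roughly as follows. This is a minimal sketch on toy rows: the column name "Degree Centrality" is assumed from the metric names listed in the introduction, and the function name is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # Render off-screen so the sketch runs without a display.
import matplotlib.pyplot as plt
import pandas as pd


def plot_polarity_vs_centrality(df, centrality_col="Degree Centrality"):
    """Scatter polarity against a centrality column and return the Axes."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(df[centrality_col], df["polarity"], alpha=0.5, s=12)
    ax.axhline(0, color="gray", linewidth=0.8)  # Neutral-sentiment reference line.
    ax.set_xlabel(centrality_col)
    ax.set_ylabel("polarity")
    ax.set_title("Polarity by degree centrality")
    return ax


# Toy rows standing in for the real data.
toy = pd.DataFrame({
    "Degree Centrality": [0.10, 0.25, 0.40, 0.55],
    "polarity": [0.2, -0.1, 0.05, 0.6],
})
ax = plot_polarity_vs_centrality(toy)
```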

Alternatively, an interactive plot was made using Plotly and is provided in the GitHub repository for readers who want more insight into this data. The plot shows the distribution of the data by polarity and subjectivity and displays details such as the title and artist for each node, along with the accompanying sentiment metrics.

The graph suggests a slight trend toward positive sentiment in artists’ songs; however, the effect is negligible, since the distribution looks relatively even in the plot. Most points fall between -0.2 and 0.2 for polarity and between 0.4 and 0.6 for subjectivity. This visualization shows that most artists’ lyricism lies in a neutral range, both in subjectivity and in polarity.
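The visual impression that most songs sit in a neutral band can also be checked numerically. The sketch below, run on toy data, computes the share of rows inside the ranges quoted above; the helper name is hypothetical and the band limits are simply the ones read off the plot.

```python
import pandas as pd


def share_in_neutral_band(df, pol_limit=0.2, subj_low=0.4, subj_high=0.6):
    """Return the fraction of rows inside the polarity and subjectivity bands."""
    in_band = df["polarity"].between(-pol_limit, pol_limit) & df[
        "subjectivity"
    ].between(subj_low, subj_high)
    return in_band.mean()


# Toy rows standing in for the real data.
toy = pd.DataFrame({
    "polarity": [0.1, -0.15, 0.5, 0.0],
    "subjectivity": [0.5, 0.45, 0.55, 0.9],
})
print(share_in_neutral_band(toy))
```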

Diving Deeper

Though the visuals gave us a solid understanding of the distribution, let’s look deeper into the actual values, specifically the most positive and the most negative songs.

Given our data frame of songs, we can sort by polarity and print the most positive songs. Before doing so, I removed the full processed lyrics from the output to limit the amount of text and home in on the desired metrics. The sentiment, polarity, and subjectivity columns were all retrieved, but below I show only the main metric, polarity.
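The ranking step might look something like this sketch. The top_songs helper is hypothetical and the toy rows are placeholders; the column names follow the output shown in this section.

```python
import pandas as pd


def top_songs(df, n=10, most_positive=True):
    """Return the n most positive (or most negative) songs by polarity."""
    display_cols = ["Title", "Year", "Artist", "polarity"]
    ranked = df.sort_values(by="polarity", ascending=not most_positive)
    # Select only the display columns so long lyric text stays out of the output.
    return ranked[display_cols].head(n)


# Toy rows standing in for the real data.
toy = pd.DataFrame({
    "Title": ["Song A", "Song B", "Song C"],
    "Year": [2019, 2020, 2021],
    "Artist": ["Artist 1", "Artist 2", "Artist 3"],
    "polarity": [0.4, -0.6, 0.9],
    "processed_lyrics": ["...", "...", "..."],
})
print(top_songs(toy, n=2))
```

Passing most_positive=False flips the sort and returns the most negative songs instead.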

We are then left with this part of the output:

       Title                           Year  Artist         polarity
11361  Spring Day                      2020  BTS            1.00000
10810  A Holly Jolly Christmas         2019  Burl Ives      1.00000
10817  Holly Jolly Christmas           2019  Michael Bublé  1.00000
9564   Still Life                      2017  BIGBANG        0.80000
7903   Melodies                        2015  Hot Shade      0.77500
9565   Illusion                        2017  aespa          0.75625
2988   Hall of Fame (feat. will.i.am)  2011  The Script     0.75000
2989   Hall of Fame (feat. will.i.am)  2012  The Script     0.75000
2990   Hall of Fame (feat. will.i.am)  2012  The Script     0.75000
2991   Hall of Fame (feat. will.i.am)  2013  The Script     0.75000

So, we see that these are the most positive songs based on the textual analysis that was completed. BTS tops this list with their song “Spring Day”, but their song also ties with Burl Ives and Michael Bublé’s individual renditions of “Holly Jolly Christmas”.

The same process was applied to find the most negative songs:

       Title     Year  Artist     polarity
9910   Callaita  2018  Bad Bunny  -0.7
9911   Callaita  2019  Bad Bunny  -0.7
9912   Callaita  2020  Bad Bunny  -0.7
10399  DÁKITI    2020  Bad Bunny  -0.7
10402  DÁKITI    2022  Bad Bunny  -0.7
10398  DÁKITI    2019  Bad Bunny  -0.7
4597   DÁKITI    2022  Bad Bunny  -0.7
10397  DÁKITI    2019  Bad Bunny  -0.7
4596   DÁKITI    2021  Bad Bunny  -0.7
10400  DÁKITI    2020  Bad Bunny  -0.7

Based on these calculations, Bad Bunny appears to have the most negative songs on the list. Initially I thought this could be a result of the lyrics being in Spanish, but given that BTS sings in Korean and has the most positive song, the language barrier doesn’t seem to skew any of the insights found.

Now we know the songs that are likely to get you in a good mood, as well as the songs that lean toward more negative themes. To look at artists more closely, some adjustments need to be made to the data frame. The script below creates a new data frame that keeps only one row per artist. It builds the positive artists data frame and served as the blueprint for the negative artists data frame.

# Keep only the columns needed for the artist comparison
selected_columns = ['Spotify_ID', 'Title', 'Year', 'Artist', 'polarity', 'subjectivity', 'sentiment']
new_df = df[selected_columns]

# Sort from most positive to most negative
new_df_sorted = new_df.sort_values(by='polarity', ascending=False)

# Keep each artist's first row, i.e. their single most positive song
pos_artists_df = new_df_sorted.drop_duplicates(subset='Artist')

The same principle was applied in creating the negative data frame. Below is an abbreviated view of the outputs.
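As a rough sketch of that mirrored step, the function below handles both directions with a single flag. The helper name and the toy rows are hypothetical; only the sort_values and drop_duplicates pattern comes from the script above.

```python
import pandas as pd


def extreme_song_per_artist(df, most_positive=True):
    """Keep one row per artist: their most positive (or most negative) song."""
    ranked = df.sort_values(by="polarity", ascending=not most_positive)
    # drop_duplicates keeps the first row per artist, i.e. the extreme one.
    return ranked.drop_duplicates(subset="Artist").reset_index(drop=True)


# Toy rows standing in for the real data.
toy = pd.DataFrame({
    "Title": ["Song A", "Song B", "Song C", "Song D"],
    "Artist": ["Artist 1", "Artist 1", "Artist 2", "Artist 2"],
    "polarity": [0.9, -0.4, 0.1, -0.6],
})
neg_artists_df = extreme_song_per_artist(toy, most_positive=False)
print(neg_artists_df.head(15))
```

Sorting in ascending order puts the most negative song first, so the row kept for each artist is their lowest-scoring track.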

First, the 15 most positive artists.

Artist             polarity
BTS                1.000000
Burl Ives          1.000000
Michael Bublé      1.000000
BIGBANG            0.800000
Hot Shade          0.775000
aespa              0.756250
The Script         0.750000
Avicii             0.700000
Jared Benjamin     0.700000
Hooja              0.700000
Far East Movement  0.678836
Sam Kim            0.666667
Pedro Capó         0.666667
Jagwar Twin        0.663889
David Guetta       0.659091

And here are the 15 most negative artists.

Artist               polarity
Bad Bunny            -0.700000
Taylor Swift         -0.700000
WINNER               -0.700000
Marwa Loud           -0.700000
DJ Snake             -0.670000
Garmiani             -0.646257
Raggarligan          -0.600000
Fuerza Regida        -0.600000
TOMORROW X TOGETHER  -0.600000
Grupo Frontera       -0.540000
Lunay                -0.540000
Kanii                -0.504321
BLUEM                -0.500000
LATIN MAFIA          -0.500000
Klevi                -0.500000

Results and Insights

The analysis of song sentiments from 2010 to 2022 has yielded fascinating results, providing a deeper understanding of the emotional landscape in popular music. By examining the polarity of lyrics, we were able to gain insights into the emotional tone of songs and artists, which can guide listeners in their music choices based on their mood.

Song Trends

The data reveals that “Spring Day” by BTS, “A Holly Jolly Christmas” by Burl Ives, and “Holly Jolly Christmas” by Michael Bublé top the list with the maximum polarity score of 1.0, indicating a highly positive sentiment. Other songs like “Still Life” by BIGBANG and “Melodies” by Hot Shade also rank high on this list, showcasing uplifting and positive lyrical content in these tracks. There is also a small hint of seasonality: a Christmas song appears twice in the top 5, which suggests that certain sentiments may be tied to particular seasons.

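Conversely, Bad Bunny’s “Callaita” and “DÁKITI” show a polarity of -0.7 in every year they appear, marking them as some of the most negative songs in the dataset. Since these rows appear to be the same two songs charting in different years, the repeated score reflects the same lyrics rather than separate tracks. Even so, it points to a theme of negative sentiment in Bad Bunny’s charting music during this period.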

Artist Trends

Artists like BTS, Burl Ives, and Michael Bublé lead as the most positive artists, each represented by a song with the highest polarity score. They are followed by artists like BIGBANG, Hot Shade, and aespa. The Script’s “Hall of Fame” also appears multiple times in the song list, indicating a strong positive sentiment in that track across the years it charted.

Bad Bunny emerges as the artist with the most negative songs, followed by artists like Taylor Swift, WINNER, and Marwa Loud. This list also includes DJ Snake, Garmiani, and Raggarligan, indicating a trend towards more negative or emotionally intense lyrical content.

Based on these findings, music listeners now have at least a suggestive list of who to listen to, whether you’re sad and want to be happy or you feel down and want to continue sulking. The top artists and songs at each end of the spectrum offer a clear view of the lyrical tones of these songs and the artists behind them.

Challenges

The journey of analyzing and visualizing the sentiment data from popular songs presented several challenges. Initially, we attempted to use a heatmap to understand the correlation between sentiments and network metrics. However, this method fell short in providing clear insights into individual songs and artists, as the heatmap’s aggregated nature obscured specific details.

We then explored the use of a radar chart, hoping it would offer a more detailed view. Unfortunately, this too proved ineffective. The radar chart’s complexity made it difficult to draw meaningful conclusions, especially for those not well-versed in data interpretation.

Another significant challenge was data management. As the analysis progressed, we created multiple versions of the initial dataset, each tailored to different aspects of our study. Managing these variations required meticulous organization to avoid confusion and ensure data integrity. This aspect of the project tested our discipline in maintaining a structured approach to data handling, underscoring the importance of rigorous data management practices in research.

Limitations and Bias

This study, while insightful, is not without its limitations and potential biases. One key limitation is the reliance on a single song to represent an artist’s overall sentiment in the top positive and top negative artist data frames. This approach can skew the perceived overall sentiment of an artist’s work, as it doesn’t account for the variability and range of emotions expressed across different songs. A more comprehensive analysis would include multiple tracks from each artist, offering a fuller picture of their emotional expression through lyrics.

Additionally, our focus on charting songs introduces a selection bias. Chart-topping tracks often adhere to certain trends and styles that may not represent the broader music landscape. Furthermore, the study’s methodology, centered on lyrical sentiment analysis, may not fully capture the complexity of emotional expression in music. Lyrics are just one component of a song’s emotional impact, with factors like melody, rhythm, and production also playing crucial roles. Therefore, our findings should be interpreted with an understanding of these inherent limitations.

In future studies, addressing these limitations by incorporating a wider range of songs, analyzing multiple tracks per artist, and considering other musical elements could provide a more holistic view of the emotional landscape in music. This approach would help mitigate biases and offer a more inclusive understanding of musical sentiment.
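One way a multi-track analysis could start is by averaging polarity over each artist's distinct songs. This is only a sketch of the idea, not something done in this study: the helper name, the min_songs threshold, and the toy rows are all assumptions for illustration.

```python
import pandas as pd


def mean_polarity_per_artist(df, min_songs=2):
    """Average polarity over each artist's distinct songs."""
    # A song that charts in several years appears once per year; count it once.
    unique_songs = df.drop_duplicates(subset=["Title", "Artist"])
    summary = unique_songs.groupby("Artist")["polarity"].agg(["mean", "count"])
    # Require a minimum number of songs so one track cannot define an artist.
    summary = summary[summary["count"] >= min_songs]
    return summary.sort_values(by="mean", ascending=False)


# Toy rows standing in for the real data.
toy = pd.DataFrame({
    "Title": ["Song A", "Song A", "Song B", "Song C"],
    "Year": [2019, 2020, 2020, 2021],
    "Artist": ["Artist 1", "Artist 1", "Artist 1", "Artist 2"],
    "polarity": [0.8, 0.8, -0.2, 0.5],
})
print(mean_polarity_per_artist(toy))
```

Dropping repeated chart entries first keeps a long-charting song from dominating its artist's average.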

Conclusion

In conclusion, this study offers a unique perspective on how the sentiment of song lyrics correlates with their centrality in the music network. By analyzing a dataset spanning from 2010 to 2022, we’ve uncovered fascinating trends in the emotional tone of popular music. The findings suggest that while there is a general trend towards neutral sentiment in lyrics, there are notable exceptions, with artists like BTS and Bad Bunny representing the extremes of positive and negative sentiment, respectively.

The insights gained from this analysis are invaluable for music enthusiasts and industry professionals alike. For listeners, understanding the sentiment behind the songs can enhance their music experience, allowing them to select songs that resonate with their current mood. For industry professionals, these insights can inform marketing strategies and artist development.

However, it’s important to acknowledge the limitations of this study. The focus on charting songs means that the dataset may not fully represent the vast diversity of music available. Additionally, the method of analyzing only one song per artist for sentiment analysis may not capture the full spectrum of an artist’s lyrical tone.

Future studies could expand on this work by including a broader range of songs and artists, as well as analyzing multiple tracks from each artist to gain a more comprehensive understanding of their lyrical sentiment. Moreover, incorporating other factors such as genre, cultural context, and listener demographics could provide a more nuanced view of the relationship between song sentiment and popularity.

In essence, this study is a step towards a deeper understanding of the emotional landscape of popular music. It highlights the power of data-driven analysis in uncovering the hidden connections between lyrics, sentiment, and musical impact, opening new avenues for exploration in the ever-evolving world of music.

Source Code

Feel free to view the code on GitHub below. The interactive plot is there as well for those who want a closer look:
