Beyonce’s Renaissance Album: A Twitter Sentiment Analysis

Chinonso Okonkwo
Aug 12, 2022

When Beyonce announced the release of her seventh studio album, Renaissance, I was frankly curious. Considering the massive success of her past albums, I wondered how I’d feel about this one. Would I have a personal favorite? Next, I wondered how the Twitter community would feel about the album. Would they love it? And if they did, what would they love about it?

I decided to use this project as my first step in learning text mining, sentiment analysis and natural language processing. I mined tweets relating to the hashtag “Renaissance” using Twitter’s API and the Python library Tweepy for about 10 days. The tweets used in this analysis range from 24th July to 2nd August 2022.

Since Beyonce is one of the most influential artists of her time, I thought it would be nice to see:

· Which music track fans loved the most

· What time of day people most actively tweeted about the album

· The location with the most tweets

· The most used positive music words

· Common words used in tweets about Renaissance

· Twitter users’ sentiments

· How the album fared generally on Twitter

PROJECT METHODOLOGY

The main steps for this project can be summarized as follows:

  • Data Wrangling (Data Gathering, Data Assessment and Data Cleaning)
  • Extracting hashtags and music tracks
  • Data Preprocessing
  • Sentiment Analysis
  • Explanatory Analysis/ Data Visualization
  • Power BI Dashboard

DATA WRANGLING

Data wrangling is a necessary step in every data analysis process. It involves gathering the data, assessing it and then cleaning it. Here, I gathered tweets using the Twitter API and the Python library Tweepy. I started off by importing the necessary libraries. For the search query, I used the hashtag #Renaissance and restricted the results to tweets in English. After a little investigation on Twitter, I noticed people were also tweeting with a misspelled hashtag, “#Rennaissance”, so I included this hashtag in my analysis as well.

I gathered the data in 3 batches due to Twitter’s 7-day limit on tweet extraction and stored each batch in its own CSV file, then appended them all into a single dataframe. I used Twitter’s since_id and max_id parameters to help me pull both newer and older tweets. You can read further about it here. Next, I assessed the dataframe, looking out for duplicate data, missing data and incorrect datatypes. I noted down my assessments in markdown cells in a Jupyter notebook.

Here are a few data quality issues I discovered through assessment:

A screenshot of Duplicate data
Missing values in each column

Clearly, the dataframe contains duplicate rows, an irrelevant column (Unnamed: 0) and missing values in the location column. I made a copy of my data before cleaning, then fixed all the data quality issues noted down in the markdown cells.

To Extract Popular Hashtags

I used a regex function to extract all hashtags from the tweets into a new column.

Basically, the function I used finds every instance in a tweet where a word begins with a hashtag symbol (#) followed by one or more alphanumeric characters.
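As a rough illustration of that idea, here is a minimal sketch of hashtag extraction. The function name extract_hashtags and the exact pattern are my own assumptions for this example, not necessarily what the original notebook used.

```python
import re

# "#" followed by one or more word characters. Note that \w also matches
# the underscore, which Twitter allows inside hashtags.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags(tweet):
    """Return every hashtag found in a tweet, in order of appearance."""
    return HASHTAG_PATTERN.findall(tweet)


print(extract_hashtags("Feeling the #Renaissance tonight! #Rennaissance"))
```

With a pandas dataframe, a function like this could be applied to the tweet column (for example with the column's apply method) to fill the new hashtag column.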

To Extract The Popular Music Track

I defined a function to convert each tweet to lowercase, then applied the function to the tweet column.

I defined another function to replace track names of more than one word with a single word by removing the spaces using regex. This was done so that each music track is counted as one word, avoiding irregularities down the line during tokenization.
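The two functions described above could look roughly like the sketch below, combined into one for brevity. The list MULTI_WORD_TRACKS and the name merge_track_names are hypothetical, and only the two tracks named later in this post are included; a real run would list every multi-word track on the album.

```python
import re

# Hypothetical, partial list of multi-word track names (lowercase).
MULTI_WORD_TRACKS = ["church girl", "alien superstar"]


def merge_track_names(tweet):
    """Lowercase a tweet and collapse each multi-word track name into one word."""
    text = tweet.lower()
    for track in MULTI_WORD_TRACKS:
        # Allow any amount of whitespace between the words of the track name.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in track.split()) + r"\b"
        text = re.sub(pattern, track.replace(" ", ""), text)
    return text


print(merge_track_names("Church Girl and ALIEN Superstar on repeat"))
```

After this step, a tokenizer sees "churchgirl" as a single token, so counting mentions per track becomes a simple word count.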

Find below the ranking of the tracks.

Ranking of Music Tracks

Extracting Positive Music Words about the Album

During assessment, I noticed that tweeters used certain words to sum up the album. Using regex, I replaced each positive expression with a single word, then extracted all the positive words into a new column.

Word Cloud Depicting the Most Common Words

Below is a word cloud, which shows frequently recurring words in a large size and less frequent words in a small size. The Renaissance album cover features a horse and a rider, so I tried to depict that with the word cloud while sharing insights as well.

Most Common Words used in Tweets

DATA PREPROCESSING

Data preprocessing covers all the cleaning needed to prepare tweets for sentiment analysis. To do this, I created several functions, which I applied to the ‘tweet’ column of my dataframe. Using regex, I removed stopwords, URLs, some common words that I specified, and the Renaissance track names. I converted each tweet to lowercase, applied tokenization to break tweets into individual words, removed punctuation, and applied lemmatization to reduce each word to its base form.
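To make these steps concrete, here is a simplified sketch using only the standard library. The tiny STOPWORDS and CUSTOM_WORDS sets are illustrative assumptions (a real project would more likely use a full stopword list from a library such as NLTK), and lemmatization is left out because it needs an external library.

```python
import re

# Illustrative stopword list only; a real list would be much longer.
STOPWORDS = {"a", "an", "the", "is", "and", "to", "of", "this"}
# Hypothetical examples of extra common words to drop.
CUSTOM_WORDS = {"renaissance", "beyonce", "album"}


def preprocess(tweet):
    """Lowercase, strip URLs, tokenize, and drop stopwords and custom words."""
    text = tweet.lower()
    # Remove URLs before tokenizing so their fragments do not become tokens.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Keeping only runs of letters tokenizes and removes punctuation in one step.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS and t not in CUSTOM_WORDS]


print(preprocess("This album is a MASTERPIECE!! https://t.co/abc"))
```

Keeping each step in one small function makes it easy to apply the whole pipeline to the tweet column in a single pass.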

SENTIMENT ANALYSIS

Sentiment analysis is a means of analyzing textual data to determine whether the writer’s attitude towards a particular topic, product or service is positive, negative or neutral. I used TextBlob, a Python library, to get the polarity score. This score ranges from -1 to +1 and tells me whether a sentiment is positive, negative or neutral. For this analysis, a polarity score ≥ 0 is classified as Positive, while a polarity score < 0 is classified as Negative, so tweets with a neutral score of exactly 0 fall into the Positive group. You can take a look at the sentiment distribution below.

A pie chart showing distribution of Twitter Users Sentiment
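The classification rule described above can be sketched as follows. The function names label_sentiment and sentiment_share are my own for this example, and the scores in the usage line are made up purely to show the output shape. The sketch takes polarity scores as plain numbers; with TextBlob installed, a score for one tweet would come from TextBlob(text).sentiment.polarity.

```python
from collections import Counter


def label_sentiment(polarity):
    """Apply the rule from this post: >= 0 is Positive, < 0 is Negative."""
    return "Positive" if polarity >= 0 else "Negative"


def sentiment_share(polarities):
    """Return the percentage of scores that fall under each label."""
    labels = Counter(label_sentiment(p) for p in polarities)
    total = sum(labels.values())
    return {label: round(100 * count / total, 1) for label, count in labels.items()}


# Made-up scores, only to illustrate the shape of the result.
print(sentiment_share([0.5, 0.0, -0.2, 0.1]))
```

Because a score of exactly 0 is labelled Positive, the positive share includes neutral tweets; adding a separate Neutral label for 0 would give a stricter split.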

EXPLANATORY ANALYSIS/ DATA VISUALIZATION

Insights

1. The most popular track off the album is Church Girl with about 7,170 mentions, closely followed by Alien Superstar with 6,835 mentions.

Most Popular Tracks

2. More tweets were created in the early hours of the morning than at any other time, peaking at 5 AM GMT.

Tweets by Time of Day (GMT)
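A count like this can be produced by bucketing each tweet’s creation time by hour. The sketch below assumes timestamps are stored as ISO-style strings (for example "2022-07-29 05:10:00+00:00") and that any timestamp without an offset is already in GMT; the function name tweets_per_hour and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def tweets_per_hour(timestamps):
    """Count tweets per hour of day (0-23) in GMT/UTC."""
    hours = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            # Assumption: timestamps without an offset are already in GMT.
            dt = dt.replace(tzinfo=timezone.utc)
        hours[dt.astimezone(timezone.utc).hour] += 1
    return hours


# Made-up sample timestamps for illustration.
sample = [
    "2022-07-29 05:10:00+00:00",
    "2022-07-29 05:45:00+00:00",
    "2022-07-29 01:30:00-04:00",  # 05:30 in GMT
    "2022-07-29 18:00:00+00:00",
]
print(tweets_per_hour(sample).most_common(1))
```

Converting everything to one time zone first matters here, since tweets come from users all over the world.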

3. The top 5 tweeting locations are Los Angeles, CA; Atlanta, GA; New York, NY; United States; and Houston, TX.

Location of Tweets

Limitations of the location analysis:

About 38,300 locations were unspecified. Also, Twitter users are known to fill the location field with entries such as she, him, her and other non-locations. Geocoding the locations to extract longitude and latitude coordinates would surely have come in handy in this part of the analysis, but I wasn’t able to do this because geocoding platforms charge a fee for large datasets.

To further show my insights, I’ll embed below a Power BI dashboard and a video illustrating the interactive dashboard.

A Video to show Interactiveness of dashboard

CONCLUSION

In conclusion, I would say the album did well on Twitter generally. 86.5% of the tweets gathered between 24th July and 2nd August expressed positive sentiments about the album. Twitter users frequently praised the vocals, lyrics, production, beats and harmonies of the album. They termed it a No Skip, No Shuffle album.

Also, tweeting about the album before its release gave it massive engagement. I recommend that brands leverage social media platforms, especially Twitter, to promote their products and services before release. Get people talking before the launch. Anticipation is sometimes key to a successful launch.

Also, the fact that Beyonce has a large social media following played its part, so I’d recommend that businesses use social media influencers for promotions.

This was a fun project for me because I absolutely love music. Who doesn’t?

Well, thank you for taking the time to go through this! Did you feel the Renaissance? LOL

To access my code, here’s a link to my GitHub

Chinonso Okonkwo

Data Analyst Student @ Udacity | Interested in Data Visualizations, Writing and Travelling the World | Loves Football and Kdrama