Davido’s Timeless Album: Twitter Sentiment and Engagement Analysis.

Nnodi Precious
Jun 3, 2023

Davido is a popular Nigerian-American singer and songwriter who has been widely admired over the years for his thrilling, sensational songs.

On the 31st of March 2023, Davido released his fourth album, ‘Timeless’, with 17 tracks and a total play duration of 49 minutes 12 seconds. Fans all over the world had been waiting for this album to drop since his last album, ‘A Better Time’, in 2020. Due to the massive reactions from people online, I decided to analyse this album. I wanted to understand how the Twitter community received the album and the engagement behind it.

For this analysis, I applied Natural Language Processing (NLP), sentiment analysis, and text mining. To prepare, I studied a few materials and watched some tutorials on these areas. The tweets were scraped with the hashtags ‘#timeless’ and ‘#timelessalbum’ using the snscrape library, covering 31st March to 10th April 2023.

Process Methodology

I followed the steps below to carry out this analysis:

  • Data Requirement Gathering
  • Data Collection
  • Data Cleaning
  • Data Preprocessing
  • Exploratory Data and Sentiment Analysis
  • Documentation and Sharing

Data Requirement Gathering

This stage generates a list of requirements that define what the project is about and what its goal is. It is an important stage that involves brainstorming and research, as it guides the rest of the process. Through my analysis, I wanted to answer the following questions:

1. Which track did people like the most?

2. Who was the most popular featured artist on the album?

3. From which locations were the tweets generated?

4. What were the Twitter users’ sentiments?

5. What were the most frequent words in the tweets?

6. What was the daily trend of tweets over the data collection period?

Data Collection

The data was scraped with the queries #timeless and #timelessalbum, using the snscrape library in Python. This process took approximately 1 hour 48 minutes for both hashtags. Before doing this, I made sure to install and import the necessary libraries to avoid errors. The prerequisites are shown below:

# Import the required libraries

import warnings
warnings.filterwarnings('ignore')

import re  # used by the regex functions later on
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from collections import Counter

# import snscrape.modules.twitter as sntwitter
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download the NLTK resources used for preprocessing
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')

After gathering the tweets, I combined them into a single dataframe on the Date column. Next, I checked for duplicated rows and deleted them to reduce the chance of an inaccurate or biased analysis. I then made a copy of the dataset, which I later downloaded as a CSV file to my personal computer.
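The merge-and-deduplicate step can be sketched as follows. This is a minimal illustration rather than the original notebook code: the two small dataframes, their contents, and the output filename `timeless_tweets.csv` are assumptions standing in for the scraped results of each hashtag.

```python
import pandas as pd

# Hypothetical stand-ins for the tweets scraped with each hashtag
timeless_df = pd.DataFrame({
    "Date": ["2023-04-01", "2023-04-02"],
    "Tweet": ["Kante on repeat #timeless", "Feel is a vibe #timeless"],
})
album_df = pd.DataFrame({
    "Date": ["2023-04-02", "2023-04-03"],
    "Tweet": ["Feel is a vibe #timeless", "Away >>> #timelessalbum"],
})

# Stack both result sets, order them by date, and drop exact duplicate rows
merged = (
    pd.concat([timeless_df, album_df], ignore_index=True)
    .sort_values("Date")
    .drop_duplicates()
    .reset_index(drop=True)
)

# Keep a copy for later analysis and save it as a CSV file
df = merged.copy()
df.to_csv("timeless_tweets.csv", index=False)
print(df.shape)
```

A tweet that carries both hashtags is returned by both queries, which is why removing duplicates after merging matters here.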

Due to Twitter’s restrictions and frequent API changes, my initial scraping code stopped working after I had used it for 7 consecutive days. I found out that ‘Error 404’ was the new error for those wanting to scrape, and you can read more about it here. To keep the project organised, I used two Google Colab notebooks: the first contains the scraping and the second contains the rest of my analysis.

Data Cleaning

Having created the new notebook, I imported the relevant libraries and read in the dataset. Below is a quick preview of the data.

Preview of data frame

The merged dataframe comprises 90,027 tweets (rows) and 17 columns. From the dataset information, I noticed that the Date column had an incorrect data type. Furthermore, I looked into the data to identify other quality issues.

To clean the data, I checked for incorrect data types, null values and irrelevant columns, bearing in mind that duplicates had already been deleted as mentioned earlier.

I deleted the Unnamed: 0 and Track columns because they were irrelevant to my analysis. Thereafter, I changed the Date column’s data type and replaced null values in the Location column with “Not Available”.
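These cleaning steps could look roughly like the sketch below. The column names come from the text above, while the three-row dataframe is made up for illustration and the real dataset has more columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the merged dataset, with the columns named in the text
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Date": ["2023-04-01 10:15:00", "2023-04-01 18:40:00", "2023-04-02 09:05:00"],
    "Tweet": ["Kante!", "Feel #timeless", "Unavailable is everywhere"],
    "Location": ["Lagos, Nigeria", np.nan, "London"],
    "Track": [np.nan, np.nan, np.nan],
})

# Drop the columns that are not needed for the analysis
df = df.drop(columns=["Unnamed: 0", "Track"])

# Convert Date from text to a proper datetime type
df["Date"] = pd.to_datetime(df["Date"])

# Replace missing locations with a readable placeholder
df["Location"] = df["Location"].fillna("Not Available")

print(df.dtypes)
```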

Data Preprocessing

For this stage, I created regex functions to extract the data I needed from the tweets, guided by the questions I raised under Data Requirement Gathering. Regular expressions (regex) match patterns in text, such as particular characters, words, or sequences of characters, so that they can be extracted or replaced.

a) Extracting the most popular music tracks:

I first removed the spaces within multi-word track names, using a function that replaces each track name with a one-word version, and applied it to the Tweet column. This was done to avoid issues when extracting the needed data. A new column was then created using the code shown below.

# Define a function that rewrites multi-word track names as single words;
# the result is stored in a new column, new_track

def trackNames(timeless):
    # (pattern, replacement) pairs, one per track on the album
    replacements = [("overdem", "overdem"), ("feel", "feel"), ("in the garden", "inthegarden"), ("god father", "godfather"),
                    ("unavailable", "unavailable"), ("bop", "bop"), ("e pain me", "epainme"), ("away", "away"), ("precision", "precision"),
                    ("kante", "kante"), ("na money", "namoney"), ("juju", "juju"), ("no competition", "nocompetition"), ("picasso", "picasso"),
                    ("for the road", "fortheroad"), ("lcnd", "lcnd"), ("champion sound", "championsound")]

    for pat, repl in replacements:
        timeless = re.sub(pat, repl, timeless)
    return timeless

df["new_track"] = df['Tweet'].apply(trackNames)
df.head(2)
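Once track names are single words, they can be pulled out of each tweet with one pattern. The sketch below is an assumption about how a column of extracted track names could be built, not the original notebook code; `TRACKS`, `TRACK_PATTERN` and `get_tracks` are hypothetical names.

```python
import re

# One-word track names, as produced by the replacement step
TRACKS = ["overdem", "feel", "inthegarden", "godfather", "unavailable", "bop",
          "epainme", "away", "precision", "kante", "namoney", "juju",
          "nocompetition", "picasso", "fortheroad", "lcnd", "championsound"]

# Match any track name as a whole word, ignoring case
TRACK_PATTERN = re.compile(r"\b(" + "|".join(TRACKS) + r")\b", flags=re.IGNORECASE)

def get_tracks(tweet):
    """Return the track names found in a tweet as one space-separated string."""
    return " ".join(match.lower() for match in TRACK_PATTERN.findall(tweet))

print(get_tracks("Kante and namoney are my favourites, feel too"))
# kante namoney feel
```

Note that everyday words such as "feel" and "away" also match in ordinary sentences, so counts built this way are best read as upper bounds.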

Next, I counted every instance of each track across the tweets and stored the result in a new dataframe, ‘track_df’, using the code shown below.

# Store the extracted track names in a list
track_list = df['ttrack'].tolist()

# Iterate over all entries and split where a tweet mentions more than one track
track = []
for item in track_list:
    item = item.split()
    for i in item:
        track.append(i)

# Count the mentions of each unique track
counts = Counter(track)
track_df = pd.DataFrame.from_dict(counts, orient='index').reset_index()
track_df.columns = ['Track', 'Count']
track_df.sort_values(by='Count', ascending=False, inplace=True)
print("The Total Number of Unique Tracks is: ", track_df.shape[0])
track_df

b) Extracting the mentions of featured artists in Timeless album:

Just as I did above, I removed the spaces in names with more than one word, replacing all featured artists’ names with a regex function that was applied to the Tweet column to create a new column. Each mention of a featured artist was then counted, stored in a new dataframe and saved as a CSV file.
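A minimal sketch of the counting step is shown below. The sample tweets are made up, and `ARTISTS` lists only the featured artists named in this article, so the full list used in the analysis is an assumption left out here.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the real input is the Tweet column
tweets = [
    "Asake killed his verse on No Competition",
    "Fave and Davido on Kante is magic",
    "asake again! #timeless",
    "Morravey is one to watch",
]

# Featured artists named in this article; a full list would include every feature
ARTISTS = ["asake", "fave", "morravey"]
ARTIST_PATTERN = re.compile(r"\b(" + "|".join(ARTISTS) + r")\b", flags=re.IGNORECASE)

# Count every mention across all tweets, case-insensitively
artist_counts = Counter(
    name.lower() for tweet in tweets for name in ARTIST_PATTERN.findall(tweet)
)

print(artist_counts.most_common())
```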

c) Extracting the most popular hashtags:

Using a regex-based function, I extracted the hashtags from the Tweet column. The result was saved to a dataframe and then to a CSV file for further analysis.

# Define a function to extract hashtags with regex

def getHashtags(tweet):
    tweet = tweet.lower()  # convert the tweet to lower case
    tweet = re.findall(r'\#\w+', tweet)  # find every hashtag
    return " ".join(tweet)

# Extract the hashtags and store them in the column 'hashtags'
df['hashtags'] = df['Tweet'].apply(getHashtags)
df.head(2)
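To rank the hashtags, the extracted values need to be counted. The sketch below shows one way to do that with the standard library; the sample tweets and the `get_hashtags` helper are illustrative assumptions, not the original notebook code.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the real input is the Tweet column
tweets = [
    "Kante on repeat #Timeless #Davido",
    "This album is a classic #timeless #TimelessAlbum",
    "No hashtags in this one",
]

def get_hashtags(tweet):
    """Return all hashtags in a tweet, lower-cased."""
    return re.findall(r"#\w+", tweet.lower())

# Flatten the per-tweet lists and count each hashtag
hashtag_counts = Counter(tag for tweet in tweets for tag in get_hashtags(tweet))

for tag, count in hashtag_counts.most_common(3):
    print(tag, count)
```

The counts can then be turned into a dataframe, for example with `pd.DataFrame(hashtag_counts.most_common(), columns=["Hashtag", "Count"])`, and written out with `to_csv`.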

d) Extracting the most tagged accounts

To extract the most tagged accounts, I applied a regex to the Tweet column and then printed the top 10 tags by slicing the result. I noticed that ‘@youtube’ was among the top 10, so I removed it because I needed the accounts of actual Twitter users.

# Get the most tagged accounts in the tweets
mentions = df['Tweet'].str.extractall(r'(\@\w*)')[0].str.lower().value_counts()

mentions = mentions[mentions.index != '@']  # drop bare '@' matches
mentions = mentions[mentions.index != '@youtube']  # '@youtube' is removed because we need original Twitter accounts

# Keep the top 10 most mentioned accounts
mentions = mentions[:10].sort_values(ascending=True)
print(mentions)

Exploratory Data and Sentiment Analysis

For the sentiment analysis, the text of the Tweet column was preprocessed first. This involved creating regex functions to remove English stop-words, emojis, repeated characters and punctuation, applying tokenization to split each tweet into words, and applying lemmatization to reduce each word to its base form. I did this in the first notebook, the one used to scrape the tweets, which explains why the merged dataframe already had cleaned_text, Sentiment and Polarity columns at the beginning of this documentation.
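A simplified sketch of this kind of cleaning is shown below. It is not the original notebook code: `clean_tweet` is a hypothetical helper, the stop-word set is a tiny stand-in for NLTK's English list, whitespace splitting stands in for `word_tokenize`, and lemmatization (for example with NLTK's `WordNetLemmatizer`) is left out to keep the example dependency-free.

```python
import re

# Tiny illustrative stop-word set; NLTK's stopwords.words("english") is far larger
STOP_WORDS = {"the", "is", "a", "an", "and", "this", "to", "of", "on", "in"}

def clean_tweet(text):
    """Lower-case a tweet, strip noise, and return the remaining tokens."""
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # squeeze characters repeated 3+ times
    text = re.sub(r"[^a-z\s]", " ", text)       # remove punctuation, digits and emojis
    tokens = text.split()                       # simple whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(clean_tweet("This album is soooo goooood!!! 🔥 #timeless"))
# ['album', 'soo', 'good', 'timeless']
```

Squeezing repeats down to two characters keeps words like "good" intact, at the cost of leaving forms such as "soo" that a later step would still need to handle.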

  1. Sentiment Analysis

Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral. In this project, I tried to gain an overview of the wider public sentiment of tweeters towards Davido’s Timeless album.

Using TextBlob, I obtained a polarity score for each tweet. Polarity lies between -1 and 1, where values towards -1 indicate negative sentiment, values towards 1 indicate positive sentiment, and a polarity of 0 is treated as neutral. The chart below shows the distribution of sentiment across the tweets: 60.3% Neutral, 6.9% Negative and 32.8% Positive.

Pie chart showing distribution of tweeters’ sentiments
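The step from polarity scores to sentiment shares can be sketched as follows. The thresholds (above 0 is Positive, below 0 is Negative, exactly 0 is Neutral) are an assumption consistent with the description above, and the scores are made-up values; in practice each one would come from `TextBlob(text).sentiment.polarity`.

```python
from collections import Counter

def label_sentiment(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"

# Hypothetical polarity scores, one per tweet
polarities = [0.5, 0.0, 0.0, -0.25, 0.8, 0.0, 0.1, 0.0, 0.0, 0.0]

labels = [label_sentiment(p) for p in polarities]
counts = Counter(labels)

# Percentage share of each sentiment class
shares = {label: 100 * n / len(labels) for label, n in counts.items()}
print(shares)
```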

2. Looking at the most popular tracks on Twitter during the data collection period, the bar chart below shows that Kante, Feel, Away and Unavailable are the top 4 mentioned tracks.

While still conducting the analysis in Python, I exported the data to Microsoft Power BI for further analysis and visualization.

3. #timeless, which is also the album name, is the most used hashtag with a count of 53K, followed by #timelessalbum.

Most popular hashtags

4. The top-mentioned featured artists are Asake, Fave and Morravey.

Top 5 mentions of artists that featured in Timeless album

5. Tweets by date: From the visual below, we can see that the Timeless album was definitely creating a buzz around its release, peaking on 1st April, with an average of ~3k tweets per day over the next 4 days and a steady decline afterwards (31st March was not included).

Tweet by day
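A daily trend like this comes from counting tweets per calendar day. The sketch below shows one way to do so with pandas, using a handful of made-up timestamps in place of the real Date column.

```python
import pandas as pd

# Hypothetical timestamps standing in for the Date column
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-04-01 08:00", "2023-04-01 12:30", "2023-04-01 21:10",
        "2023-04-02 09:45", "2023-04-02 17:20",
        "2023-04-03 11:05",
    ])
})

# Count tweets per calendar day, in chronological order
daily_counts = df.groupby(df["Date"].dt.date).size()

print(daily_counts)
```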

6. Treeversebot appears to be the most active user in the visual below. However, after checking the account against the purpose of this project, I found that it is a bot account for Treeverse NFT, and Lotsofspot2 is also unrelated to our analysis. Hence, the most active user tweeting about the Timeless album within the data collection period is RabsonLee.

Top fans

I created a word cloud to show the most frequent words in the tweets. The data used here was the same cleaned_text column used for the sentiment analysis, which had already been processed by removing stop words and punctuation and applying tokenization and lemmatization.

Frequently occurring words

Out of curiosity I decided to check the most liked post under the original hashtags I used to scrape. Did you expect this?

Most liked Tweet

I’ll include my interactive Power BI dashboard to further illustrate my insights here.

Insights

Based on this analysis, which was carried out on data collected from 31st March to 10th April:

  1. The Timeless album had 6.9% Negative, 60.3% Neutral and 32.8% Positive sentiments.
  2. The top 5 popular tracks off the album are Kante, Feel, Away, Unavailable and Godfather.
  3. #timeless and #timelessalbum are the top 2 most popular hashtags, which are also the hashtags I used in scraping the tweets.
  4. Asake is the top mentioned artist with 1858 mentions, followed by Fave and Morravey.
  5. Tweet volume peaked on 1st April 2023, right after the release date.
  6. RabsonLee is the most active tweeter related to the Timeless album.

Limitations

The ‘cleaned_text’ column, which holds the cleaned version of the Tweet column, contained 8777 null values. I didn’t drop these rows because doing so would have reduced the number of original tweets in my data. However, this column was used only for the sentiment analysis and word-cloud visualization.

Also, this analysis covers only a small portion of the sample space. Therefore, it is safer to refrain from drawing firm conclusions about the album from these insights. I’d suggest conducting the analysis on other social media platforms and streaming media over a broader period of time to gain more in-depth insights into this album.

Conclusion

The album’s tweet volume peaked in the first two days, and the number of tweets per day decreased steadily after that. This was probably to be expected, given that Davido stayed away from social media for nearly 4 months prior to the album’s release on March 31. It’s possible that the peaks over the first few days can be attributed to the long-awaited anticipation from his fans.

Also, based on this analysis, the album drew 60.3% Neutral and 32.8% Positive sentiments. It’s most likely that tweeters love and appreciate the artist (Davido) more than any individual track on the album. This is further suggested by comparing the most popular track off the album with the most mentioned featured artist: the latter’s track is not among the top 10 mentioned tracks off the album.

Overall, the album recorded a total of 469.67 million streams across platforms like Audiomack, Apple Music, Boomplay, YouTube and Spotify in its first month of release.

Thanks for reading!

Relevant Links

Access my code on GitHub

Power BI Dashboard

Get to know me better : LinkedIn

Nnodi Precious

Data Analyst | Business Analyst | Passionate about translating data to insights