Sentiment Analysis of Tweets About Gabapentinoids
Welcome to my first NLP project. It’s the result of a long and chaotic process that included two different methods for gathering data (Twint and snscrape) and six different methods to perform sentiment analysis (TextBlob, VADER, Flair, Transformers, Afinn, and a model trained on the Sentiment140 dataset). The tweets analyzed are about a subject that has affected me for years (actually, my brain and central nervous system are still healing), but I’ll try to keep it objective and professional.
This project has two parts, or two different versions if you prefer. First, I used Twint to scrape the tweets and TextBlob, VADER, and Flair to perform sentiment analysis. I didn’t write an article then because I only wanted to include a small NLP project on my GitHub profile. Here is the link to the repository.
In this article, I want to focus on the second part (or version) of the project. After reading this Medium post written by Dhilip Subramanian about snscrape and Hugging Face Transformers, I decided to try both.
First, I looked for tweets that contain the term “pregabalin”. This is the code that I used after installing and importing the necessary libraries (the full code is in my GitHub repository):
import itertools
import pandas as pd
import snscrape.modules.twitter as sntwitter
from datetime import datetime

# Creating a DataFrame called 'data' and storing up to 50,000 tweets about 'pregabalin' written in English.
data = pd.DataFrame(itertools.islice(sntwitter.TwitterSearchScraper(
    "pregabalin lang:en").get_items(), 50000))
end_time = datetime.now()  # handy for checking how long the scrape took
I was surprised at how fast and effective snscrape is; with Twint I had a lot of problems, and the number of tweets retrieved was quite low. After saving the data in a CSV file, I repeated the process with the terms ‘gabapentin’, ‘gabapentinoids’, ‘Neurontin’, ‘gralise’ and ‘horizant’ and put all the tweets together, roughly as sketched below. If you want to know why I didn’t also use the term ‘Lyrica’, this Twitter bot is just one of the reasons.
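The repeated searches can be combined with a simple loop. Here is a sketch, assuming the same 50,000-tweet cap per term and an illustrative filename:

terms = ['gabapentin', 'gabapentinoids', 'Neurontin', 'gralise', 'horizant']
frames = [data]  # start with the 'pregabalin' tweets scraped above
for term in terms:
    frames.append(pd.DataFrame(itertools.islice(
        sntwitter.TwitterSearchScraper(f"{term} lang:en").get_items(), 50000)))
df = pd.concat(frames, ignore_index=True)
df.to_csv('gabapentinoid_tweets.csv', index=False)  # hypothetical filename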
Sentiment Analysis with Hugging Face Transformers
This part of the project was the most complicated. Not because of the code, but because I wanted to analyze the more than 122000 tweets that I had gathered. First I tried to use my laptop, but it didn’t have enough capacity. Then I decided to use Gradient, which offers free and paid CPUs and GPUs of different capacities. I cleaned the data and ran the code, but I got error messages all the time. After trying different alternatives like Amazon Comprehend and IBM Watson Natural Language Understanding and almost going crazy, I decided to try again without cleaning the data first. Bingo! This is the code that I used to add three more columns to the dataset: ‘sentiment’, ‘label’ and ‘score’.
from transformers import pipeline

# The default pipeline uses a DistilBERT model fine-tuned on SST-2.
sentiment_classifier = pipeline('sentiment-analysis')

# Each call returns a one-element list like [{'label': 'NEGATIVE', 'score': 0.99}].
df = df.assign(sentiment=lambda x: x['content'].apply(lambda s: sentiment_classifier(s)[0]),
               label=lambda x: x['sentiment'].apply(lambda s: s['label']),
               score=lambda x: x['sentiment'].apply(lambda s: s['score']))
Another pro tip for beginners: save the dataset in a pickle file. I had problems with the CSV file and I had to repeat the process one more time.
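For example (the filename is just illustrative):

df.to_pickle('tweets_with_sentiment.pkl')  # preserves dtypes, unlike a round-trip through CSV
df = pd.read_pickle('tweets_with_sentiment.pkl')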
Sentiment Analysis with Afinn
After a long hiatus due to the health problems caused by that neurotoxic poison called gabapentin, I felt strong enough to keep working on the project. I found out about SentiWordNet thanks to this blog on Analytics Vidhya, but I wanted to know more about this method. That’s how I found this other post, which compares the accuracy of SentiWordNet, VADER, and Afinn. Afinn looked like the simplest and most accurate method, so I decided to use it on the dataset after removing duplicates and cleaning the mentions. Here’s the code:
from afinn import Afinn

afn = Afinn(emoticons=True)
scores = [afn.score(content) for content in df.content]
sentiment = ['positive' if score > 0 else 'negative' if score < 0 else 'neutral' for score in scores]
df['af_scores'] = scores
df['af_sentiment'] = sentiment
After performing the sentiment analysis, the dataset looks like this:
Data Cleaning and Preprocessing
Why do I need to clean the data if I’ve already performed the analysis? Because, while looking for more information about Afinn, I came across the Sentiment140 dataset and decided to use it to train a model and then apply that model to my own dataset.
These are the steps that I followed (a sketch of the cleaning function appears after the list):
- Extract the lengths of the tweets for later analysis.
- Change the ‘date’ column to DateTime format and set it as the index.
- Keep only the tweets that contain the terms ‘gabapentinoids’, ‘gabapentin’, ‘neurontin’, ‘lyrica’, ‘pregabalin’, ‘gralise’, and ‘horizant’.
- Extract hashtags for later analysis.
- Remove URLs.
- Replace emoticons with words.
- Remove non-alphanumeric characters.
- Remove repeated consecutive letters.
- Remove strings with just one alphanumeric character.
- Remove stopwords.
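Here is a minimal sketch of a cleaning function along those lines (the emoticon dictionary is cut down to two sample entries, and the column names are the ones used earlier):

import re
from nltk.corpus import stopwords  # requires nltk.download('stopwords')

stop_words = set(stopwords.words('english'))
emoticons = {':)': 'smile', ':(': 'sad'}  # tiny sample mapping, for illustration only

def clean_tweet(text):
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # remove URLs
    for emo, word in emoticons.items():
        text = text.replace(emo, word)                 # replace emoticons with words
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)         # remove non-alphanumeric characters
    text = re.sub(r'(.)\1\1+', r'\1\1', text)          # collapse repeated consecutive letters
    tokens = [t for t in text.lower().split()
              if len(t) > 1 and t not in stop_words]   # drop one-character strings and stopwords
    return ' '.join(tokens)

df['clean_content'] = df['content'].apply(clean_tweet)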
This is the result:
- Length of the tweets before and after cleaning:
We can see that most tweets have 140 characters or fewer, but we also have almost 8000 that are 280 characters long. These were most likely written after Twitter doubled the character limit in November 2017.
- Let’s see what the yearly boxplots look like:
This confirms that the longest tweets were written after 2017. We can also see that the first tweets were written in 2007.
- Yearly distribution of tweets:
Most of the tweets were published in 2021.
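With the ‘date’ column set as the index, the yearly counts take only a couple of lines (a sketch, assuming matplotlib):

import matplotlib.pyplot as plt

# Count tweets per calendar year using the DateTime index.
df.resample('Y').size().plot(kind='bar', title='Tweets per year')
plt.show()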
- Tweets with the most likes:
The tweet with the most likes is from 2021, and it has 10000 more likes than the second one. It turns out that it was written by the comedian Abby Govindan.
- Tweets with the most retweets:
We can see that some of the tweets with the most likes also have the most retweets. Three tweets look almost identical and mention drugs used to treat neuropathic pain. But if we look closely, we see that the order of the drugs is different in two of them, and that the third one (and most recent) lists two more drugs.
The third tweet with the most retweets looks interesting. This is the whole text:
If we follow the link, it takes us to this article published in Vice in October 2019 (by the way, this is the article that made me want to quit gabapentin).
- Tweets with the most replies:
We can see a huge difference in the number of replies between the first tweet and the rest. It looks like quite a lot of people have strong feelings about gabapentin.
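These rankings come from sorting the engagement columns; assuming snscrape’s default field names (likeCount, retweetCount, replyCount), the idea is roughly:

# Top 10 tweets by each engagement metric.
top_liked = df.nlargest(10, 'likeCount')
top_retweeted = df.nlargest(10, 'retweetCount')
top_replied = df.nlargest(10, 'replyCount')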
- Most common unigrams:
The most common unigrams are ‘gabapentin’ (60400), ‘neurontin’ (41900), ‘pregabalin’ (33400), ‘pain’ (29800), and ‘drug’ (13150). We can see that ‘lyrica’ (12000) also appears, even though I couldn’t use it as a search term. The last word on the list is ‘side’ (5700).
- Most frequent combinations of two words, or bigrams:
The most common combination of two words is ‘side effect’ (5000), and the second is ‘nerve pain’ (4500). Most of the rest are combinations of the names of the drugs, like ‘gabapentin pregabalin’ (3000) or ‘neurontin lyrica’ (2700). Other combinations are ‘neuropathic pain’ (2400), ‘300 mg’ (2300), ‘brain synapsis’ (2000), ‘death sentence’ (1984), and ‘lyrica death’ (1934).
- Most common combinations of three words, or trigrams:
The most common trigrams are ‘new brain synapsis’ (1962), ‘gabapentin 300 mg’ (1945), ‘death sentence new’ (1934) and ‘lyrica death sentence’ (1931). If we look for ‘lyrica death sentence’ on Google, we can find articles like this one, which actually references a study published in the scientific journal ‘Cell’ in 2009.
- Most common hashtags:
We can see that the most common hashtag is ‘Neurontin’ (4420), at almost double the frequency of the second most common hashtag, ‘pregabalin’ (2334). Some hashtags appear in only a few tweets compared to the whole dataset, like ‘opioids’ (313), ‘viagra’ (310), ‘spoonie’ (297), and ‘pfizer’ (182).
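For reference, here is a minimal sketch of how the unigram, bigram, and trigram counts above can be computed from the cleaned tweets:

from collections import Counter
from nltk.util import ngrams

# Join all cleaned tweets into one token stream and count n-grams for n = 1, 2, 3.
tokens = ' '.join(df['clean_content']).split()
for n in (1, 2, 3):
    print(Counter(ngrams(tokens, n)).most_common(5))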
Classification of Tweets Using a Model Trained on the Sentiment140 Dataset
For this part of the project, I mostly followed the code of this Kaggle notebook. I introduced some changes to the data preprocessing part, like adding ‘USER’ and ‘URL’ to the list of stopwords. After those changes, the accuracy of the logistic regression model dropped to 0.80. I saved the vectorizer and the model as pickle files, then loaded them to transform the ‘clean_content’ column and make predictions.
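A minimal sketch of that last step, assuming the vectorizer and model were saved as ‘vectorizer.pkl’ and ‘model.pkl’ (both filenames are hypothetical):

import pickle

# Load the saved vectorizer and logistic regression model (hypothetical filenames).
with open('vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

# Vectorize the cleaned tweets and predict a sentiment label for each one.
X = vectorizer.transform(df['clean_content'])
df['lr_sentiment'] = model.predict(X)

This is the result: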
It looks like the predictions are mostly correct, but I wouldn’t consider the second tweet negative. ‘Decrease’ has a negative connotation, but ‘decrease pain’ doesn’t.
Comparison of the Three Models
Let’s have a look at the first 10 tweets and their sentiment according to the three models:
We can see that all three methods have classified the second tweet as negative. Actually, six tweets have been classified as negative by all three methods. There’s no consensus on the first tweet, and the rest of the tweets are classified as positive by at least two methods. But this is a very small sample; let’s see how each model classifies all the tweets:
It looks like the three methods classify most of the tweets as negative, but the number of negative tweets varies quite a lot, between more than 100000 (Transformers) and more than 50000 (Afinn). Let’s try to find the consensus between the three models, that is: how many times all three models give the same result, how many times only two models agree, and how many times none of the models agree.
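Here is a sketch of the code we can use (the three label columns are the hypothetical names from the earlier steps, and the model’s numeric predictions are assumed to have been mapped to ‘positive’/‘negative’ already):

# Lowercase the labels so Transformers' 'POSITIVE'/'NEGATIVE' match the other two methods.
cols = ['label', 'af_sentiment', 'lr_sentiment']

def agreement(row):
    # 3 = all three models agree, 2 = only two agree, 1 = none agree.
    labels = [str(row[c]).lower() for c in cols]
    return max(labels.count(l) for l in labels)

df['consensus'] = df.apply(agreement, axis=1)
print(df['consensus'].value_counts())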
We have 46333 tweets with the same sentiment for all three methods, and 58348 tweets classified with the same sentiment by only two methods. Let’s extract the tweets where all three methods agree and analyze them:
Most of the 46000 tweets are classified as negative.
So, what’s the best method? I think Afinn is the fastest and easiest to use, and I like the fact that it has a neutral category. But, like the other methods, it’s not perfect. As Richard Socher says, sometimes it’s better to spend some time labelling data manually and train your own classifier. Maybe I should have done that when I started this project.