Twitter Sentiment Analysis on Taliban Takeover of Afghanistan

Aiden Bromaghin · Published in CodeX · 7 min read · Aug 25, 2021

Photo by Andre Klimke on Unsplash

In April of this year, President Biden ordered the complete withdrawal of American troops by September 11. By August 15, the Taliban had taken the capital city, Kabul. From the moment Biden announced his intention to remove US troops, this was a politically contentious issue, and the controversy only grew as the Taliban swiftly took control of the country.

Interested in the public perception of Biden’s decision and its aftermath, I decided to perform a quick Natural Language Processing (NLP) analysis on Twitter data related to the issue. I wanted to see how emotionally charged the language being used was, so I chose Vader to run sentiment analysis on a collection of tweets. I also wanted to look at the frequency distribution of the most common words to see if any obvious words or patterns emerged.

To start, I created a new app on my Twitter developer account and opened a new Jupyter Notebook. I kept the keys and tokens private and created a Tweepy object. Tweepy is an open-source Python package that enables users to access the Twitter API; it is what allows me to collect tweets for analysis. More information about Tweepy can be found in its documentation.

import pandas as pd
import tweepy

import credentials  # local module holding my private keys and tokens

api_key = credentials.api_key
api_secret_key = credentials.api_secret_key
access_token = credentials.access_token
access_token_secret = credentials.access_token_secret

auth = tweepy.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

Once my Tweepy object was set up, I used it to access tweets. Unfortunately, the API won’t let me access an unlimited number of them; the standard search endpoint is rate-limited and only covers recent tweets. I used the api.search() method to query tweets containing the word ‘Afghanistan’, converted the data to JSON, and normalized it into a dataframe. After that, I narrowed down the columns I wanted to keep.

# Query English-language tweets mentioning Afghanistan
tweepy_object = api.search(q='afghanistan', lang='en')
# Pull the raw JSON out of each status and flatten it into a dataframe
json_tweets = [tweet._json for tweet in tweepy_object]
data = pd.json_normalize(json_tweets)
columns = ['text', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'place', 'geo', 'coordinates']
df = data[columns].copy()
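As a quick illustration of what json_normalize does (the records below are simplified stand-ins for the real tweet JSON, not the actual Twitter schema), nested dictionaries are flattened into dotted column names:

```python
import pandas as pd

# Simplified stand-ins for tweet JSON payloads
sample = [
    {'text': 'first tweet', 'retweet_count': 3, 'user': {'screen_name': 'alice'}},
    {'text': 'second tweet', 'retweet_count': 0, 'user': {'screen_name': 'bob'}},
]

flat = pd.json_normalize(sample)
# The nested 'user' dict becomes a 'user.screen_name' column
print(list(flat.columns))
```

This is why the flattened dataframe can then be subset down to just the columns of interest.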

Before performing sentiment analysis, I wanted to take a peek at some of my data. Below are the five most retweeted tweets I had stored.

top_tweets = df.sort_values(by='retweet_count', ascending=False)['text'].to_list()
top_tweets[:5]
['RT @abhijitmajumder: With evacuation of the last 700 Hindus/Sikhs left in #Afghanistan in a 38-million population, an entire civilisation w…',
'RT @TheOnion: ‘Let’s Take It To Our Afghanistan Experts,’ Says Anchor Throwing To Panel Of Dick Cheneys https://t.co/auVG3df5fV https://t.c…',
'RT @AbhishBanerj: An Indian liberal trapped in Afghanistan appealed to PM Modi for help\n\nIndian govt rescued him.\n\nAs soon as he arrived in…',
'RT @BreitbartNews: Free Afghanistan Activist to Joe Biden: "I Regret My Vote for You" https://t.co/4mnUsD8Shs',
'RT @Lowkey0nline: This is essential viewing. Unaccountable CIA backed militias, accompanied by US military in Afghanistan carrying out murd…']

The tweets were a bit messy, so I removed some of the clutter using regular expressions (regex). As a student, I’d successfully managed to avoid regex for the last two years, but today my luck caught up with me. I needed to remove the non-alphanumeric characters to make the texts more readable and easier to run sentiment analysis on. If you need to brush up on regex like I did, I found this link useful.

import re  # the standard-library re module is enough for this pattern

tweets = df['text'].to_list()
# Replace every character that isn't a letter or digit with a space
tweets = [re.sub("[^a-zA-Z0-9]", " ", tweet) for tweet in tweets]
df['text'] = tweets
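To see the substitution in isolation (the sample tweet below is made up for illustration), note that it leaves runs of spaces behind, which a second pattern can collapse:

```python
import re

sample = "RT @someone: Breaking news from #Afghanistan! https://t.co/abc123"

# Replace every non-alphanumeric character with a space...
cleaned = re.sub("[^a-zA-Z0-9]", " ", sample)
# ...then collapse the runs of spaces that substitution leaves behind
cleaned = re.sub(r"\s+", " ", cleaned).strip()
print(cleaned)  # -> "RT someone Breaking news from Afghanistan https t co abc123"
```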

If I look at the same tweets again, I can see that they’re slightly more readable.

['RT abhijitmajumder With evacuation of the last 700 Hindus Sikhs left in Afghanistan in a 38 million population an entire civilisation w ',
'RT TheOnion Let s Take It To Our Afghanistan Experts Says Anchor Throwing To Panel Of Dick Cheneys https t co auVG3df5fV https t c ',
'RT AbhishBanerj An Indian liberal trapped in Afghanistan appealed to PM Modi for help Indian govt rescued him As soon as he arrived in ',
'RT BreitbartNews Free Afghanistan Activist to Joe Biden I Regret My Vote for You https t co 4mnUsD8Shs',
'RT Lowkey0nline This is essential viewing Unaccountable CIA backed militias accompanied by US military in Afghanistan carrying out murd ']

I chose to use Vader for the sentiment analysis portion, in large part because it was designed to deal with social media data. If you’re new to Vader, I think this article is a good intro. The polarity_scores() function returns negative, neutral, positive, and compound values for the sentiment of the text passed to it. For ease of access, I got the compound value for each tweet and saved it as a new column in my dataframe.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
# Keep only the compound score for each tweet
df['sentiment'] = [analyzer.polarity_scores(tweet)['compound'] for tweet in df['text']]
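The compound score is a single normalized value between -1 and 1. A common convention, suggested in Vader’s own documentation, is to bucket it with a ±0.05 cutoff; the little helper below is my own illustration of that convention, not part of the original analysis:

```python
def label_sentiment(compound, threshold=0.05):
    """Bucket a Vader compound score using the conventional +/-0.05 cutoffs."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# The two extremes found later in this dataset would be labeled like so:
print(label_sentiment(0.4995))   # -> positive
print(label_sentiment(-0.8555))  # -> negative
print(label_sentiment(0.0))      # -> neutral
```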

Just for fun, let’s look at some of the most negatively scored tweets in our data.

neg_tweets = df.sort_values(by='sentiment')['text'].to_list()
neg_tweets[:5]
['RT GeorgeMonbiot The media s lust for blood helped march us into the disastrous wars in Afghanistan and Iraq It wants us to forget that ',
'RT MailOnline US drone pilot leaks footage of his kills in Afghanistan questioning expansion of the program https t co jX68BNwjgj',
'RT TheOnion Let s Take It To Our Afghanistan Experts Says Anchor Throwing To Panel Of Dick Cheneys https t co auVG3df5fV https t c ',
'RT billroggio This is a big part of the problem ISKP was never the major threat in Afghanistan First threat has always been the Taliban ',
'The latest US Civil War Daily https t co nVLpuHLEbA Thanks to LinnsStampNews BurnTheTombs galoistheory1 afghanistan civilwar']

Yikes. Blood lust, pilots leaking footage of kills — it’s easy to see why these were given a lower sentiment score.

To get a feel for the sentiments of the tweets overall, I looked at the average, lowest, and highest scores.

import numpy as np

mean_sent = np.mean(df['sentiment'])
mean_sent
-0.10875000000000001

print('min: ', df['sentiment'].min())
print('max: ', df['sentiment'].max())
min: -0.8555
max: 0.4995

As you can see, the average score was slightly negative. I’m honestly surprised that it wasn’t lower given the subject matter — words like ‘war’, ‘terrorism’, etc. should drive the score lower. Vader gives text a score ranging from -1 to 1, and our lowest value was definitely approaching the lower limit.

That wrapped up all I had in mind for the sentiment analysis, but I wanted to look at the frequency distribution as well. I thought it’d be interesting to see if any words kept coming up over and over throughout the data. To that end, I converted my collection of tweets into a single list and began the preprocessing. I didn’t do too much: I simply removed stop words (words that carry little meaning, like ‘a’, ‘the’, etc.) and reduced the remaining words to their roots via a process called stemming.

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stop = stopwords.words('english')
# Add leftover URL fragments and retweet markers to the stop list
stop.extend(['https', 'rt', 'co'])

words = ' '.join(df['text']).lower().split()
words = [word for word in words if word not in stop and word != 'afghanistan']

stemmer = PorterStemmer()
words = [stemmer.stem(word) for word in words]

After that, I found the most common words using NLTK’s FreqDist() and plotted the results.

freqd = nltk.FreqDist(words)
most_common = freqd.most_common(20)

import matplotlib.pyplot as plt
import seaborn as sns

most_common = pd.Series(dict(most_common))
fig, ax = plt.subplots(figsize=(20, 20))
sns.barplot(x=most_common.index, y=most_common.values, ax=ax)
ax.set(xlabel='20 Most Common Words in Tweets Concerning Afghanistan As of August 24, 2021', ylabel='Count')
plt.savefig('Afghanistan Tweets Freq Dist.png')
plt.show()

I was surprised I didn’t find more of the same words recurring. After ‘US’, the next most common words show up at most twice. The words listed are about what you would expect: ‘evacuate’, ‘taliban’, ‘war’, etc. Interestingly, both ‘never’ and ‘forget’ are on the list. I didn’t check for n-grams (sequences of words that appear together), but it seems likely that they appear in the tweets as ‘never forget’, a reference to 9/11. An extension of this analysis could check for n-grams to look for common pairings of words relevant to the topic.
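As a sketch of what that extension might look like (the token list below is invented for illustration, not taken from the real data), bigrams can be counted by pairing each word with its neighbor:

```python
from collections import Counter

# Hypothetical token list standing in for the preprocessed tweet words
tokens = ['never', 'forget', 'war', 'never', 'forget', 'evacuate']

# Pair each word with the one that follows it, then count the pairs
bigrams = list(zip(tokens, tokens[1:]))
counts = Counter(bigrams)
print(counts.most_common(1))  # -> [(('never', 'forget'), 2)]
```

NLTK also ships bigram utilities, but the pure-Python version above is enough to surface pairings like ‘never forget’.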

All in all, this was a very cursory analysis. I only collected a few thousand tweets from one moment in time. The situation in Afghanistan has been developing rapidly, and this data does not reflect that: it is a small sample from a limited time period, drawn from users of a single social media network. The findings are not meant to be conclusive about wider public opinion in any way.

That being said, it was interesting to take a look at this data. The tweets were not as negative as I had expected, although the sentiment scores were skewed towards negative values. I had expected to get a little more insight from the frequency distribution of the words as well. Instead, there didn’t seem to be any strong emerging themes. Apart from ‘US’, there weren’t any meaningful words appearing repeatedly, although the more frequent words were relevant to the topic.

I thoroughly enjoyed working on this quick little project, and hope you did too! If you made it this far, thanks for taking the time to read till the end.

Data science graduate student with a background in consumer and mortgage lending.