#USAID on Twitter: a quick NLP analysis

I used to work in International Development and did a quick NLP analysis of the #USAID handle

attribution: USAID Bangladesh

Natural Language Processing (NLP) is a field of data science that uses machine learning to analyze and predict patterns in human (and increasingly machine) language usage. I used to work for one of the largest USAID implementers, so understanding USAID's Twitter stream piqued my interest.

The data set consists of 6186 tweets from 8th June 2017–9th June 2018 and was collected on 9th June 2018.

Popularity — Favourites and Retweets

On average, tweets were retweeted 6 times and tagged as a favourite 11 times. For the stats geeks among us, there is an 87% probability that a tweet tagged as a favourite would also be retweeted.
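For readers who want to reproduce summary figures like these, here is a minimal sketch. It assumes the tweets are available as a list of dicts with hypothetical `retweets` and `favourites` keys (a real export may name them differently), and the sample rows are made up for illustration. Pearson correlation is used as one way to quantify how closely favourites and retweets move together; it is not necessarily the statistic behind the 87% figure above.

```python
from math import sqrt
from statistics import mean

# Made-up sample rows; the field names are assumptions, not the real export schema.
tweets = [
    {"retweets": 2, "favourites": 4},
    {"retweets": 5, "favourites": 9},
    {"retweets": 11, "favourites": 20},
    {"retweets": 6, "favourites": 11},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


retweets = [t["retweets"] for t in tweets]
favourites = [t["favourites"] for t in tweets]

avg_retweets = mean(retweets)
avg_favourites = mean(favourites)
correlation = pearson(retweets, favourites)

print(f"average retweets: {avg_retweets:.1f}")
print(f"average favourites: {avg_favourites:.1f}")
print(f"correlation: {correlation:.2f}")
```

A correlation close to 1 would support the observation that the most retweeted tweets are also the most favourited ones.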

left: most popular in the last 12 months, right: most popular in 2018 to date

As expected, the most retweeted item was also the one most often tagged as a favourite. The Twittersphere LOVED USAID’s mission to empower women.

left: most hearts in the past 12 months, right: most loved in 2018 to date

Distribution of tweeting volume

December 2017 was the tweeting nadir by percentage (maybe the social media team took a well-earned rest?). Twitter activity picked up in the new year and peaked in March 2018, possibly due to the Global Partnership Week.
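A monthly breakdown like this can be sketched with the standard library alone. The dates below are invented placeholders, and the `YYYY-MM-DD` format is an assumption about how the creation dates were stored.

```python
from collections import Counter
from datetime import datetime

# Made-up dates standing in for the tweets' creation timestamps.
created_at = [
    "2017-12-04", "2018-01-15", "2018-01-20",
    "2018-03-02", "2018-03-09", "2018-03-27",
]


def monthly_share(dates):
    """Return {"YYYY-MM": percentage of all tweets}, in chronological order."""
    months = Counter(
        datetime.strptime(d, "%Y-%m-%d").strftime("%Y-%m") for d in dates
    )
    total = sum(months.values())
    return {month: 100 * n / total for month, n in sorted(months.items())}


shares = monthly_share(created_at)
busiest = max(shares, key=shares.get)    # month with the largest share
quietest = min(shares, key=shares.get)   # month with the smallest share

for month, pct in shares.items():
    print(f"{month}: {pct:.1f}%")
```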

Hashtags and Mentions as a proxy for USAID attention

2017: Looking at the commonalities in hashtags and mentions, and using them as a proxy for USAID’s attention, ICT4D-related items seemed to be at the forefront, and India seemed to have been a particular focus in the second half of 2017.

2018: In the first half of 2018, the focus on ICT4D seems to be shared with agriculture and health.
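Counting hashtags and mentions can be sketched with a regular expression and a `Counter`. The tweet texts and the handle below are invented for illustration, and the simple `#\w+` / `@\w+` patterns are an assumption that ignores edge cases such as hashtags containing punctuation.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

# Invented example texts; "@SomePartner" is a placeholder handle.
tweets = [
    "Mobile money is changing lives #ICT4D #India @SomePartner",
    "New digital tools for farmers #ICT4D #agriculture",
    "Stronger health systems save lives #health @SomePartner",
]


def count_tags(texts, pattern):
    """Count case-insensitive matches of `pattern` across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in pattern.findall(text))
    return counts


hashtags = count_tags(tweets, HASHTAG)
mentions = count_tags(tweets, MENTION)

print(hashtags.most_common(3))
print(mentions.most_common(3))
```

Running the same count separately on each half-year would give the kind of 2017 versus 2018 comparison described above.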

What made USAID happy or sad? Sentiment Analysis

I used the TextBlob library for Python to conduct the sentiment analysis. TextBlob builds on the Natural Language Toolkit (NLTK) to classify text as positive, neutral or negative. Without going into too much detail, its Naive Bayes analyzer is trained on movie reviews labelled as positive or negative, much like a Rotten Tomatoes-type score, while its default analyzer looks up word-level scores in a sentiment lexicon. The word scores in a sentence are then combined to produce a sentence score between -1 and 1: -1 = very sad, 0 = neutral, 1 = very happy.
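As a rough sketch of this step, the snippet below scores text with TextBlob's default analyzer via `TextBlob(text).sentiment.polarity` and buckets the result. The helper names, the zero threshold and the hand-written example scores are assumptions for illustration, not the exact settings used for this analysis.

```python
def label(polarity, threshold=0.0):
    """Bucket a polarity score in [-1, 1] as 'happy', 'neutral' or 'sad'."""
    if polarity > threshold:
        return "happy"
    if polarity < -threshold:
        return "sad"
    return "neutral"


def percentages(polarities):
    """Percentage of scores falling into each sentiment bucket."""
    labels = [label(p) for p in polarities]
    return {
        name: 100 * labels.count(name) / len(labels)
        for name in ("happy", "neutral", "sad")
    }


def score_tweets(texts):
    """Score texts with TextBlob's default analyzer (needs `pip install textblob`)."""
    # Imported lazily so the helpers above can be used without TextBlob installed.
    from textblob import TextBlob
    return [TextBlob(text).sentiment.polarity for text in texts]


# Hand-written scores for illustration only; real ones would come from score_tweets().
example_scores = [0.5, 0.0, 0.0, -0.25, 0.8]
summary = percentages(example_scores)
print(summary)
```

Raising `threshold` slightly above zero would treat weakly positive or negative tweets as neutral, which may be a sensible choice for short, factual announcements.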

USAID tweets were mostly neutral, but there were more happy thoughts than sad ones. The percentages are:

Happiest:

Can Twitter predict the trajectory of USAID funding?

If the hypothesis that Twitter activity tracks USAID's funding priorities is correct, mining Twitter for hashtags and mentions might help organizations react quickly to USAID's future procurement needs.

A time-lag analysis of RFP/IDIQ* releases against trends in social media is probably the best approach to establish whether a correlation exists before attempting any predictions.
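One simple form of time-lag analysis is a lagged correlation: shift one series against the other and see which shift gives the strongest correlation. The sketch below assumes two equal-length monthly series, and the numbers are invented so that the second series is the first one delayed by two months. No such analysis was run for this post.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(leading, trailing, max_lag):
    """Correlate leading[t] with trailing[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]  # drop the last `lag` points
        ys = trailing[lag:]                 # drop the first `lag` points
        results[lag] = pearson(xs, ys)
    return results


# Invented monthly counts: uses of a hashtag, and solicitations in that sector.
hashtag_counts = [1, 3, 8, 4, 2, 1, 5, 9, 3, 2]
rfp_counts = [0, 1, 1, 3, 8, 4, 2, 1, 5, 9]  # same pattern, two months later

results = lagged_correlations(hashtag_counts, rfp_counts, max_lag=3)
best_lag = max(results, key=results.get)

for lag, r in results.items():
    print(f"lag {lag} months: r = {r:.2f}")
print("strongest correlation at lag", best_lag)
```

With real data, a strong correlation at some lag would only be a starting point; it would still need checking against more solicitations and a longer time window.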

Next Steps:

I hope to deploy an unsupervised learning model in the near future, either k-means clustering or Latent Dirichlet Allocation (LDA), to observe how the hashtags cluster.

Maybe use Flask/Django to create a web app to search #USAID tweets by hashtag.

*RFP = Request for Proposals (a solicitation through which an agency invites proposals to procure commodities and services); IDIQ = Indefinite Delivery, Indefinite Quantity (a type of contract that provides for an indefinite quantity of supplies or services during a fixed period of time).

Written by Harsha Goonewardana

I am interested in the intersection of data science and international development. Better development outcomes through analysis.
