I used to work in International Development and did a quick NLP analysis of the #USAID handle
Natural Language Processing (NLP) is a field of data science that uses machine learning to analyze and predict patterns in human (and increasingly machine) language. I used to work for one of the largest USAID implementers, so the Twitter stream of USAID piqued my interest.
The data set consists of 6186 tweets from 8th June 2017–9th June 2018 and was collected on 9th June 2018.
Popularity — Favourites and Retweets
On average, a tweet was retweeted 6 times and favourited 11 times. For the stats geeks among us, there is an 87% probability that a tweet tagged as a favourite would also be retweeted.
As expected, the most retweeted tweet was also the most favourited. The Twittersphere LOVED USAID’s mission to empower women.
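The post does not say how these figures were computed, so the following is only a minimal sketch under my own assumptions: each tweet is a record with hypothetical `retweets` and `favourites` counts, and the 87% figure is read as the share of favourited tweets that were also retweeted at least once. The sample data is invented for illustration.

```python
from statistics import mean

# Hypothetical sample; each record holds the counts for one tweet.
tweets = [
    {"retweets": 12, "favourites": 20},
    {"retweets": 0, "favourites": 3},
    {"retweets": 5, "favourites": 9},
    {"retweets": 0, "favourites": 0},
    {"retweets": 7, "favourites": 12},
]

def popularity_summary(tweets):
    """Return mean retweets, mean favourites and P(retweeted | favourited)."""
    favourited = [t for t in tweets if t["favourites"] > 0]
    retweeted_too = [t for t in favourited if t["retweets"] > 0]
    return {
        "mean_retweets": mean(t["retweets"] for t in tweets),
        "mean_favourites": mean(t["favourites"] for t in tweets),
        # Assumed reading of the "87% probability": a conditional share.
        "p_retweet_given_favourite": (
            len(retweeted_too) / len(favourited) if favourited else float("nan")
        ),
    }

summary = popularity_summary(tweets)
print(summary)
```

If the original figure was instead a correlation coefficient between the two counts, the same records could be fed to a correlation function rather than the conditional share used here.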
Distribution of tweeting volume
December 2017 was the tweeting nadir by percentage (maybe the social media team took a well-earned rest?). Twitter activity picked up in the new year and peaked in March 2018, possibly due to the Global Partnership Week.
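A monthly breakdown like the one described above could be produced roughly as follows. This is a sketch that assumes the tweet timestamps are available as ISO-formatted strings; the `created_at` values below are invented, not taken from the data set.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps standing in for the real tweet metadata.
created_at = [
    "2017-12-03T10:15:00",
    "2018-01-11T16:20:00",
    "2018-03-05T09:00:00",
    "2018-03-06T14:30:00",
    "2018-03-20T08:45:00",
]

def monthly_share(timestamps):
    """Map 'YYYY-MM' to the percentage of tweets posted in that month."""
    months = Counter(
        datetime.fromisoformat(ts).strftime("%Y-%m") for ts in timestamps
    )
    total = sum(months.values())
    return {month: 100 * count / total for month, count in sorted(months.items())}

shares = monthly_share(created_at)
busiest_month = max(shares, key=shares.get)
print(shares, busiest_month)
```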
Hashtags and Mentions as a proxy for USAID attention
2017: Judging by the most common hashtags and mentions, used here as a proxy for USAID’s attention, topics related to ICT4D (information and communication technologies for development) seemed to be at the forefront, and India seemed to be a particular focus in the second half of 2017.
2018: In the first half of 2018, the focus on ICT4D seemed to be shared with agriculture and health.
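Counting hashtags and mentions can be sketched with a regular expression and a counter. The tweets below are made up for illustration, and the simple `#\w+` and `@\w+` patterns are an assumption on my part; a dedicated tweet tokenizer may handle edge cases better.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

# Invented example tweets, not taken from the data set.
tweets = [
    "Digital tools are changing farming in #India #ICT4D @USAID_India",
    "Mobile money for health workers #ICT4D #GlobalHealth",
    "New partnership announced with @USAID_India #India",
]

def top_tokens(texts, pattern, n=3):
    """Return the n most frequent matches of `pattern`, case-insensitively."""
    counts = Counter(
        match.lower() for text in texts for match in pattern.findall(text)
    )
    return counts.most_common(n)

top_hashtags = top_tokens(tweets, HASHTAG)
top_mentions = top_tokens(tweets, MENTION)
print(top_hashtags, top_mentions)
```

Running the same function separately on the tweets from each half-year would give the kind of 2017 versus 2018 comparison described above.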
What made USAID happy or sad? Sentiment Analysis
I used the TextBlob library for Python to conduct the sentiment analysis. TextBlob builds on the Natural Language Toolkit (NLTK) to classify text as positive, neutral or negative. Without going into too much detail, its Naive Bayes analyzer is trained on a corpus of movie reviews, so it learns from something like a Rotten Tomatoes-style score, while its default analyzer looks words up in a sentiment lexicon. The word-level scores are then combined into a sentence score between -1 and 1: -1 = very sad, 0 = neutral, 1 = very happy.
USAID tweets were mostly neutral, though there were more happy thoughts than sad ones. The percentages are:
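The scoring step can be sketched as below. `TextBlob(text).sentiment.polarity` is TextBlob's standard way to get a polarity score between -1 and 1; everything else here is my own assumption, including the `label` and `sentiment_shares` helpers, the zero-width neutral band, and the example scores, which stand in for real tweet polarities.

```python
from collections import Counter

def polarity(text):
    """Polarity in [-1, 1] from TextBlob's default analyzer."""
    # Imported lazily so the bucketing below runs without TextBlob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity

def label(score, threshold=0.0):
    """Bucket a polarity score; `threshold` is an assumed neutral band."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(scores):
    """Percentage of scores falling into each sentiment bucket."""
    names = ("negative", "neutral", "positive")
    if not scores:
        return {name: 0.0 for name in names}
    counts = Counter(label(s) for s in scores)
    return {name: 100 * counts[name] / len(scores) for name in names}

# Hypothetical scores standing in for [polarity(t) for t in tweets].
example_scores = [0.0, 0.0, 0.5, -0.2, 0.0, 0.8, 0.1, 0.0]
shares = sentiment_shares(example_scores)
print(shares)
```

Widening `threshold` (for example to 0.05) would treat faintly positive or negative tweets as neutral, which changes the split noticeably on short texts.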
Can Twitter predict the trajectory of USAID funding?
If Twitter activity does foreshadow funding priorities, mining Twitter for hashtags and mentions could help organizations react quickly to USAID’s future procurement needs.
A time-lag analysis of RFP/IDIQ* releases and social media trends is probably the best way to establish a correlation and, from there, make predictions.
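One simple form of time-lag analysis is to correlate a hashtag's monthly count with the number of solicitations released some months later, and see which lag gives the strongest correlation. The sketch below assumes two equal-length monthly series; both series are invented, with the second built to echo the first two months later. A high lagged correlation on real data would still not prove that tweets predict funding.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)

def lagged_correlations(signal, target, max_lag):
    """Correlate signal[t] with target[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = signal[: len(signal) - lag]
        ys = target[lag:]
        result[lag] = pearson(xs, ys)
    return result

# Hypothetical monthly counts: hashtag mentions and solicitations released.
hashtag_counts = [1, 4, 2, 8, 3, 6, 2, 7, 1, 5]
rfp_counts = [0, 0, 1, 4, 2, 8, 3, 6, 2, 7]

correlations = lagged_correlations(hashtag_counts, rfp_counts, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(correlations, best_lag)
```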
I hope to deploy an unsupervised learning model in the near future, either k-means clustering or Latent Dirichlet Allocation (LDA), to observe how the hashtags cluster.
I might also use Flask or Django to create a web app for searching #USAID tweets by hashtag.
*RFP = Request for Proposals, a solicitation by an agency to procure commodities and services. IDIQ = Indefinite Delivery, Indefinite Quantity, a type of contract that provides for an indefinite quantity of supplies or services during a fixed period of time.
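To give a feel for the clustering idea, here is a toy k-means written in plain Python. It is a sketch only: the hashtag sets are invented, each tweet is encoded as a 0/1 vector over the hashtag vocabulary, and the centroids are seeded with the first k tweets so the result is repeatable. A real analysis would more likely use a library implementation with better initialization.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical hashtag sets, one per tweet.
tweet_hashtags = [
    {"ict4d", "india"},
    {"agriculture", "foodsecurity"},
    {"ict4d", "digital"},
    {"agriculture", "farmers"},
    {"india", "digital"},
    {"foodsecurity", "farmers"},
]

vocabulary = sorted(set().union(*tweet_hashtags))
vectors = [[1 if tag in tags else 0 for tag in vocabulary] for tags in tweet_hashtags]

labels, _ = kmeans(vectors, k=2)
print(labels)
```

With this toy input the technology-themed tweets and the agriculture-themed tweets end up in separate clusters; on real data the number of clusters would need to be chosen with more care.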