Text Analysis on William Ruto’s tweets with R
As Kenya nears the 2022 election, with a referendum in between, the political climate has been rough over the past couple of months.
The deputy president of Kenya, William Ruto, seems to pull more and more people towards his bid for the presidency come 2022. He resonates with the poor, as he was also born into impoverished surroundings and through hard work rose to be the second in command.
One of his distinguishing characteristics is how great an orator he is, which has enabled him to attract a large following. Today he is actually the most followed person on Twitter in Kenya, with 3,712,958 followers. I explored his tweets to gain insights into his public relations on Twitter and to answer the following:
- What are his most retweeted and liked posts?
- What are the most frequent words used in his posts?
- What does he mostly refer to when he uses his most frequent word?
- What are the sentiments and emotions of his posts?
- What is the polarity (positive/negative) degree of his posts?
Through the Twitter API, using the rtweet package, I was only able to extract 3200 tweets, as per Twitter's guidelines. The user-generated data dates from 24–11–2018 to 13–02–2021. I also removed the retweets, as they do not count as user-generated text. The data has 90 variables and 3200 observations.
Favorites and retweets are among the engagement insights that can be derived after a post is shared on the platform, to gauge how viral it has gone. I therefore looked into his most liked and most retweeted tweets.
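As a rough illustration of the filtering and ranking step, here is a minimal base-R sketch. It runs on a toy data frame, not the real timeline, and the column names (text, is_retweet, favorite_count, retweet_count) are assumptions about how the extracted data is laid out.

```r
# Toy stand-in for the extracted timeline. The column names are assumed;
# the real data would come from an API call that needs credentials.
tweets <- data.frame(
  text = c("Development for every county", "RT a post from someone else",
           "Empowering the youth", "Education is key"),
  is_retweet = c(FALSE, TRUE, FALSE, FALSE),
  favorite_count = c(120, 0, 950, 300),
  retweet_count = c(30, 500, 210, 45),
  stringsAsFactors = FALSE
)

# Keep only the user's own posts
own <- tweets[!tweets$is_retweet, ]

# Rows holding the highest like and retweet counts
most_liked <- own[which.max(own$favorite_count), ]
most_retweeted <- own[which.max(own$retweet_count), ]

print(most_liked$text)
print(most_retweeted$text)
```

Dropping the retweets first matters here: in the toy data the retweeted post has the highest retweet count, but it is not the user's own text.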


The most liked tweet was apparently an act of sympathy after Aden Duale had been stripped of his duties as the majority leader in parliament. Although the leader was diligent and instrumental in his job, his firing, which shocked many, was attributed to his fierce support for the deputy president.


As you can see, likes and retweets are highly correlated: tweets that attract many likes tend to be retweeted widely too, which is how the platform's algorithms work. The most retweeted tweet came when corruption cases around the coronavirus pandemic had spiked in his absence, despite him being the one usually called corrupt, and he ironically took on his nemesis. It is also worth noting that his engagement peaked mostly in 2020, which shows an upward trend when it comes to amassing a following.
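The relationship between likes and retweets, and the engagement trend by year, can be checked with a few lines of base R. The numbers below are made up for illustration and are not taken from the actual tweets.

```r
# Invented engagement figures for five tweets (illustration only)
likes    <- c(100, 250, 400, 800, 1500)
retweets <- c(20, 60, 90, 210, 380)
year     <- c(2018, 2019, 2019, 2020, 2020)

# Pearson correlation between likes and retweets
r <- cor(likes, retweets)
print(round(r, 3))

# Mean engagement (likes + retweets) per year
by_year <- tapply(likes + retweets, year, mean)
print(by_year)
print(names(which.max(by_year)))
```

A correlation close to 1 would support the claim that the most liked tweets are also the most retweeted. The per-year means give a simple way to see when engagement peaked.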
Text Analysis
I created a corpus, cleaned it with a custom function, clean.corpus, and built a term-document matrix: a mathematical matrix that describes the frequency of terms that occur in a collection of documents.
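To make the idea of a term-document matrix concrete, here is a small base-R sketch. The clean_text function is a hypothetical stand-in and not the clean.corpus function used in the analysis. The three example sentences and the stop-word list are also invented.

```r
docs <- c("Development in Nairobi county!",
          "The county needs education & development.",
          "Meeting at the Karen office, Nairobi county")

# Hypothetical cleaning step: lower-case, replace anything that is not a
# letter or a space, split on whitespace, drop a few stop words
clean_text <- function(x, stop_words = c("the", "in", "at", "and")) {
  x <- tolower(x)
  x <- gsub("[^a-z ]", " ", x)
  tokens <- strsplit(trimws(x), " +")
  lapply(tokens, function(tok) tok[!tok %in% stop_words])
}

tokens <- clean_text(docs)
terms <- sort(unique(unlist(tokens)))

# Rows are terms, columns are documents, cells are counts
tdm <- sapply(tokens, function(tok) table(factor(tok, levels = terms)))
rownames(tdm) <- terms
colnames(tdm) <- paste0("doc", seq_along(docs))

# Most frequent terms across the whole collection
freq <- sort(rowSums(tdm), decreasing = TRUE)
print(head(freq, 3))
```

Summing each row gives the overall term frequencies, which is the kind of count a frequent-words chart is built from.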

The frequencies show that development, empowerment, education and the economy are some of the main things he focuses on. ‘Church’ and ‘God’ also feature prominently, reflecting the religious affiliation he has long espoused.

I looked at word associations for the word ‘county’, as it is the most frequently used. Whenever he talks about a county, the association score is 0.4 with Nairobi and 0.37 with his Karen office. These scores are low in absolute terms, but they are the strongest associations for that particular word.
Sentiment scoring
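One common way to score word association is to correlate a term's counts with every other term's counts across documents. That is how tm's findAssocs works, and scores of that kind are correlations, not probabilities. Whether the figures above were produced this way is an assumption. The sketch below applies the idea to a toy matrix in base R.

```r
# Toy term-document matrix: rows are terms, columns are six tweets
tdm <- rbind(
  county  = c(1, 1, 0, 1, 0, 1),
  nairobi = c(1, 0, 0, 1, 0, 1),
  karen   = c(0, 1, 0, 1, 0, 0),
  church  = c(0, 0, 1, 0, 1, 0)
)

# Correlate the target term's counts with every other term's counts
target <- "county"
others <- setdiff(rownames(tdm), target)
assoc <- sapply(others, function(term) cor(tdm[target, ], tdm[term, ]))
assoc <- sort(round(assoc, 2), decreasing = TRUE)
print(assoc)
```

A term that always appears alongside the target scores close to 1, one that never appears with it scores negative, and mid-range values mean the two words co-occur in only some of the tweets.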
Sentiment analysis aims to detect positive, neutral, or negative feelings from text, whereas emotion analysis aims to detect and recognize types of feelings expressed in text, such as anger, disgust, fear, happiness, sadness, and surprise. I tried to assess what mood his tweets carry, and among the 3200 tweets, most are in a joyous mood.
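Lexicon-based emotion scoring can be sketched in base R as follows. The mini-lexicon and the example tweets are hypothetical. A real analysis would rely on a full emotion lexicon, and this is not necessarily the method used for the results here.

```r
# Hypothetical mini-lexicon mapping words to emotions
lexicon <- c(happy = "joy", celebrate = "joy", blessed = "joy",
             afraid = "fear", threat = "fear",
             corrupt = "anger", sad = "sadness")

tweets <- c("Happy and blessed to celebrate with the youth",
            "We will not be afraid of any threat",
            "So happy with the progress in education")

# Tokenise a tweet, look each word up in the lexicon, count the emotions
count_emotions <- function(text) {
  words <- strsplit(tolower(gsub("[^A-Za-z ]", " ", text)), " +")[[1]]
  hits <- lexicon[words[words %in% names(lexicon)]]
  table(factor(hits, levels = unique(lexicon)))
}

# One row per tweet, one column per emotion
emotion_counts <- t(sapply(tweets, count_emotions, USE.NAMES = FALSE))
totals <- colSums(emotion_counts)
print(totals)
print(names(which.max(totals)))
```

The emotion with the largest total is the dominant mood of the collection. Note that a word-lookup approach like this ignores negation, so "not afraid" still counts towards fear.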

To detect the mood of the user, both conventional machine learning algorithms and deep learning techniques can be employed, and the classification performance of each model can be compared.

It is fair to say that 59% of what the deputy president tweets are positive messages and 33% are neutral, where the numbers of positive and negative words cancel each other out. Negative tweets sum up to a mere 8%, so the ratio of positive to negative tweets is about 7:1: for every negative tweet, he posts roughly 7 positive ones.
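The polarity rule described above (positive words minus negative words, neutral when they cancel out) and the 7:1 ratio can be sketched in base R. The word lists and example tweets are invented for illustration. Only the 59% and 8% shares come from the analysis.

```r
# Hypothetical word lists; a real analysis would use a full lexicon
positive_words <- c("great", "progress", "blessed", "empower")
negative_words <- c("corrupt", "poor", "failure")

# Label a tweet by the sign of (positive count - negative count)
polarity <- function(text) {
  words <- strsplit(tolower(gsub("[^A-Za-z ]", " ", text)), " +")[[1]]
  score <- sum(words %in% positive_words) - sum(words %in% negative_words)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

tweets <- c("Great progress in the county",
            "Corrupt leaders are a failure",
            "Progress despite poor roads",
            "Meeting leaders at the office")
labels <- vapply(tweets, polarity, character(1), USE.NAMES = FALSE)
print(table(labels))

# Ratio implied by the reported shares: 59% positive vs 8% negative
ratio <- 59 / 8
print(round(ratio, 1))
```

The third toy tweet has one positive and one negative word, so it lands in the neutral group, just like the fourth, which has neither. The reported shares give 59 / 8 = 7.375, which rounds to the 7:1 ratio.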
What can be done
With a support-vector machine (SVM), I could build a model that detects the mood the deputy president is in when he tweets, as well as the polarity.
Tweet classifier: if I had tweets from different people, I could train and test a model that classifies which tweet belongs to whom.
Can constraints in psychology be derived from tweets?
Curiosity satisfied!
In the next blog post I will assess the political temperature of Kenya using the Twitter data of the front-runners in the upcoming 2022 presidential race.
Connect on LinkedIn @antonymaina
Follow on Twitter @antonymaina
Instagram @antony.k.maina