What Does Twitter Think of Donald Trump? Text Mining on Scraped Tweets involving Donald Trump

Jacob Torres
Jun 7, 2018


Coming from the Philippines, I immigrated to California earlier this year. I’m still adjusting to the culture and to Western living in general, and TV helps with the transition, since I get to watch news, weather updates, TV shows, and commercials. There are a lot of things to watch, but the one that’s piqued my interest lately is the president, Donald Trump.

Donald Trump is the current president of the United States, and he has been for about a year and a half now. Although he has not reached the second half of his term, he has certainly been a hot topic, especially on social media. One such platform is Twitter. In my opinion, Twitter is the social media outlet that talks about him the most; in fact, Trump himself has an account there, where he sometimes shares his unfiltered views, which might help with his popularity on the platform.

With a lot of controversy swirling around the president, especially with news on gun control, the North Korean summit deal, and the Stormy Daniels lawsuit, I decided to text mine tweets involving Donald Trump. The goal of this endeavor is to learn how his name is perceived in the Twitter-verse. Specifically, I want to know: How is Donald Trump talked about on Twitter? What are the things that people have been tweeting about him? Are the tweets mostly positive or mostly negative? What are the current issues and opinions about him? What phrases correlate with “Donald Trump” in general?

-

Disclaimers and Limitations

Using the R library rtweet, I scraped tweets matching the query “Donald Trump” at around 4:15 p.m., June 6, 2018 (Pacific Time). Due to the limitations of the library, I only managed to scrape around 9,000 tweets. I also took the liberty of removing the words “donald”, “trump”, and “president” from the results, since they are obviously the top words and the words that will correlate the most.

Most of the work is heavily based on Chapters 1, 2, and 4 of this link, and I feel it would be a grave sin not to give it credit.

Common Words and Phrases

A straightforward approach to analyzing “Donald Trump” tweets is to count how often each word appears across all the tweets. With that, Figure 1 shows the top 20 words of such tweets.

Figure 1: Most Frequent Words in “Donald Trump” Tweets
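
The analysis in this post was done in R, but the counting step itself is simple enough to sketch in plain Python. The sample tweets, the tiny stop-word list, and the function names below are made up for illustration; only the removal of “donald”, “trump”, and “president” comes from the post.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the real analysis used around 9,000 scraped tweets.
tweets = [
    "Donald Trump had a testy call with Trudeau",
    "The White House defends President Trump after the testy call",
    "Trump signs the VA Mission Act at the White House",
]

# A tiny illustrative stop-word list (an assumption, not a real lexicon).
STOP_WORDS = {"a", "the", "with", "had", "at", "after"}
# The three query words that the post removes from the results.
REMOVED_WORDS = {"donald", "trump", "president"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_words(texts, n=20):
    """Count the remaining words across all texts and return the n most common."""
    counts = Counter(
        word
        for text in texts
        for word in tokenize(text)
        if word not in STOP_WORDS and word not in REMOVED_WORDS
    )
    return counts.most_common(n)


print(top_words(tweets, n=5))
```

The result is a list of (word, count) pairs, which is the kind of table a bar chart like Figure 1 could be drawn from.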

The top words don’t seem very informative, so let’s try getting two-word phrases. It’s basically the same procedure, but instead of storing each word, we store each pair of consecutive words we see. More technically, this is called an n-gram with n = 2, or a bigram.

Figure 2: Most Frequent Bigrams in “Donald Trump” Tweets
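
As a rough illustration of the bigram idea, here is a minimal Python sketch (not the R code behind Figure 2). It pairs each word with the word that follows it and drops any pair containing a filtered word. The sample tweets, the `DROPPED` word set, and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample tweets, for illustration only.
tweets = [
    "Donald Trump had a testy call with Trudeau",
    "The White House defends President Trump after the testy call",
    "Trump signs the VA Mission Act at the White House",
]

# Illustrative stop words plus the three query words removed in the post.
DROPPED = {"a", "the", "with", "had", "at", "after",
           "donald", "trump", "president"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_bigrams(texts, n=20):
    """Count pairs of consecutive words, skipping pairs with a dropped word."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip(tokens, tokens[1:]) yields every pair of neighbouring words.
        for first, second in zip(tokens, tokens[1:]):
            if first in DROPPED or second in DROPPED:
                continue
            counts[f"{first} {second}"] += 1
    return counts.most_common(n)


print(top_bigrams(tweets, n=5))
```

Filtering after pairing, rather than before, keeps words that were never adjacent in a tweet from being glued together into a bigram.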

In Figure 2, the line connecting two words is more pronounced when the bigram is more frequent. We can see that the bigrams “testy call”, “white house”, and “invokes war” are talked about a lot.

Correlated Words

The bigram graph in Figure 2 already seems informative, but a problem with this approach is that it only focuses on bigrams that are frequently talked about. We also want to catch bigrams whose words really correlate with one another, even if they are mentioned less often. By correlated, we mean that the two words comprising the bigram tend to be present together in one group of tweets and absent together from the rest. This gives us more confidence that the bigram really is relevant to Donald Trump.

The library widyr has a function called pairwise_cor() which eases this process. This function computes the phi coefficient, a number that measures how strongly two words are correlated. A phi coefficient closer to 1 means that the two words tend to appear in the same tweets; a coefficient near 0 means that they are uncorrelated; and a coefficient closer to -1 means that one word tends to appear where the other is absent. We are more interested in correlated words, so we graph the most correlated bigrams in Figure 3.

Figure 3: Most Correlated Bigrams in “Donald Trump” Tweets
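
To make the phi coefficient less abstract, here is a small Python sketch of the standard formula for two binary variables (word X present or absent, word Y present or absent in each tweet). It is not widyr’s implementation; the function name and the four toy tweets, represented as sets of words, are assumptions for illustration.

```python
import math


def phi_coefficient(word_x, word_y, documents):
    """Phi coefficient of two words over documents given as sets of words."""
    n11 = n10 = n01 = n00 = 0
    for words in documents:
        has_x, has_y = word_x in words, word_y in words
        if has_x and has_y:
            n11 += 1  # both words present
        elif has_x:
            n10 += 1  # only word_x present
        elif has_y:
            n01 += 1  # only word_y present
        else:
            n00 += 1  # neither word present
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        # A word that appears in every tweet or in none has no defined phi.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical tweets, reduced to the set of words each one contains.
docs = [
    {"testy", "call", "trudeau"},
    {"testy", "call", "white", "house"},
    {"white", "house", "va"},
    {"pardon", "free"},
]

print(phi_coefficient("testy", "call", docs))   # always appear together
print(phi_coefficient("testy", "white", docs))  # no relationship
print(phi_coefficient("testy", "va", docs))     # never appear together
```

In this toy data, “testy” and “call” always occur together and score 1, “testy” and “white” score 0, and “testy” and “va” come out negative because they never share a tweet.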

More words are clumped together in Figure 3, such as:

  • “pamela anderson julian assange”
  • “chairman pawn 1995 soviet era”
  • “honor protect permanent land vamissionact”
  • “guns looms sticks 7”

This is more descriptive and informative than Figure 2. If you google each group of words individually, the first search results reveal, respectively, a TMZ piece on Pamela Anderson, a Twitter post about Russian intel, a video showing the president signing the VA Mission Act, and an article about the G-7 summit. All of these have Donald Trump as the main character.

“testy call” is also seen in Figure 3. If you search for this phrase today (as of this writing), you will find an article about Trump’s interaction with Canada’s Prime Minister Justin Trudeau. If you read this article, the phrases “white house” and “1812 war” also appear; unsurprisingly, these bigrams are also correlated, as shown in Figure 3.

Positive and Negative Tweets

Now, we try a sentiment analysis on the tweets. This is how it works: the library tidytext has a function, get_sentiments("bing"), which returns a sentiment lexicon made by Bing Liu and collaborators. It marks every word it covers as “positive” or “negative”, depending on the judgement call of the authors. This lexicon is then joined to every word in every tweet, so we know how much ‘positivity’ or ‘negativity’ is associated with each tweet.

Figure 4 shows the top positive and top negative words found.

Figure 4: Top Positive and Negative Words in “Donald Trump” Tweets
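
The lexicon join can be pictured with a short Python sketch. The six-word dictionary below is only a stand-in for the Bing lexicon, built from the six words the post itself reports with their labels; the sample tweets and the function name are assumptions.

```python
import re
from collections import Counter

# Tiny stand-in for the Bing lexicon: only the six words named in the post.
LEXICON = {
    "testy": "negative", "fake": "negative", "burn": "negative",
    "protect": "positive", "pardon": "positive", "free": "positive",
}

# Hypothetical sample tweets, for illustration only.
tweets = [
    "Fake news about a testy call",
    "Free speech and a free press",
    "Trump's pardon will protect nobody",
    "Fake outrage",
]


def sentiment_word_counts(texts):
    """Label each word via the lexicon and count words per sentiment."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            label = LEXICON.get(word)  # None when the word is not in the lexicon
            if label is not None:
                counts[label][word] += 1
    return counts


counts = sentiment_word_counts(tweets)
print(counts["positive"].most_common(3))
print(counts["negative"].most_common(3))
```

Words missing from the lexicon are simply skipped, which is also what an inner join against a lexicon table would do.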

The top 3 negative words are “testy”, “fake”, and “burn”, while the top 3 positive words are “protect”, “pardon”, and “free”. Recall that a single word might not reflect the full context of a tweet, and a bigram might be a better indicator. So we check which of the most frequent bigrams contain these 6 words to see the full story.

Figure 5a: Top Bigrams Associated with the Top 3 Positive Words
Figure 5b: Top Bigrams Associated with the Top 3 Negative Words
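
One way to pull out the bigrams behind a sentiment word is to keep only the pairs where either word is on a target list. The Python sketch below shows that filter on made-up tweets; the function names and sample data are assumptions, and no stop words are removed here to keep the example short.

```python
import re
from collections import Counter

# Hypothetical sample tweets, for illustration only.
tweets = [
    "Free speech matters",
    "Eagles free to skip the visit",
    "They defend free speech",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigrams_containing(texts, targets, n=10):
    """Count bigrams in which at least one of the two words is a target word."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            if first in targets or second in targets:
                counts[f"{first} {second}"] += 1
    return counts.most_common(n)


print(bigrams_containing(tweets, {"free"}))
```

Checking both positions matters: “free speech” and “eagles free” both involve the same target word, but on opposite sides of the pair.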

The bigrams associated with “testy”, “fake” and “burn” seem to focus mostly on Trump’s testy call with Trudeau, and judging from the bigrams themselves, they still generally depict negative sentiments.

Meanwhile, the bigrams associated with “protect”, “pardon” and “free” include some positive-sounding ones, such as “free donald”, “free speech”, “free julian”, “woman free” and “trump’s pardon”. These bigrams most likely refer to the Pamela Anderson / Julian Assange story or to this Kim Kardashian / Alice Marie Johnson article. However, the bigrams “free eagles” and “eagles free” might refer to the piece on the president’s feud with the football team Philadelphia Eagles. Feuds are usually considered a negative interaction, so we can infer from this instance that not every positive-sounding word implies a positive context.

Running Total of Positive and Negative Words

Lastly, we plot a running total of positive and negative words across all tweets. A running total means that we go through the tweets one at a time and keep adding the positive and negative words we find to two cumulative tallies, until we reach the last tweet. Figure 6 shows the plot.

Figure 6: Running Total of Positive and Negative Words
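
The running total can be sketched in Python as a cumulative sum over per-tweet counts. As before, this is an illustration rather than the code behind Figure 6: the six-word lexicon is a stand-in built from the words named in this post, and the sample tweets and function name are made up.

```python
import re
from itertools import accumulate

# Tiny stand-in lexicon: only the six words named in the post.
LEXICON = {
    "testy": "negative", "fake": "negative", "burn": "negative",
    "protect": "positive", "pardon": "positive", "free": "positive",
}

# Hypothetical sample tweets, in the order they were collected.
tweets = [
    "Fake news after a testy call",
    "Free speech",
    "Burn it all, fake",
    "Protect and pardon",
]


def running_totals(texts):
    """Return cumulative positive and negative word counts, one entry per tweet."""
    positive_per_tweet, negative_per_tweet = [], []
    for text in texts:
        labels = [LEXICON.get(w) for w in re.findall(r"[a-z']+", text.lower())]
        positive_per_tweet.append(labels.count("positive"))
        negative_per_tweet.append(labels.count("negative"))
    # accumulate() turns per-tweet counts into running totals.
    return list(accumulate(positive_per_tweet)), list(accumulate(negative_per_tweet))


positive_total, negative_total = running_totals(tweets)
print(positive_total)
print(negative_total)
```

Plotting the two lists against the tweet index gives two rising lines, and the gap between their final values is the overall difference between negative and positive words.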

Overall, we can infer that most of the tweets sound negative rather than positive. In fact, negative words outnumber positive words almost two to one.

Conclusions

To answer the questions we asked at the beginning of the post, here’s what tweets about Donald Trump look like:

  1. Most of the tweets are news involving him and the variety of personalities his name is associated with, such as:
  • Him granting clemency to an inmate serving a life sentence, with the help of Kim Kardashian
  • Pamela Anderson asking him to pardon Julian Assange
  • Footage of him signing an act that helps out veterans
  • His stance on gun control, which will be discussed at the G-7 summit
  • Him inaccurately recalling a historical event while talking to Justin Trudeau about tariffs
  • An article explaining how Russian intel entered his campaign

  2. The majority of the tweets carry negative sentiment; the negative words trump the positive words almost two to one. This might be due to the various controversies and negatively leaning news about his administration.

The code I used for this text mining project can be found here. Thanks for reading!
