What do the Top Trending Tweets tell us about Trump, Obama and Biden?

From 56,000+ Tweets gathered with Python, dating back to the start of 2008

Hiten Naran
Analytics Vidhya
10 min read · Sep 22, 2020


WordCloud showing most common words used by Trump, Obama and Biden’s top 250 most retweeted tweets since 2017

Introduction

In the lead-up to the 2020 US presidential election, I was simply bamboozled by the amount of stuff Donald Trump gets away with on Twitter. Take the tweet below, for instance, which happens to be his 7th most retweeted tweet since 2008…

I started to get curious and wanted to find a way to do a side-by-side comparison of how Joe Biden, Barack Obama and Donald Trump use Twitter as a means of communication.

With a bit of Python code and some head scratching, voilà!

I was able to gather some interesting observations by extracting tweet-level data going back to the start of 2008. Let’s start by taking a look at some top-line numbers.

How frequently did Trump, Obama and Biden tweet over the years?

  • We can see that Obama was quite an active tweeter during his second presidential term (2013–2016), particularly during the early years (2013–2014). Since leaving office, though, he has become quite a passive tweeter.
  • Trump, on the other hand, has been a highly active tweeter, with the frequency of his tweets really exploding from 2013 onwards. In a TV interview with CBS after winning the US presidency, he stated that his use of social media would be “very restrained, if I use it at all.” This doesn’t appear to be the case. I wonder what he’s most often posting… #China, #MakeAmericaGreatAgain, #MAGA, etc.
  • Joe Biden only started tweeting from 2012 onwards. Whilst he was quite an active tweeter during 2012, he very rarely tweeted from 2013 through to 2018, a period he mostly spent serving as Vice President. His frequency of usage has certainly increased from 2019 onwards, though, in the lead-up to the 2020 presidential election. We’ll see further along what his most popular tweets look like.

Looking at this graph, many questions spring to mind, one of which is how many retweets and engagements these tweets are generating. Let’s take a look…

What are the average retweet per tweet numbers for Trump, Obama and Biden over the years?

Look at that, Obama is way way way ahead!

  • It’s interesting to see how Obama’s average retweets per tweet have really exploded from 2017 onwards, despite him having become a far more passive tweeter since his second presidential term ended.
  • It is worth noting that the follower base for each has been growing over the years, which would largely explain why we are seeing greater retweet numbers in recent years. This is particularly true of Trump, whose follower base has exploded since he took up the presidency, gaining a whopping ~70 million extra followers from a base of ~18 million. See the figures below for follower growth stats over the years (sourced from trackalytics.com).
As of 21st September 2020: Trump = 86.1M followers, Obama = 122.7M followers, Biden = 9.5M followers

Let’s have a look and see if there are any common trends across each of Obama, Trump and Biden’s most retweeted tweets.

Based on the observations above, I have focused on producing a custom WordCloud with Python to analyse the most common words and themes across the top 250 most retweeted tweets for each of Obama, Trump and Biden from 2017 onwards.

What does the Custom WordCloud show us for Trump, Obama and Biden?

WordCloud showing most common words used by Trump, Obama and Biden’s top 250 most retweeted tweets since 2017

It isn’t at all surprising, though a tad worrying, to see that Trump’s most retweeted tweets take a rather nationalistic tone, with words such as:

‘Make’, ‘America’, ‘Great’, ‘Iran’, ‘Law’, ‘Order’, ‘enemy’ and ‘China’

frequently appearing.

It is interesting to see that Biden’s most retweeted tweets are those where he goes on the attack against Donald Trump in the build-up to the 2020 presidential election. This election is certainly proving to be one of the most polarising of modern times.

Read on to see how I produced the above visualisations and extracted tweet-level data going all the way back to 2008 with Python…

How is this done?

Step 1: Import necessary packages

Step 2: Extract Tweet level data

Free basic access to the Twitter API only lets you gather a user’s most recent 3,200 tweets, so the ‘GetOldTweets3’ library is a useful workaround for scraping far more tweet-level data than that. It works by scraping Twitter user feeds as they appear on the web, rather than requiring access to the data through an official API connection. Long story short: as long as the data is visible on the webpage (timestamp, text, mentions, hashtags, etc.), we can gather it.

The function built below will enable us to scrape any user’s Twitter feed within a specified start and end date (inspiration for the code taken from a fellow Medium blogger).
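The original function isn’t reproduced in this text, so here is a minimal sketch of what such a scraper might look like. It assumes the `GetOldTweets3` package as documented on PyPI; the helper names (`tweet_to_row`, `fetch_with_got`, `scrape_user`) and the choice of fields are my own illustration, not the original code. The library relies on Twitter’s web pages as they were in 2020, so it may well not run against the site today.

```python
def tweet_to_row(tweet):
    """Flatten one tweet object into a plain dict.

    The attribute names follow GetOldTweets3's tweet objects.
    """
    return {
        "username": tweet.username,
        "date": tweet.date,
        "text": tweet.text,
        "retweets": tweet.retweets,
        "favorites": tweet.favorites,
        "hashtags": tweet.hashtags,
        "mentions": tweet.mentions,
    }


def fetch_with_got(username, since, until):
    """Fetch tweets posted by `username` between two 'YYYY-MM-DD' dates."""
    # Imported lazily so the rest of the sketch works without the library.
    import GetOldTweets3 as got

    criteria = (
        got.manager.TweetCriteria()
        .setUsername(username)
        .setSince(since)  # lower bound of the date range
        .setUntil(until)  # upper bound of the date range
    )
    return got.manager.TweetManager.getTweets(criteria)


def scrape_user(username, since, until, fetch=fetch_with_got):
    """Return one dict per tweet; `fetch` can be swapped for another source."""
    return [tweet_to_row(t) for t in fetch(username, since, until)]
```

Something like `scrape_user("BarackObama", "2008-01-01", "2009-01-01")` would then give a list of rows ready to be loaded into a DataFrame. Passing the fetcher in as a parameter is just a convenience, so the flattening logic can be tried out with stand-in data.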

We are looking to scrape tweet-level data going all the way back to 2008, and the sheer volume of tweets to gather is large (~56K tweets!). To ensure the function doesn’t crash midway, I created a looping script which stores the tweet-level data in a series of CSV files broken out by year. With the handy ‘glob’ module, we can then concatenate the CSV files into a single Pandas DataFrame.
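The year-by-year loop described above might be sketched as follows. This is an illustration under assumptions rather than the original script: the function names, the file naming scheme and the `scrape(username, since, until)` callable (anything that returns a list of row dicts) are all hypothetical.

```python
import glob
import os

import pandas as pd


def year_windows(start_year, end_year):
    """Yield (year, since, until) with one calendar-year window per item."""
    for year in range(start_year, end_year + 1):
        yield year, f"{year}-01-01", f"{year + 1}-01-01"


def save_by_year(scrape, username, start_year, end_year, out_dir):
    """Scrape one year at a time, writing each year to its own CSV file."""
    os.makedirs(out_dir, exist_ok=True)
    for year, since, until in year_windows(start_year, end_year):
        rows = scrape(username, since, until)
        if not rows:
            continue  # nothing tweeted that year, so no file is written
        path = os.path.join(out_dir, f"{username}_{year}.csv")
        pd.DataFrame(rows).to_csv(path, index=False)


def load_all(out_dir):
    """Concatenate every yearly CSV in `out_dir` into a single DataFrame."""
    paths = sorted(glob.glob(os.path.join(out_dir, "*.csv")))
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
```

Because each year lands in its own file, a failure partway through only costs the year in progress; the loop can be restarted from that year and `load_all` will still pick up everything already saved.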

And voilà! The above code can be used to scrape any Twitter user’s feed within a specified date range. After a bit of data cleaning, we can now jump into the juicy exploratory analysis.

Step 3: Exploratory Analysis – Visual graphs with Seaborn

You have the option of using either Seaborn or Matplotlib to produce visual graphs. I prefer working with Seaborn, which is what I used here, given that it usually needs less code than Matplotlib for the same chart.
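As a rough sketch of how the two charts earlier in the article (tweets per year and average retweets per tweet) could be produced, the snippet below first aggregates the data with Pandas and then draws one line per person with Seaborn. The column names (`username`, `date`, `retweets`) and both function names are assumptions for illustration; the original plotting code isn’t shown here.

```python
import pandas as pd


def yearly_summary(df):
    """Per user and year: number of tweets and average retweets per tweet."""
    out = df.copy()
    out["year"] = pd.to_datetime(out["date"]).dt.year
    return (
        out.groupby(["username", "year"])
        .agg(tweets=("retweets", "count"), avg_retweets=("retweets", "mean"))
        .reset_index()
    )


def plot_yearly(summary, value, title):
    """Draw one line per user for the chosen summary column."""
    # Plotting libraries are imported here so the aggregation runs without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.lineplot(data=summary, x="year", y=value, hue="username", marker="o")
    ax.set_title(title)
    plt.show()
```

Calling `plot_yearly(summary, "tweets", "Tweets per year")` and then `plot_yearly(summary, "avg_retweets", "Average retweets per tweet")` on the output of `yearly_summary` would give the two views discussed above.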

We now have graphs plotted with Seaborn. Next step…

Step 4: Building the Custom WordCloud

Here we get to work with the WordCloud library in order to generate our very own custom WordCloud.

  1. To draw a WordCloud in a customised shape, which in our case is the shape of Trump, Obama and Biden, we’ll need to source a black & white background image to serve as a mask (the images below will suffice).
Donald Trump, Barack Obama and Joe Biden: Black and White stencils

  2. We’ll need to break the dataset out into individual DataFrames for Trump, Obama and Biden, with each DataFrame ordered from the most retweeted to the least retweeted tweet.
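The second step can be sketched in a few lines of Pandas. The `username` and `retweets` column names and the `split_by_user` helper are assumptions for illustration.

```python
import pandas as pd


def split_by_user(df):
    """Return {username: DataFrame}, each sorted from most to least retweeted."""
    return {
        user: group.sort_values("retweets", ascending=False).reset_index(drop=True)
        for user, group in df.groupby("username")
    }
```

Each person’s DataFrame can then be looked up by handle, for example `split_by_user(df)["JoeBiden"]`, assuming that is how the handle is stored in the data.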

Now we can go ahead and build out the function below…
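Since the original function isn’t included in this text, here is a minimal sketch of a masked word cloud built with the `wordcloud` library. The text-cleaning rules, the parameter choices and the function names are my own assumptions; only `WordCloud`, `STOPWORDS`, `generate` and `to_file` come from the library itself.

```python
import re


def clean_tweet_text(tweets):
    """Join tweets into one string, dropping links and @mentions."""
    joined = " ".join(tweets)
    joined = re.sub(r"https?://\S+", " ", joined)  # strip URLs
    joined = re.sub(r"@\w+", " ", joined)          # strip @mentions
    return re.sub(r"\s+", " ", joined).strip()


def build_wordcloud(tweets, mask_path, out_path):
    """Render a word cloud in the shape of the stencil at `mask_path`."""
    # Imported here so clean_tweet_text can be used without these packages.
    import numpy as np
    from PIL import Image
    from wordcloud import STOPWORDS, WordCloud

    # Pure white areas of the stencil are masked out and stay empty.
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(
        mask=mask,
        background_color="white",
        stopwords=STOPWORDS,
        max_words=200,
    ).generate(clean_tweet_text(tweets))
    cloud.to_file(out_path)
    return cloud
```

The stencil should be a black silhouette on a pure white background, since the library only draws words where the mask is not white. A call might look like `build_wordcloud(texts, "trump_stencil.png", "trump_cloud.png")`, where both file names are placeholders.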

Finally, we now have a function which will enable us to generate a custom WordCloud from the top retweeted tweets for a given year range. In our case, we want to generate a WordCloud from the top 250 most retweeted tweets for each person from 2017 onwards. We can do this by passing the following parameters to the function:
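The selection itself (top 250 most retweeted tweets per person from 2017 onwards) could be expressed roughly as below. The helper name, its defaults and the column names are assumptions chosen to mirror the description above, not the original parameters.

```python
import pandas as pd


def top_retweeted(df, username, start_year=2017, n=250):
    """Return the `n` most retweeted tweets by `username` from `start_year` on."""
    years = pd.to_datetime(df["date"]).dt.year
    subset = df[(df["username"] == username) & (years >= start_year)]
    return subset.nlargest(n, "retweets")
```

The `text` column of the result is what would be fed into the word cloud for each person.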

WordCloud showing most common words used by Trump, Obama and Biden’s top 250 most retweeted tweets since 2017

And there you have it! Our very own custom WordCloud built to work with any tweet level dataset. The data has been made available through the following DataStudio link, if you wish to have a gander: Link Here

You can also check out my GitHub repo below:


References:

https://medium.com/@AIY/getoldtweets3-830ebb8b2dab

https://pypi.org/project/GetOldTweets3/

https://github.com/amueller/word_cloud

https://www.trackalytics.com/


Hiten Naran
Analytics Vidhya

I’m a Digital Marketeer, Python enthusiast, avid traveller and a voracious reader (I love non-fiction). I also take a keen interest in global affairs.