A Wordcloud for Trump’s Tweets

Kyaw Saw Htoon
4 min read · Nov 12, 2020

Creating a wordcloud with Donald Trump’s recent tweets to explore his most used words

The POTUS has been tweeting a lot lately. I get it. Social media, especially Twitter, is his bread and butter. This is how he sends his messages directly to his supporters without the hassles of formal channels.

But I am curious. What words is he using the most in his tweets? How is he choosing these particular words to send his powerful messages?

Let’s find out!

Importing Python Libraries

First, we need to import the Python libraries required to generate the wordcloud. These include basic packages such as numpy, pandas, and matplotlib, as well as wordcloud-specific tools.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image
from matplotlib.colors import LinearSegmentedColormap

Data Source

The data I am using for this tutorial is a raw dataset of Trump's tweets that I found on Kaggle (https://www.kaggle.com/gpreda/trump-tweets). This dataset contains tweets from July 2020 to October 2020, a timeframe that lets us capture tweets mainly related to the 2020 election and his campaign efforts.

The data comes as a CSV file, which we will load into a data frame with pandas.

df = pd.read_csv('/content/trump_tweets.csv')

Let’s take a look at what we have.

df.head()

The column we are interested in is ‘text’, which contains the tweets. You can drop the other columns, but I will keep them in the data frame since they won’t affect the wordcloud.

Data Scrubbing

Next, we will check if the columns have any null values.

df.isnull().sum()

So, the ‘hashtags’ column does have null values. That is fine, since we will not be using this column. The important thing is that there are no null values in the ‘text’ column.

Now, we will join all the tweets together so that we can feed them into the WordCloud function.

tweets = " ".join(line for line in df['text'])
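Tweets may also contain links and HTML entities, which can show up in the cloud as tokens like "https" or "amp". The sketch below is one optional way to tidy the text before joining it. The helper names clean_tweet and join_tweets are my own and are not part of pandas or wordcloud, and the cleaning rules are assumptions you may want to adjust.

```python
import html
import re

# Matches http:// and https:// links up to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")

def clean_tweet(text):
    """Return the tweet with HTML entities decoded and URLs removed."""
    text = html.unescape(text)         # e.g. "&amp;" becomes "&"
    text = URL_PATTERN.sub(" ", text)  # drop links
    return " ".join(text.split())      # collapse repeated whitespace

def join_tweets(texts):
    """Join tweets into one string, skipping non-string entries such as NaN."""
    return " ".join(clean_tweet(t) for t in texts if isinstance(t, str))

# Hypothetical usage with the data frame loaded earlier:
# tweets = join_tweets(df['text'])
```

Skipping non-string entries also makes the join safe if the ‘text’ column ever does contain null values.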

Importing the Mask

Our next step is to import the mask for the wordcloud. The mask determines the wordcloud’s shape. Here, we are using a mask in the shape of Donald Trump’s head.

We will use the open function from the Image module of PIL (Pillow) to open the mask, and then convert it into an array of pixel values with numpy.

mask = np.array(Image.open("/content/mask.png"))
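WordCloud treats pure white pixels (value 255) in the mask as off-limits and draws words everywhere else. If your own mask image has an off-white or grey background, the shape may not come out as expected. Here is a minimal sketch of how you could force the mask into pure black and white first. The function name binarize_mask and the threshold of 128 are my own assumptions, not something the wordcloud library provides.

```python
import numpy as np

def binarize_mask(mask, threshold=128):
    """Return a uint8 mask: 255 (masked out) where bright, 0 (drawable) elsewhere."""
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Average the colour channels (ignoring alpha if present) to get brightness
        mask = mask[..., :3].mean(axis=2)
    return np.where(mask > threshold, 255, 0).astype(np.uint8)

# Hypothetical usage with the mask loaded earlier:
# mask = binarize_mask(mask)
```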

Creating the Wordcloud

Finally, we can generate the wordcloud. In this wordcloud, we will use red and blue as the primary colors, and white as the background in order to capture the great USA spirit.

First, we will create a list with red and blue colors.

colors = ["#BF0A30", "#002868"]

Then, we will pass these colors to the LinearSegmentedColormap class from matplotlib to create a color map.

cmap = LinearSegmentedColormap.from_list("mycmap", colors)
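A color map built from two colors blends smoothly from one to the other, so the words in the cloud can come out in shades between red and blue rather than only those two. To get a feel for this, here is a small pure-Python illustration of linear blending between two hex colors. It is not matplotlib's actual implementation, and the helper names hex_to_rgb and blend are my own.

```python
def hex_to_rgb(color):
    """Convert a hex string such as '#BF0A30' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def blend(start, end, t):
    """Linearly interpolate between two hex colors; t=0 gives start, t=1 gives end."""
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    mixed = (round(x + (y - x) * t) for x, y in zip(a, b))
    return "#{:02x}{:02x}{:02x}".format(*mixed)

colors = ["#BF0A30", "#002868"]
# Five evenly spaced shades running from the red to the blue
shades = [blend(colors[0], colors[1], i / 4) for i in range(5)]
```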

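Next, we will pass the joined tweets, the mask, the color map, and the stopwords to WordCloud to create our image. I am setting the collocations parameter to True so that word pairs that frequently occur together appear as two-word phrases in the cloud. Because a mask is supplied, the cloud takes its size from the mask, and the width and height arguments are ignored.

stop_words = set(STOPWORDS)  # assumption: wordcloud's built-in English stopword list
wc = WordCloud(width=600, height=400, random_state=1, background_color='white', colormap=cmap, mask=mask, collocations=True, stopwords=stop_words).generate(tweets)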
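To make the idea behind collocations more concrete, here is a rough sketch that simply counts adjacent word pairs. WordCloud itself scores word pairs statistically before keeping them, so this plain counting is only an illustration of the concept. The function name top_bigrams and the simple tokenizing pattern are my own assumptions.

```python
import re
from collections import Counter

def top_bigrams(text, stopwords=frozenset(), n=5):
    """Count adjacent word pairs, skipping pairs that contain a stopword."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = zip(words, words[1:])
    counts = Counter(
        " ".join(pair) for pair in pairs
        if not any(word in stopwords for word in pair)
    )
    return counts.most_common(n)

# Hypothetical usage with the joined tweets and the built-in stopword list:
# top_bigrams(tweets, stopwords=STOPWORDS, n=10)
```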

Let’s see what our final product looks like. We will use matplotlib to display the wordcloud.

plt.figure(figsize=[15,15])
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

Voilà!

Wordcloud Analysis

Unsurprisingly, Trump’s most tweeted words and phrases include “Biden”, “MAGA” and “Fake News”. It is also interesting that “Pennsylvania” is the most tweeted among the battleground states.
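If you would like to back up a visual impression like this with numbers, the generated WordCloud object exposes a words_ dictionary that maps each word or phrase to its relative frequency. A minimal sketch for listing the top entries could look like this. The helper name top_words is my own.

```python
def top_words(frequencies, n=10):
    """Return the n highest-frequency (word, frequency) pairs, largest first."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical usage with the wordcloud generated earlier:
# for word, freq in top_words(wc.words_):
#     print(word, round(freq, 3))
```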

Conclusion

Wordclouds are useful for data exploration and analysis in NLP projects. They are also a great way to present your project results to an audience creatively. I hope this tutorial has shown you how to create one, and that it adds value to your next project.
