On the Covid Campaign Trail: A Text Analysis of Donald Trump’s Speeches

Chloe Robinson
Introduction to Cultural Analytics
May 22, 2021

By Amelia Robinson and Chloe Robinson

As president, Donald Trump presided over a period of unprecedented change and turmoil. The onset of the coronavirus pandemic in early 2020 fundamentally altered everyday life, as many died or became sick, and millions more sheltered in place and adapted to a socially distant lifestyle. While governing through this period of crisis, Trump was also preparing to compete in arguably one of the most heated and consequential presidential elections in American history. A public health crisis shocking in its severity and scale thus converged with the campaign. As a politician, Trump is known for his informal speaking style and inflammatory rhetoric. Throughout the pandemic, Trump famously used public speaking opportunities to downplay the severity of the virus and the crisis more broadly, peddle fringe ideas, and oppose necessary public health measures. Given these circumstances, we decided to examine how the pandemic might have affected Trump’s rhetoric on the campaign trail. In this project, we analyze how Donald Trump chose to respond to and interact with the COVID-19 pandemic throughout his 2020 campaign for president, as seen through a selection of his campaign speeches.

We used a dataset containing text files of 35 of Trump’s campaign rally speeches from 2019 and 2020, with a little less than half of the speeches occurring before the onset of the pandemic in the United States and a bit more than half after. The dataset was posted to Kaggle, a data science community website where users can upload data that they have created for others to use. The data was collected by Kaggle user Christian Lillelund, who is affiliated with Aarhus University in Denmark. It includes the full text of each of the 35 speeches, edited for clarity. Each speech is its own plain text (.txt) file, and the files were zipped together for download.

A few elements are missing from the data. For one, there is little context associated with the texts, such as other speakers at the rallies or any crowd interaction that may have occurred during the speeches. We also cannot see what gestures, if any, Trump was making, or hear his tone of voice, both of which affect how the words are interpreted. The text files also do not fully capture Trump’s speech pattern, because filler words like “um” and “uh” were edited out by the transcriber. Relatedly, we relied on the transcriber to faithfully report what Trump said. It is possible that the person doing this work made mistakes or misrepresented reality in some way, and if they did, we would not necessarily know about it. One ethical consideration with this dataset is that someone other than Lillelund may have done the work of collecting and transcribing the speeches. If that was the case, that individual would not have received any credit in this project. Beyond that, the dataset raises few ethical concerns, because the speeches were public addresses by a public figure. They were knowingly televised or otherwise recorded and made in the presence of journalists and the general public, for mass consumption.

For this assignment, we decided to employ two computational methods, Topic Modelling and TF-IDF, to examine Trump’s treatment of the coronavirus in his campaign rhetoric. Both methods are text analysis tools and are well suited to plain text files such as ours. They allow us to examine nuances in topic and word choice, affording us insight into how Trump spoke at his rallies and what he spoke about. Topic Modelling is a method that can be used to “identify the main topics of discourses within a collection of texts” (Walsh, 2021). We used Topic Modelling to figure out which ideas Trump emphasized most across the selected campaign speeches, shedding light on common themes and points of emphasis, and on how they changed, or stayed the same, as the pandemic and the campaign cycle progressed. Our second computational method was TF-IDF, which stands for term frequency-inverse document frequency. TF-IDF is “a method that tries to identify the most distinctively frequent or significant words in a document” (Walsh, 2021) by assigning scores to specific words. We used TF-IDF to ascertain the significance of specific coronavirus-related words across Trump’s speeches, again paying attention to continuity and change. This analysis sheds light on Trump’s overall treatment of the crisis, including the level of emphasis he placed on it at different times throughout the pandemic.
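To make the idea of a TF-IDF score more concrete, here is a minimal sketch in plain Python. It is not the code used for this project. It assumes one common variant of the formula (raw term counts multiplied by a smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1, with each document's scores then scaled to unit length), and the function names and the three toy "speeches" are invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: score = count * (ln((1 + N) / (1 + df)) + 1),
    followed by L2 normalization within each document.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())

    results = []
    for doc_counts in counts:
        raw = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in doc_counts.items()
        }
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        results.append({term: v / norm for term, v in raw.items()})
    return results


# Invented toy texts, not excerpts from the dataset.
toy_speeches = [
    "we will build the wall and win the election",
    "the virus will disappear and we will win",
    "we love this great country",
]
scores = tfidf(toy_speeches)

# A word that never appears in a text simply has no score there,
# which is why "above zero" is a useful signal that a word was used at all.
print(scores[0].get("virus", 0.0))
print(scores[1].get("virus", 0.0))
```

In this toy example, "virus" outscores "we" in the second text even though each appears once there, because "we" occurs in every text and is therefore less distinctive.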

For our analysis, we split the thirty-five plain text files into two subcategories by date. The first category contained the text files of all of the speeches made “Pre-Covid,” and the second contained the files from rallies that occurred “Post-Covid,” meaning during our current pandemic period. We made this distinction by figuring out when, among the files in our dataset, Trump first mentioned the coronavirus. Any speech occurring after the date of the speech with the first mention was sorted into the “Post-Covid” category. Using this methodology, fourteen speeches were classified as “Pre-Covid,” and twenty-one as “Post-Covid.” Splitting the files into these two categories allowed us to easily compare the Topic Modelling and TF-IDF findings from before the pandemic with those from after it was underway.
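A date-based split like this one could be sketched as follows. This is an illustration under stated assumptions rather than our actual procedure: the list of trigger words, the function names, and the toy dates and texts are all hypothetical, and the sketch assumes that the speech containing the first mention itself counts as “Post-Covid.”

```python
import re
from datetime import date

# Assumed trigger words; any one of them marks a speech as mentioning the virus.
COVID_TERMS = ("coronavirus", "corona", "virus", "covid")


def mentions_covid(text, terms=COVID_TERMS):
    """Return True if any trigger word appears as a whole word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(term in words for term in terms)


def split_by_first_mention(speeches):
    """Split (date, text) pairs into (pre, post) lists.

    Speeches are sorted chronologically. Everything before the first
    speech that mentions the virus is "pre"; that speech and everything
    after it is "post" (an assumption about where the boundary falls).
    """
    ordered = sorted(speeches, key=lambda item: item[0])
    for index, (_, text) in enumerate(ordered):
        if mentions_covid(text):
            return ordered[:index], ordered[index:]
    return ordered, []


# Invented placeholder dates and texts, not taken from the dataset.
toy_speeches = [
    (date(2020, 3, 1), "placeholder remarks about the coronavirus"),
    (date(2020, 1, 15), "placeholder remarks about the economy"),
    (date(2020, 6, 1), "placeholder remarks about the election"),
    (date(2020, 2, 5), "placeholder remarks about trade"),
]
pre_covid, post_covid = split_by_first_mention(toy_speeches)
print(len(pre_covid), len(post_covid))
```

Note that the last toy speech never mentions the virus but is still sorted into the second group, because the split depends only on the date of the first mention.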

Our main finding from the Topic Modelling analysis was that Trump’s top speech topics from before the onset of the pandemic were not very different from his top speech topics after. We instructed our Topic Modelling software to produce ten topics each for Pre-Covid and Post-Covid. Seven of the ten topics appeared to be the same or very similar for Pre-Covid and Post-Covid. Some common topics that appeared both before and after the onset of the pandemic, as we have labelled them, include “Themes of the Election,” “Trump Speaking About Himself,” “Trump Speaking About Conflicts and His Enemies,” and “American Greatness.” One new topic that appeared Post-Covid we titled “Current Events,” which included “virus.” That topic also included “police,” likely referencing popular opposition to police brutality during the mass racial justice protests of summer 2020. Another new topic that appeared Post-Covid we titled “Immigration,” and it included buzzwords such as “aliens,” “ICE,” “wall,” and “illegal.” His decision to focus so much on border security while the pandemic was raging might suggest that he was trying to pull the focus away from his administration’s mismanagement and activate his base instead.
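We judged which topics were “the same or very similar” by reading and labelling them ourselves. A rough computational cross-check, which we did not run, could compare the top words of each Post-Covid topic with those of each Pre-Covid topic. The sketch below uses Jaccard overlap for this. The function names, the 0.3 threshold, and the topic word lists are all hypothetical and are not our model's output.

```python
def jaccard(words_a, words_b):
    """Share of words two lists have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(pre_topics, post_topics, threshold=0.3):
    """For each post topic, report its closest pre topic.

    Returns a list of (post_label, pre_label_or_None, score). The pre
    label is None when no pre topic reaches the (assumed) threshold,
    which flags the post topic as plausibly new.
    """
    matches = []
    for post_label, post_words in post_topics.items():
        best_label, best_score = None, 0.0
        for pre_label, pre_words in pre_topics.items():
            score = jaccard(post_words, pre_words)
            if score > best_score:
                best_label, best_score = pre_label, score
        if best_score < threshold:
            best_label = None
        matches.append((post_label, best_label, best_score))
    return matches


# Hypothetical top words per topic, for illustration only.
pre_topics = {
    "American Greatness": ["great", "best", "love", "country", "america"],
    "Themes of the Election": ["election", "vote", "win", "sleepy", "democrats"],
}
post_topics = {
    "American Greatness": ["great", "best", "love", "america", "people"],
    "Immigration": ["aliens", "ice", "wall", "illegal", "border"],
}

for post_label, pre_label, score in match_topics(pre_topics, post_topics):
    status = f"matches '{pre_label}'" if pre_label else "looks new"
    print(f"{post_label}: {status} (overlap {score:.2f})")
```

A check like this would only supplement close reading, since two topics can share few exact words and still be about the same theme.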

Words featured in all of the topics demonstrated Trump’s infamous vernacular, including words like “sleepy,” probably for “Sleepy Joe,” his demeaning nickname for now-President Joe Biden, and “best,” “great,” and “love,” likely referencing his Make America Great Again ideology. The lack of a highly evident shift in Trump’s topics or tone, given the drastic changes wrought by the virus, suggests that he did not talk about COVID-19 as much as the scale of the crisis might lead one to expect. This could be seen as a conscious strategy to downplay and ignore the virus for political gain. It also seems that when Trump did speak about the virus, he used his typical speech patterns and vocabulary, with abundant hyperbole and a lack of seriousness. This would indicate that his words and tone remained mostly the same, even when the subject of his rhetoric had changed. These findings speak to how Trump politicized the virus by adopting its features into his overarching personal narrative.

For the TF-IDF analysis, we zoomed in on our dataset to examine how the significance scores of six specific coronavirus-related words changed throughout the campaign. The words that we chose to examine were “coronavirus,” “corona,” “virus,” “covid,” “mask,” and “vaccine.” We isolated those words in the dataframe of TF-IDF scores, and then examined the column containing each word on its own. Notably, the word “coronavirus” registered a TF-IDF score above zero in only four of the twenty-one speeches in the Post-Covid section, indicating that Trump rarely used this word, even though the pandemic was a key issue when he was speaking. Surprisingly, among our selected words, “virus” registered a TF-IDF score above zero in fifteen speeches, the highest number of any of our words. This was true even though “coronavirus,” “corona,” and “covid” are all more specific and technical. “Virus” seems to be a more evasive, vague term, possibly used by Trump to downplay COVID’s severity and associate it with the flu or common cold. Another point of interest is that twelve of the fifteen above-zero scores for “virus” came from speeches made after Trump resumed in-person events, starting with the Tulsa rally on June 20, 2020. This suggests that earlier in the pandemic, Trump was still developing his preferred terminology for speaking about COVID. This is further demonstrated by the fact that at the Tulsa rally, “covid” had a higher TF-IDF score than “virus,” and the second-highest of any of the scores we collected, around 0.07. It seems that, over time, as he returned to his normal rally schedule, his language shifted.
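The kind of per-word summary described above (how many speeches gave a word a score above zero, and where its score peaked) can be sketched in a few lines. This is an illustrative sketch, not our notebook code: plain dictionaries stand in for the dataframe, and the speech names and score values are invented.

```python
SELECTED_TERMS = ["coronavirus", "corona", "virus", "covid", "mask", "vaccine"]


def summarize_terms(score_rows, terms=SELECTED_TERMS):
    """Summarize each selected term across speeches.

    score_rows: list of (speech_name, {term: tfidf_score}) in
    chronological order. A term missing from a speech counts as 0.0.
    Returns {term: (speeches_above_zero, peak_speech, peak_score)}.
    """
    summary = {}
    for term in terms:
        scored = [(name, row.get(term, 0.0)) for name, row in score_rows]
        above_zero = [(name, score) for name, score in scored if score > 0]
        if above_zero:
            peak_name, peak_score = max(above_zero, key=lambda pair: pair[1])
        else:
            peak_name, peak_score = None, 0.0
        summary[term] = (len(above_zero), peak_name, peak_score)
    return summary


# Invented scores for three placeholder speeches, for illustration only.
toy_rows = [
    ("speech_01", {"virus": 0.02, "mask": 0.0}),
    ("speech_02", {"virus": 0.05, "vaccine": 0.03}),
    ("speech_03", {"vaccine": 0.04, "covid": 0.06}),
]

for term, (count, peak_name, peak_score) in summarize_terms(toy_rows).items():
    print(f"{term}: above zero in {count} speech(es), peak {peak_score} at {peak_name}")
```

Counting scores above zero is effectively a count of the speeches in which the word appears at all, while the peak score points to the speech where the word was most distinctive.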

The graph below depicts the TF-IDF scores for the word “virus,” represented on the y-axis, relative to the speeches. The speeches are organized in chronological order from left to right across the x-axis. Here, we can see how the word’s usage changed over time, with more prevalence after lockdown restrictions began to ease.

“Vaccine” registered scores above zero in the second-highest number of speeches among our six selected words, with most of those scores occurring closer to the election and further into the pandemic. The highest score for “vaccine,” 0.054, occurred in Fayetteville on September 19th, 2020. The last text in our dataset is from September 22nd, 2020. The high score in Fayetteville, and the high prevalence of scores above zero in September generally, suggest that a higher TF-IDF score for “vaccine” was more likely as the election and the expected release of vaccine efficacy data from drugmakers drew nearer. Additionally, the more frequent and generally higher TF-IDF scores for “vaccine” than for “mask” show how Trump was more focused on promising an easy fix for the pandemic than on discussing less popular safety measures, as he often falsely claimed that vaccines would be ready by election day. An interesting outlier, however, is the 0.086 score for “mask” in Latrobe, PA on September 3rd, the highest TF-IDF score of any we recorded. This could point to Trump going on one of his signature off-script rants against masking regulations, measures which he and his base politicized throughout the pandemic.

The graph below depicts the TF-IDF scores for the word “vaccine,” represented on the y-axis, relative to the speeches. The speeches are organized in chronological order from left to right across the x-axis. Here, we can see how the word’s usage changed over time, with values spiking as the election approached.

Our work represents only a small fraction of the analysis that could be conducted with this dataset and on related texts and topics. Future research could compare Trump’s treatment of the virus in his campaign speeches with President Biden’s during the same period. The two men have very different styles of speaking and emphasize different topics and narratives, so the contrast in how they spoke about COVID would likely be stark and fruitful for analysis. Another avenue for future study would be to examine another U.S. president’s rhetoric before and after a significant historical event, such as George W. Bush’s before and after 9/11. The applications of computational analysis to the discipline of political science as a whole are manifold and exciting to consider, and we look forward to continuing to engage with these topics in the future.

