Is Trump really the most populist US president? — Learning Data Science #4
An analysis of all US presidents’ speech patterns since 1900

In this article, you will learn what populism is, what defines populist rhetoric and what we can do to measure it. You will also see how the rhetoric of US presidents has changed since 1900.
While reading the newspaper, one cannot help but see an article about Donald Trump. Again. Alongside his usual political antics, the articles often comment on his Twitter feed or his “unique” use of language. A word that often gets thrown around in these contexts is “demagogue”. His rhetoric is called “divisive” or “populist” (e.g. Olurunnipa & Parker 2019).
Whether these terms are justified is for others to decide. But how can we approach this problem from a political data science perspective? How can we measure if Trump is really a populist? And is he an outlier or just part of a long trend?
In this article, I want to present a short data science project that I’ve tackled in my free time. This article neither claims to be scientific nor complete. It aims to show a creative way of applying data science to political questions as well as display my current skill level in data analysis.
I will begin by defining populism and populist rhetoric. Based on these two definitions and the scientific literature, we will settle on criteria by which we will examine Trump’s rhetoric. I will then analyze the speech transcripts of all US presidents since 1900 to see whether Trump is part of a long trend or an outlier.
What is populism?

The scientific literature is not clear as to what exactly populism is. What is clear, though, is that most definitions include some form of pitting “the people” against a ruling elite. (e.g. Canovan 2004, Mudde & Kaltwasser 2017) Therefore, our definition of populism will be:
Populism is a political ideology that pits “the people” against “the elite”.
Usually, “the people” are portrayed as morally superior to a corrupt and self-serving “elite”. The distinction between the two groups can be drawn along racial/ethnic, national or class lines. (Decker 2004)
What is populist rhetoric?

Populist rhetoric, by extension, is a style of communication that perpetuates the “us vs. them” mentality of populism.
Some common characteristics of populist rhetoric include:
- Use of unusual (often undiplomatic) language. (Canovan 2004)
- Portraying themselves as outsiders.
- Use of anti-establishment rhetoric. (Mudde & Kaltwasser 2017)
- Targeting of others, displaying bad manners, opposing political correctness and being skeptical of experts.
- Use of simple language and a tendency to oversimplify problems. (Moffitt 2016)
What these characteristics all have in common is that populist leaders use them to distinguish themselves from the elite or ruling class. They do this to be seen as trustworthy and to win votes.
How do we measure populism?

Political science research is at a point where automated text analysis is used more and more often. For example, it can be used to determine the influence of interest groups on EU policy. (Klüver 2009)
While this research is old by today’s standards, the same problems remain. Even more sophisticated algorithms don’t eliminate the need for cross-validation via hand-coding. For those who don’t know, hand-coding is a scientific research tool whereby researchers manually label certain text passages according to specific criteria. In this case, it would mean I would have to verify any results from the algorithms by hand in the speech transcripts.
Since this research project is something I do in my spare time and I don’t claim to be scientific in my approach, I won’t cross-verify my results with hand-coding. This, however, limits the range of things I can measure without sacrificing too much in validity. I also have to keep in mind my current skill set. I can do some basic data cleaning, some basic data visualization as well as perform some intermediate Python operations.
Therefore, I will measure the following aspects in the rhetoric of US presidents:
- How simple is their language?
I measure this by examining the vocabulary size within the first 100,000 words of each president’s transcripts, which keeps the counts comparable. I also calculate each president’s Flesch-Kincaid score. It measures the readability of a text passage and assigns it a (US school) grade level.
- How much anti-establishment rhetoric do they use?
I measure this by the amount of in-group out-group construction they use. In-group out-group construction is language that gives the listener a feeling of being part of a group that stands against another group. To determine this, we create a dictionary of commonly used words that generate an in-group out-group feeling and measure their relative frequency in the speech transcripts.
What is our data basis?
The data basis for our little inquiry is the Grammar Lab data set containing transcripts of speeches from all US presidents with around 3.5 million words.
All US presidents starting from Theodore Roosevelt will be examined.
Except for the calculation of the Flesch-Kincaid score, the transcripts have been pre-processed to allow for proper analysis. The texts have been converted to lower case, lemmatized (words are reduced to their dictionary base form) and all punctuation and numbers have been removed. Note that stop words have not been removed since they are needed for calculating the size of the vocabulary.
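A minimal sketch of what such a pre-processing step could look like, using only the standard library. The function name `preprocess` is hypothetical, and lemmatization is deliberately left out: it needs a third-party library (NLTK or spaCy are common choices), and the article does not say which one was used.

```python
import re


def preprocess(text):
    """Lower-case a transcript, drop punctuation and numbers, return a token list.

    Lemmatization is NOT performed here (assumption: it would be a separate
    step done with a third-party library).
    """
    # Normalize curly apostrophes so contractions like "don’t" stay one token.
    text = text.replace("’", "'").lower()
    # Replace everything except letters, apostrophes and whitespace with a space.
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Split on whitespace and strip apostrophes left dangling at word edges.
    tokens = [token.strip("'") for token in text.split()]
    # Stop words are kept on purpose: they count towards the vocabulary.
    return [token for token in tokens if token]


print(preprocess("We, the People, won 3 times in 2016!"))
```

The output is a flat list of lower-case tokens, which is the shape the word counts further down work on.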
What are the results?
In this section, I’ll briefly lay out the results and explain my approach to measuring them with Python.
How complex is the language of Donald Trump compared to other US presidents?
To calculate the Flesch-Kincaid score, I used a handy little Python package called textstat which allows me to perform these operations with a single line of code.
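With textstat, the call in question is most likely `textstat.flesch_kincaid_grade(text)`. To show what the score actually computes, here is a rough standard-library sketch of the Flesch-Kincaid grade level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a crude vowel-group heuristic and an assumption of this sketch, so its numbers will differ somewhat from textstat’s.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually adds no syllable ("make"), except in "-le" ("people").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level of raw (unprocessed) text."""
    # Sentence boundaries come from punctuation, so the text must not be cleaned first.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


simple = "The cat sat. The dog ran."
complex_text = "Institutional considerations necessitate comprehensive deliberation."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(complex_text))
```

This also illustrates why the score is calculated on the raw transcripts: without punctuation there are no sentence boundaries to count.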
Here are the results:

As you can see, Trump has the lowest Flesch-Kincaid score of all US presidents since 1900. He speaks at a grade level of 6.7, roughly that of a 7th grader, whereas Obama spoke at 9.7, roughly a 10th-grade level. George H. Bush spoke at 7.6, roughly an 8th-grade level. But you can also see that Trump is not an extreme outlier. He is part of a long downward trend in which US presidents use simpler language and less complex sentence structures. In the early 20th century, political debates were still largely confined to a well-educated ruling elite, while today most people engage in these discussions.
How big is Donald Trump’s vocabulary compared to other US presidents?
To calculate this, I simply performed a word count with the Counter class from the collections module. It counts all the occurrences of each word and returns them in a neat dictionary. To get the size of a president’s vocabulary, I took the length of that Counter dictionary. I only used the first 100,000 words of each president’s speech transcripts so that the absolute number of different words can be compared.
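A minimal sketch of this count, assuming the transcripts are already available as a list of pre-processed tokens. The function name `vocabulary_size` and the `limit` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter


def vocabulary_size(tokens, limit=100_000):
    """Number of distinct words among the first `limit` tokens."""
    # Truncating first keeps presidents with long and short transcripts comparable.
    counts = Counter(tokens[:limit])
    # Each key of the Counter is one distinct word.
    return len(counts)


tokens = "we the people we the nation".split()
print(vocabulary_size(tokens))           # 4
print(vocabulary_size(tokens, limit=3))  # 3
```

Truncating to a fixed number of tokens matters because vocabulary size grows with text length, so raw counts over transcripts of different lengths would not be comparable.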
Here are the results:

This may come as a surprise. Trump’s vocabulary is almost as big as that of Barack Obama. He clocks in at around 5930 words whereas Obama has 6352 words. Eisenhower had the smallest vocabulary of all presidents since 1900 with 3002 words. This may be less surprising if you consider that Eisenhower was a wartime general and was used to keeping his speeches as simple as possible.
The media often portrays Trump as a bumbling idiot who cannot form coherent sentences. However, the data paints a different picture.
At first glance, it may seem illogical that Donald Trump’s vocabulary is nearly as large as that of Obama, the Ivy League-educated president. But don’t forget that we measure speech transcripts. Most of them were written by speech writers who have a huge vocabulary. Trump often just has to read those speeches. This may be a reason why his occasional incoherent rambling isn’t reflected in a small vocabulary in the data.
How much anti-establishment rhetoric does Donald Trump use compared to other US presidents?
To measure this, I created a dictionary of words that create an in-group out-group feeling. These words are: ‘we’, ‘us’, ‘them’, ‘they’, ‘elite’, ‘corrupt’, ‘i’, ‘me’, ‘their’ and ‘our’. By including the listeners or attendees in a group (“we”, “us”, “our”) and by talking about the elite (“elite”, “corrupt”, “them”, “they”), a speaker creates a feeling of two opposing sides. I know that this is unscientific, but it should not be too far off from the truth since these words are used in some studies as an indicator of anti-establishment rhetoric. (Mudde & Kaltwasser 2017)
To guard against selection bias for these words, I performed this search numerous times with a variety of word combinations and the results were always similar. This gives us an indication that the pattern we discovered is a stable one and not dependent on a specific choice of words.
Since the length of the transcripts varies from president to president, I’ve measured the relative frequency of in-group out-group words used.
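The relative frequency described here can be sketched as follows, assuming each president’s transcripts are available as a list of pre-processed tokens. The names `GROUP_WORDS` and `relative_frequency` are hypothetical; the word list is the one given above.

```python
from collections import Counter

# The in-group out-group word list from the article.
GROUP_WORDS = {"we", "us", "them", "they", "elite", "corrupt", "i", "me", "their", "our"}


def relative_frequency(tokens, word_list=GROUP_WORDS):
    """Share of all tokens that belong to `word_list` (0.0 for an empty transcript)."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # A Counter returns 0 for words that never occur, so no key check is needed.
    hits = sum(counts[word] for word in word_list)
    # Dividing by the transcript length makes presidents comparable.
    return hits / len(tokens)


tokens = "we will take our country back from them".split()
print(relative_frequency(tokens))                  # 0.375
print(relative_frequency(tokens, {"we", "they"}))  # 0.125
```

Passing a different `word_list`, as in the second call, is one way to repeat the measurement with other word combinations and check whether the pattern holds.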
Here are the results:

As you can see, there is a clear trend line pointing to increased use of these words. Trump is clearly not an outlier, but rather following a long trend of using more populist / anti-establishment rhetoric. Why this could be the case is an interesting question that needs further research.
What are the limits of this research design?
The limits of this research design are two-fold:
Firstly, it does not uphold scientific rigor. Although I tried to be as accurate and exact as I could be, using in-group out-group words as an indicator for anti-establishment rhetoric is not especially exact. I don’t know if there is a scientifically proven link between the use of in-group out-group language and populism. That was my hunch and I suspect it’s not too far off from the truth but I don’t know for sure.
Secondly, as always, we are limited by the data we have. While I had a look at the speech transcripts and verified that they were good, I didn’t read every single word. Also, the choice of speeches that were transcribed could have influenced the result. Perhaps the creator of the data set chose to exclude certain speeches, which would distort the results.
What can we learn from these results?
First of all, these results are neither conclusive nor scientific. However, they paint a rough image which helps us to put Trump into context.
Yes, he uses the simplest style of language of any US president so far. Yes, he uses lots of anti-establishment rhetoric, but he follows a clear trend line. Also, his vocabulary is far from the smallest of the US presidents.
While he is an outlier in some aspects, he is a “normal” president in others. What became clear is that the picture is more nuanced than popular media coverage seems to imply (What a surprise!).
As a whole, presidential rhetoric seems to have become more populist since the beginning of the 20th century.
This can happen for numerous reasons. One can be that the language of politics is moving closer to the average education level of the population. This lets more people take part in and understand everyday political discussions.
Another could be that this merely reflects a trend that we have seen for a long time in US politics: polarization. Maybe US presidents feel the need to use populist language more and more to appeal to their voters. In that case, the shift in presidential rhetoric only reflects a change that is happening in society.
In the future, using sentiment analysis to see which president uses the “threat” motif to get more votes, or analyzing the frequency of slurs, could give us more indications.
In any case, I hope you found this little article enlightening and I’ll see you again in two weeks with another update from my side.
Until then,
Phil
[Sources]
Canovan, Margaret (2004): Populism for political theorists?, Journal of Political Ideologies, 9(3), p. 241–252
Decker, Frank (2004): Der neue Rechtspopulismus, 2nd edition, Leske + Budrich, Opladen, p. 33
Klüver, Heike (2009): Measuring interest group influence using quantitative text analysis, Journal of European Politics, 10(4), p. 535–549
Moffitt, Benjamin (2016): “The Performative Turn in the Comparative Study of Populism”. American Political Science Association Comparative Politics Newsletter, vol. 26, issue 2, pp. 52–57
Mudde, Cas & Kaltwasser, Cristóbal Rovira (2017): Populism: A Very Short Introduction, Oxford University Press
Olurunnipa, Toluse & Parker, Ashley (2019): The Washington Post, https://www.washingtonpost.com/politics/trump-campaign-sees-political-advantage-in-a-divisive-appeal-to-working-class-white-voters/2019/07/26/39234f00-aef1-11e9-8e77-03b30bc29f64_story.html, last checked 15/02/2020 at 11:44







