Exploring the writing styles of Hamilton, Madison, and Jay in the Federalist Papers using latent semantic analysis.
You might’ve heard of Hamilton, the musical that’s been sweeping Broadway and reviving national interest in the “ten-dollar Founding Father without a father”. It’s massively entertaining, with a compelling narrative and the authenticity of history (with some creative liberties). Federalism has certainly gotten a sexy makeover for the new millennium.
In one of the songs, Non-Stop, Aaron Burr’s character lets rip this memorable line regarding the writing of the Federalist Papers:
“John Jay got sick after writing five. Madison wrote twenty-nine. Hamilton wrote the other fifty-one.”
It’s an exhilarating line, and it inspired me to take a look at the Federalist Papers to see what kind of legacy these political giants left behind for Americans as the country marks its 240th anniversary. These were the papers that helped tip the political balance in favor of ratifying the Constitution and laid the bedrock of the democracy we cherish today. I wanted to find out whether modern text analysis could offer any high-level insight into these influential essays.
NLP with Python
Note: this section gets technical fast, so feel free to skip to the results if NLP isn’t your thing.
First things first: downloading the text of all 85 Federalist Papers. Luckily, Project Gutenberg, a free online repository of e-books, makes the Federalist Papers (and a vast array of other books!) available to the public. I downloaded the 1.2 MB .txt file and loaded it into a Jupyter Notebook.
Immediately, I could see that the text needed cleaning. Project Gutenberg license text and HTML formatting cluttered the file, and every paper had to be parsed out of the single document. I cleaned up the formatting and wrote a parser to isolate each paper, tag it with its author, and load it into a pandas DataFrame with columns for the author, title, and text. So far so good!
Next, I had to figure out how to represent the words quantitatively for visualization. I ended up using scikit-learn’s tf-idf (term frequency-inverse document frequency) vectorization, one of the standard techniques in natural language processing. Boiled down to essentials, tf-idf tracks how often a word appears in a single document and discounts that score when the word also appears in many other documents. It’s a measure of both a word’s importance to a document and its distinctiveness across the collection.
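To make the “importance times uniqueness” idea concrete, here is a textbook tf-idf in plain Python: term frequency (count divided by document length) times log(N / document frequency). This is an illustrative variant, not scikit-learn’s exact formula; by default scikit-learn uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each document’s vector. The three toy documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook tf-idf: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "faction faction republic government".split(),
    "commerce navy trade government".split(),
    "faction union government".split(),
]
for row in tf_idf(docs):
    print(row)
```

“government” appears in every document, so its idf is log(1) = 0 and it scores zero everywhere. In the first document, “republic” outscores “faction” even though “faction” occurs twice, because “republic” appears in no other document.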
Doing this for every word gives a quantitative vector for each Federalist Paper. After filtering out common English stop words like ‘of’ and ‘they’, we end up with 558,669 unique word n-grams (single words and short multi-word phrases). We’re prioritizing phrases with tf-idf scores above a certain threshold in order to find possible keywords in the papers.
We’re still in a bind, though. This gives us an 85 x 558,669 matrix, which is impossible to graph in our current reality. What we need is something called latent semantic analysis (LSA), which attempts to uncover relationships between documents by modeling latent patterns in their text.
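For readers who want the math: LSA is usually computed with a truncated singular value decomposition of the document-term matrix. Writing the tf-idf matrix as X (one row per paper, one column per n-gram), the rank-k approximation and the k-dimensional coordinates Z of the papers are:

```latex
\begin{aligned}
X &\approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
X \in \mathbb{R}^{85 \times 558669},\;
U_k \in \mathbb{R}^{85 \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{558669 \times k} \\
Z &= X V_k = U_k \Sigma_k \in \mathbb{R}^{85 \times k}, \qquad k = 2
\end{aligned}
```

By the Eckart–Young theorem, X_k is the closest rank-k matrix to X in Frobenius norm, which is the precise sense in which the reduction “preserves the patterns” in the data. Each row of Z is one point on a two-dimensional plot.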
The information age is kind: open-source implementations of truncated singular value decomposition (SVD), the standard way to compute LSA, are readily available. SVD reduces the dimensionality of the text vectors while preserving as much of the structure in the data as possible. Perfect for our visualization needs, since we can reduce the data down to just two dimensions!
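A minimal sketch of the reduction step, assuming scikit-learn’s `TruncatedSVD` (the post doesn’t name the exact class). `lsa_2d` is a hypothetical helper, and the vectorizer settings repeat the assumptions from the tf-idf step.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def lsa_2d(texts, random_state=0):
    """Project documents onto two LSA dimensions (hypothetical helper)."""
    tfidf = TfidfVectorizer(stop_words="english", ngram_range=(1, 3)).fit_transform(texts)
    svd = TruncatedSVD(n_components=2, random_state=random_state)
    coords = svd.fit_transform(tfidf)  # shape (n_docs, 2): U_k * Sigma_k
    return coords, svd.explained_variance_ratio_


texts = [
    "faction majority republic citizens passion",
    "small republic citizens faction majority",
    "commerce navy trade fleet navigation markets",
    "navy fleet commerce trade union markets",
]
coords, ratio = lsa_2d(texts)
print(coords)
print(ratio)
```

`TruncatedSVD` works directly on the sparse tf-idf matrix, so the full 85 x 558,669 matrix never has to be densified. `explained_variance_ratio_` is a quick check on how much of the variation two components actually capture.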
Results: Strangely Foreshadowing
Neat graph. Now what does it mean?
Each dot represents one of the Federalist Papers, color-coded by author. The two axes are the transformed (SVD) dimensions. They don’t mean anything by themselves, but they’re useful for comparing papers against each other. You can see that Hamilton’s and Madison’s papers tend to occupy different regions of the graph, which indicates that they favor different language in their pieces. This may partly be a byproduct of writing about different topics. Even so, the topics each man chose to write about can still reveal something about his ideology.
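A sketch of how such a plot could be drawn with matplotlib. It assumes two-dimensional coordinates like those from the SVD step plus one author label per paper. The color palette and the `plot_papers` helper are hypothetical, and papers with joint or disputed attributions are simply skipped here.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical palette; labels follow the lowercase author names used in the outputs.
COLORS = {"hamilton": "tab:blue", "madison": "tab:orange", "jay": "tab:green"}


def plot_papers(coords, authors):
    """Scatter each paper in LSA space, one color per author (hypothetical helper)."""
    coords = np.asarray(coords)
    authors = np.asarray(authors)
    fig, ax = plt.subplots(figsize=(7, 5))
    for author, color in COLORS.items():
        mask = authors == author  # papers attributed to this author only
        if mask.any():
            ax.scatter(coords[mask, 0], coords[mask, 1], c=color,
                       label=author.title(), alpha=0.7)
    ax.set_xlabel("SVD component 1")
    ax.set_ylabel("SVD component 2")
    ax.legend()
    return fig


fig = plot_papers([[0.1, 0.2], [0.3, 0.1], [0.5, -0.2]], ["hamilton", "madison", "jay"])
```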
Given the schism between James Madison and Alexander Hamilton later in the 1790s, this difference in vocabulary within the Federalist Papers takes on new meaning. John Jay, who was deeply involved in foreign affairs and became the first Chief Justice of the United States, wrote mainly on the dangers of foreign influence and the necessity of federalism to guard against other nations.
This is great at a high level, but what about the words themselves? We can sort each Federalist Paper’s tf-idf scores and take the top 10 to see which phrases emerge as the most distinctive. Below are example outputs for two of the papers. Given the topics of Federalist No. 10 (guarding against political factions) and Federalist No. 11 (the beneficial impact of federalism on economic trade), the key phrases seem quite relevant.
Paper 10 | madison
number citizens 0.05002710059542178
small republic 0.04879259622492929
passion interest 0.04168925049618481
Paper 11 | hamilton
Counting the most common top-ranked phrases for each author gives the following:
Most common words for James Madison:[('government', 12),
Even within the Federalist Papers, James Madison shows a leaning toward topics like the relationship between the state and federal governments, the role of representatives, and the will of the people.
Most common words for Alexander Hamilton:[('government', 8),
Alexander Hamilton shows his strong Federalist stance through language about the branches of the federal government and a preference for terms that emphasize the union of the states. While the Federalist Papers give no sign of the later ideological split between Hamilton and Madison, it is interesting that the two men’s vocabulary choices reflect different priorities.
Most common words for John Jay:[('government', 3),
('navigation fleet let', 1),
('national government', 1),
('efficiency government', 1),
('militia obeyed', 1)]
John Jay’s contribution is more limited, since he authored only five of the papers. They center on foreign affairs: the influence of foreign interests on America and why a strong union was needed to stand up to other countries. The text analysis reflects these topics well, surfacing militias, fleets, and efficiency.
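The per-author counts above can be reproduced with a short tally. This sketch assumes the counts come from pooling each paper’s top-10 tf-idf phrases by author and counting each phrase at most once per paper. That reading fits the numbers, since no count exceeds the number of papers the author wrote (for example, ‘government’ at 3 for Jay’s five papers), but it is an inference. `phrase_counts_by_author` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def phrase_counts_by_author(top_phrases, authors, n=5):
    """Tally phrases from per-paper top lists, grouped by author.

    top_phrases: one list of phrases per paper (e.g. its top 10 tf-idf phrases)
    authors:     the author label for each paper, in the same order
    """
    counters = defaultdict(Counter)
    for phrases, author in zip(top_phrases, authors):
        counters[author].update(set(phrases))  # each phrase counts once per paper
    return {author: counter.most_common(n) for author, counter in counters.items()}


result = phrase_counts_by_author(
    [["government", "militia obeyed"], ["government", "navigation fleet let"], ["small republic"]],
    ["jay", "jay", "madison"],
)
print(result)
```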
Many thanks to the developers of NLTK, scikit-learn, NumPy, and pandas. Also credit to Thomas Hughes for his tutorial on tf-idf visualization for text analysis (https://github.com/tmhughes81/dap).