Studying Middle Earth: Marquette Student Builds Topic Modeling App for Tolkien Class
Marielle Billig will be one of Marquette’s first Data Science graduates when she dons her graduation gown this May. The senior — double majoring in Computer Engineering and Data Science — will then take a position as a data scientist with BlueCross BlueShield.
But before all of that, she took Dr. Gerry Canavan’s Tolkien class, an Individual Authors course, in the fall of 2017. For a student in more traditionally quantitative fields, a literature course could perhaps seem a little outside the comfort zone. But that sensation of being a little out of her element became part of the basis of Billig’s final project that semester, when she decided to ask a very interesting question:
Can you really use data science to analyze literature?
To find out, she created a tool called the Tolkien Analysis App, combining what she learned in the Tolkien class with her background in computer engineering and data science. The app is built with R, a statistical computing and graphing language, and Shiny, which helps to create interactive data stories for the web. It uses topic modeling to show how frequently a topic (a list of words that appear together frequently in a text) appears over the course of a text or collection of texts.
So, the obvious question is: what does this mean?
Billig explained to me that topic modeling is a great way to get a sense of a large amount of data in a short amount of time. (Not counting the hours she spent writing the code, cleaning the data, and fine-tuning her app, of course.) Tolkien’s corpus is massive. Even if you just consider his fictional worlds and creations, he was a prolific writer whose works continued to be edited and published by his family long after his death. So an app like this is just one of many ways to drill into Tolkien’s works, and one that, in particular, highlights the movement of trends and ideas through a single work or across several.
To see this in action, I took a look at the topics identified in The Silmarillion. Looking at the third topic (highlighted in grey near the bottom of the image, and graphed in thick green), it’s clear that while the Noldor, a clan of Elves, and Morgoth, Middle Earth’s first Dark Lord, are mentioned here and there throughout the text, the frequency with which these words appear really increases near the middle of the book. Of course, I pulled out a copy of The Silmarillion just to see what was going on around Chapter 13, which by the numbers along the X-axis looks to be the location of the topic’s biggest spike, and lo and behold, the chapter is named “Of the Return of the Noldor” and focuses on the interaction between Morgoth and these elves. Just “elves,” you can see in the reddish line of the graph, is a topic that remains fairly steady across the whole text. This makes perfect sense, of course, as The Silmarillion is Tolkien’s history of the elves in Middle Earth.
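For readers curious how a curve like the one described above might be computed, here is a minimal sketch in Python. It is not Billig’s code (her app is written in R and Shiny), and it skips the topic modeling step itself: it assumes a topic’s word list has already been found and simply measures what share of each chapter’s words belong to that topic. The function names and the three toy “chapters” are made up for illustration and are not Tolkien’s text.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def topic_frequency(chapters: List[str], topic_words: List[str]) -> List[float]:
    """For each chapter, return the fraction of its tokens that are topic words."""
    vocab = {word.lower() for word in topic_words}
    shares = []
    for chapter in chapters:
        tokens = tokenize(chapter)
        hits = sum(1 for token in tokens if token in vocab)
        # An empty chapter gets a share of 0.0 rather than dividing by zero.
        shares.append(hits / len(tokens) if tokens else 0.0)
    return shares


def peak_chapter(shares: List[float]) -> int:
    """Return the 1-based number of the chapter where the topic is most frequent.

    Expects a non-empty list; ties go to the earliest chapter.
    """
    return max(range(len(shares)), key=lambda i: shares[i]) + 1


# Toy stand-ins for chapters (hypothetical text, for illustration only).
toy_chapters = [
    "the elves sang under the stars",
    "the noldor returned and morgoth waited",
    "the elves rested",
]
toy_topic = ["noldor", "morgoth"]  # assumed to come from a topic model

shares = topic_frequency(toy_chapters, toy_topic)
print(shares)
print("Topic peaks in chapter", peak_chapter(shares))
```

Plotting `shares` against chapter number would give a line like the ones in the app’s graph, with a spike wherever a topic’s words cluster.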
Billig’s app is a perfect example of just what digital humanities is. She took quantitative tools that were created for statistical analysis of large collections of data and applied them to a qualitative field of study: literature. Treating a book or collection of books as a database, a large collection of data points, allows literary scholars to gain insights about narratives that are different from what has been done historically, but no less meaningful or full of potential.
It’s just a new way of doing an old thing. And we think Billig’s example is pretty exciting.
Billig is currently investigating hosting options for her app. This post may be updated at a later date to add the app’s location.