Using Python to Construct a Network Graph of Historical Figures

Jeremy Langenderfer

Published in Web Mining [IS688, Spring 2021] · 7 min read · Apr 2, 2021


If you happened to read my previous article, you know that I absolutely love to travel all over the world. Unfortunately, that hobby is on pause for the time being due to the pandemic. I’m sure some people travel simply for the sights, or to go somewhere tropical and soak up the sun. Don’t get me wrong, I love to do that too. However, one of the most important reasons I love to travel is to learn more about the history of whatever destination I visit, and not just history in the United States either. History in general matters to me because I think it’s important to learn from the past. So I was pleasantly surprised when, while researching ideas for a network graph project using Python, I stumbled onto one that involved history. What is the idea? I found that people who study the humanities use Python to construct network graphs for studying the relationships between “important nodes”, which in this case are historical figures.

You’re probably wondering who these historical figures might be. Well, they’re known as the “Religious Society of Friends”, though you might be more familiar with the name “Quakers”. The Quaker Movement “was founded in England in the 17th century by George Fox. He and other early Quakers, or Friends, were persecuted for their beliefs, which included the idea that the presence of God exists in every person. Quakers rejected elaborate religious ceremonies, didn’t have official clergy and believed in spiritual equality for men and women. Quaker missionaries first arrived in America in the mid-1650s. Quakers, who practice pacifism, played a key role in both the abolitionist and women’s rights movements” (History.com Editors, 2017). Probably the most familiar Quaker is William Penn, who founded the colony that became the State of Pennsylvania.

As I continue to learn more about Python, I am more and more impressed by what you can do with it. In this article, I will explore how to construct a network graph, explain what nodes and edges are, and cover the Python libraries and tools that I used to collect the data, analyze it and build the network graph. Now that you have a general understanding of what this article will entail, let’s talk about Python and how to construct a network graph.

Before getting into the technical details, I think it is important to define some of the terms that I’ve already mentioned. First, let’s define what a network graph is. Simply put, a network graph is a mathematical structure that illustrates how subjects are connected to each other. There are several different types of network graphs, but we won’t get into those for now. The network graph itself is constructed of nodes and edges. A node is also referred to as a “vertex” (plural “vertices”), and the edges are the “links” between nodes. For this particular example, a node will represent a Quaker and an edge will represent a relationship between two Quakers. Now that we have a brief understanding of the terms, we can move on to collecting the data and constructing the network graph.
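
To make the two terms concrete, here is a minimal sketch in plain Python. The three names are real Quakers, but the two links between them are assumed for illustration only and are not taken from the project's data files.

```python
# Toy example for illustration only: three nodes and two assumed links.
nodes = ["George Fox", "William Penn", "Margaret Fell"]
edges = [("George Fox", "William Penn"), ("George Fox", "Margaret Fell")]

# Build an adjacency list: each node maps to the set of nodes it is linked to.
adjacency = {name: set() for name in nodes}
for source, target in edges:
    adjacency[source].add(target)
    adjacency[target].add(source)  # undirected: the link goes both ways

for name in nodes:
    print(name, "->", sorted(adjacency[name]))
```

Each name is a node, each pair is an edge, and the adjacency list answers the basic question a network graph is built for: who is connected to whom.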

The first step is obtaining the data that will be used to construct the network graph. The data (nodes and edges) for this project comes from the csv files that accompany the Programming Historian lesson by Ladd et al. (2017), available from “https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python”. After downloading the csv files, I used PyCharm to build this network graph. The next step is importing the libraries that will be used for this project. The image below displays the libraries that were used in this project.
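
The exact import list is not spelled out in the text, so the following is only a sketch of what it might look like. It assumes the project follows the Programming Historian lesson cited in the references, which relies on the standard-library csv module and on NetworkX (installed separately, for example with pip).

```python
import csv               # standard library: reads the node and edge csv files
import networkx as nx    # third-party: builds and analyzes the graph

# Printing the version is a quick check that NetworkX is installed.
print(nx.__version__)
```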

After importing the necessary Python libraries, the next step is to read the csv files that were downloaded and collect the data we want to analyze. The image below shows how to read the csv files, which are named “quakers_nodelist.csv” and “quakers_edgelist.csv”. To verify that the csv files were read correctly, you can use print statements similar to those below, which should output the number of nodes and edges.
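
As a rough sketch of this step, the snippet below reads a node file and an edge file with the csv module and prints how many rows each contains. So that it runs anywhere, it first writes two tiny stand-in files to a temporary folder. The rows, the “Source,Target” header, the placeholder significance text and the IDs are all assumptions for illustration; with the real downloads you would point node_file and edge_file at your own copies instead.

```python
import csv
import tempfile
from pathlib import Path

def read_rows(path):
    """Return the data rows of a csv file, skipping the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header line if there is one
        return [row for row in reader]

# Stand-in files with toy rows (hypothetical, not the real dataset).
workdir = Path(tempfile.mkdtemp())
node_file = workdir / "quakers_nodelist.csv"
edge_file = workdir / "quakers_edgelist.csv"
node_file.write_text(
    "Name,Historical Significance,Gender,Birthdate,Deathdate,ID\n"
    "George Fox,placeholder,male,1624,1691,id-1\n"
    "William Penn,placeholder,male,1644,1718,id-2\n"
    "Margaret Fell,placeholder,female,1614,1702,id-3\n",
    encoding="utf-8",
)
edge_file.write_text(
    "Source,Target\n"
    "George Fox,William Penn\n"
    "George Fox,Margaret Fell\n",
    encoding="utf-8",
)

nodes = read_rows(node_file)
node_names = [row[0] for row in nodes]                # first column holds the name
edges = [tuple(row) for row in read_rows(edge_file)]  # (source, target) pairs

print(len(node_names))
print(len(edges))
```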

If done correctly, the output will show as below.

Now that we know the data is being read correctly, we can move on to creating a graph object. For this, we will use the NetworkX library, which provides a command for creating an empty graph object. Once this empty graph object is created, we can add the list of nodes and the list of edges to it, as shown in the image below.
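
A minimal sketch of this step follows, assuming an undirected graph built with nx.Graph(), as in the Programming Historian lesson. The node and edge lists are toy data, and the last name is a made-up placeholder that has no links.

```python
import networkx as nx

# Toy data for illustration only; the last node is hypothetical.
node_names = ["George Fox", "William Penn", "Margaret Fell", "Unconnected Person"]
edges = [("George Fox", "William Penn"), ("George Fox", "Margaret Fell")]

G = nx.Graph()                # empty undirected graph object
G.add_nodes_from(node_names)  # one node per name
G.add_edges_from(edges)       # one edge per (source, target) pair

print(G.number_of_nodes())
print(G.number_of_edges())
```

Adding the node list separately, and not relying on the edge list alone, keeps people who have no recorded links in the graph.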

Using the print statement “print(nx.info(G))” will output general information about the graph, which will be depicted in the image below.
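
One caveat for anyone following along today: nx.info was removed in NetworkX 3.0, so the call above only works on older versions. The sketch below builds a similar summary from methods that exist in both old and new releases. The summarize helper is a hypothetical name of my own and the graph is toy data.

```python
import networkx as nx

def summarize(G):
    """Return a one-line summary similar in spirit to the old nx.info output."""
    n = G.number_of_nodes()
    m = G.number_of_edges()
    avg_degree = 2 * m / n if n else 0.0  # each edge adds one to two degrees
    return f"Nodes: {n}, Edges: {m}, Average degree: {avg_degree:.4f}"

# Toy graph for illustration only.
G = nx.Graph()
G.add_edges_from([("George Fox", "William Penn"), ("George Fox", "Margaret Fell")])

print(summarize(G))
```

On recent NetworkX versions, print(G) on its own also prints a short description of the graph.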

Next, I decided to add attributes to learn more about the nodes. For this, I used the column names from the “quakers_nodelist.csv” file, which were historical significance, gender, birthdate, deathdate and ID. I created an empty dictionary for each of these columns, and then used for loops to fill each dictionary with the value for every node. The code for this step is displayed below.
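
Here is a sketch of how this step could look. It uses a single loop to fill all five dictionaries and then attaches them with nx.set_node_attributes. The rows are toy data, the significance text and IDs are placeholders, and the attribute names (such as "birth_year") are my own assumptions.

```python
import networkx as nx

# Toy rows in the column order Name, Historical Significance, Gender,
# Birthdate, Deathdate, ID. Significance text and IDs are placeholders.
node_rows = [
    ["George Fox", "placeholder", "male", "1624", "1691", "id-1"],
    ["William Penn", "placeholder", "male", "1644", "1718", "id-2"],
    ["Margaret Fell", "placeholder", "female", "1614", "1702", "id-3"],
]

G = nx.Graph()
G.add_nodes_from([row[0] for row in node_rows])

# One empty dictionary per column, keyed by node name.
hist_sig_dict, gender_dict, birth_dict, death_dict, id_dict = {}, {}, {}, {}, {}
for name, hist_sig, gender, birth, death, node_id in node_rows:
    hist_sig_dict[name] = hist_sig
    gender_dict[name] = gender
    birth_dict[name] = int(birth)  # store years as numbers
    death_dict[name] = int(death)
    id_dict[name] = node_id

# Attach each dictionary to the graph under an attribute name.
nx.set_node_attributes(G, hist_sig_dict, "historical_significance")
nx.set_node_attributes(G, gender_dict, "gender")
nx.set_node_attributes(G, birth_dict, "birth_year")
nx.set_node_attributes(G, death_dict, "death_year")
nx.set_node_attributes(G, id_dict, "source_id")

print(G.nodes["William Penn"])
```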

You can use a print statement to retrieve information about the attributes that were just added as shown in the example below.
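
As a small sketch of reading attributes back, the snippet below loops over the nodes and prints each name with its birth year. The "birth_year" attribute name is an assumption, and the two nodes are toy data.

```python
import networkx as nx

# Toy graph with one attribute per node (attribute name is hypothetical).
G = nx.Graph()
G.add_node("George Fox", birth_year=1624)
G.add_node("William Penn", birth_year=1644)

# Look up the attribute node by node.
for name in G.nodes():
    print(name, G.nodes[name]["birth_year"])

# Or collect the attribute for every node at once as a dictionary.
birth_years = nx.get_node_attributes(G, "birth_year")
```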

Running this print statement will yield the following results as an example.

One other analysis that I performed used the NetworkX density function. Density is the ratio of the edges that actually exist in the network to all of the edges that could possibly exist, so the returned value is always between 0 and 1. A value of 0 means there are no edges at all, whereas a value of 1 means every node is connected to every other node, which is a perfectly connected network. The image below displays the Python code for performing this analysis. The returned value was 0.02478279447372169, meaning only about 2.5% of the possible connections are present, so this particular network is far from perfectly connected.
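
For an undirected graph with n nodes and m edges, density works out to 2m / (n(n - 1)). The sketch below checks that formula against nx.density on a toy graph. The last lines plug in 119 nodes and 174 edges, which are the sizes reported in the cited Programming Historian lesson and which I am assuming match the files used here.

```python
import networkx as nx

def manual_density(n, m):
    """Density of an undirected graph with n nodes and m edges."""
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

# Toy path graph: 4 nodes, 3 edges out of 6 possible.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D")])
print(nx.density(G), manual_density(4, 3))

# A complete graph has every possible edge, so its density is 1.
K4 = nx.complete_graph(4)
print(nx.density(K4))

# Assumed sizes from the cited lesson: 119 nodes and 174 edges.
lesson_density = manual_density(119, 174)
print(lesson_density)
```

The last value comes out at roughly 0.0248, in line with the result reported above.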

The last analysis that I wanted to perform was a visual analysis using the Gephi tool. I am completely new to this tool, so I am still learning. The below image will display the Python code used to generate a file that can be read into the Gephi tool for visual analysis.
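
The export code is not shown in the text, so this is a sketch under one assumption: that the file was written in GEXF, a format Gephi can open, using nx.write_gexf as the Programming Historian lesson does. The file name and the toy graph are placeholders, and the file is written to a temporary folder so the snippet can run anywhere.

```python
import tempfile
from pathlib import Path

import networkx as nx

# Toy graph for illustration only.
G = nx.Graph()
G.add_edges_from([("George Fox", "William Penn"), ("George Fox", "Margaret Fell")])
nx.set_node_attributes(
    G,
    {"George Fox": "male", "William Penn": "male", "Margaret Fell": "female"},
    "gender",
)

# Write the graph to a GEXF file (hypothetical file name).
out_path = Path(tempfile.mkdtemp()) / "quaker_network.gexf"
nx.write_gexf(G, str(out_path))

print(out_path)
```

Node attributes such as gender travel with the file, so they can be used for coloring or filtering once the graph is open in Gephi.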

After loading the above file into Gephi, I generated the network graph visualization below, which shows the connections between Quakers.

As you can tell from the image, it is difficult to read any of the names in the more tightly clustered areas. In the network graph, the names of the Quakers mark the “nodes” and the lines are the “edges” between “nodes”.

In closing, this was a very interesting project in which I learned more about the features and capabilities of Python. My plan is to continue analyzing the data with Gephi, as I would like to clean up the network graph and make it more readable. As for bugs encountered along the way, the hard-to-read visualization was the main one. It’s not necessarily a bug per se, but rather a result of Gephi being a new tool that I am unfamiliar with. I see the benefits of Gephi and plan to learn more about its features.

References

History.com Editors. (2017, May 19). Quakers. Retrieved April 02, 2021, from https://www.history.com/topics/immigration/history-of-quakerism

Ladd, J., Otis, J., Warren, C., & Weingart, S. (2017, August 23). Exploring and analyzing network data with python. Retrieved April 02, 2021, from https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python
