Analyzing Cross References Between Bible Verses Using Complex Networks With Pandas and Gephi
The Bible is the world’s best-selling and most widely distributed book. Its name comes from the Koine Greek “Ta Biblia”, which means “the books”. It’s a collection of 66 books written by around 40 authors over approximately 1,500 years.
Originally, the text had no chapter or verse divisions. They were developed later to make it easier to study and reference specific passages. Around 1227, Stephen Langton, an Archbishop of Canterbury, divided the Bible into chapters. In 1553, Robert Estienne published the first Bible with his verse system, which is still in use today.
Cross references can be very useful when studying the Bible. They help the reader find related verses, whether the relation is a direct quote, a paraphrase, or simply a shared theme. These references can be expressed as a Complex Network, that is, a graph with non-trivial topological features.
In this article, we’ll show how you can use Pandas and Gephi to visualize and analyze cross references between Bible verses.
We opted for this source because it splits reference intervals into two columns (sv and ev) and represents verses with an ID of BOOK + CHAPTER + VERSE (01001001 represents Genesis 1:1, for example). This ID structure is helpful when working with intervals of verses.
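To make the ID structure concrete, here is a minimal sketch that splits an ID into its parts. It assumes the 8-digit layout implied by the example above (2 digits for the book, 3 for the chapter, 3 for the verse); the function name parse_verse_id is ours and not part of the source database.

```python
def parse_verse_id(verse_id):
    """Split a BOOK + CHAPTER + VERSE ID into three integers.

    Assumes an 8-digit layout: 2 digits book, 3 digits chapter, 3 digits verse.
    """
    # IDs stored as integers lose their leading zero, so pad back to 8 digits.
    text = str(verse_id).zfill(8)
    return int(text[:2]), int(text[2:5]), int(text[5:])


genesis = parse_verse_id("01001001")   # book 1, chapter 1, verse 1
matthew = parse_verse_id(40006033)     # book 40, chapter 6, verse 33
```

Because the book number sits in the leading digits, comparing or sorting IDs numerically follows the order of the books, which is what makes intervals easy to handle.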
We then converted the database table to a CSV file and did some pre-processing, breaking reference intervals into individual rows. See the example below:
After that, we discarded the “ev” column and renamed the remaining columns to “verse”, “rank” and “ref”, respectively. You can get the resulting dataset in our repository.
That leaves three columns: the verse that makes the reference, a rank that indicates how reliable the reference is, and the verse that is referenced.
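One way to sketch the interval-breaking step in Pandas is shown here. This is an illustration under assumptions, not necessarily the exact code we ran: the column names "vid" and "r" are placeholders, an "ev" of 0 is assumed to mean "no interval end", the rank values are toy data, and intervals are assumed to stay inside a single chapter (otherwise a plain numeric range would produce IDs that do not exist).

```python
import pandas as pd

# Toy rows. "sv" and "ev" come from the source table; "vid" and "r" are
# assumed names for the referencing verse and the rank.
raw = pd.DataFrame({
    "vid": [1001001, 1001001],
    "r":   [12, 5],
    "sv":  [43001001, 58011003],
    "ev":  [43001003, 0],        # 0 is assumed to mean "single verse"
})


def expand_intervals(df):
    """Give every verse covered by an sv..ev interval its own row."""
    out = df.copy()
    # A missing end is treated as a one-verse interval.
    end = out["ev"].where(out["ev"] > 0, out["sv"])
    # Build the list of IDs covered by each interval (same-chapter assumption).
    out["sv"] = pd.Series(
        [list(range(s, e + 1)) for s, e in zip(out["sv"], end)],
        index=out.index,
    )
    # explode() turns each list element into a separate row.
    out = out.explode("sv").drop(columns="ev")
    return out.astype({"sv": "int64"}).reset_index(drop=True)


expanded = expand_intervals(raw)
```

With the toy data, the first row (an interval of three verses) becomes three rows, and the second row stays as it is.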
After pre-processing our data, we ended up with a dataset of 612,591 rows. When we opened it in Gephi, the graphs were too cluttered to read, so we decided to filter the rows by rank. We kept only the rows with a rank of 10 or higher, which gave us a dataset of 5,527 rows. That way our data was smaller and more reliable.
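The rank filter is a single boolean mask in Pandas. The sketch below uses a tiny made-up DataFrame with the "verse", "rank" and "ref" columns described above; the values are toy data, and MIN_RANK is just a name we chose for the threshold.

```python
import pandas as pd

# Toy data in the (verse, rank, ref) layout; the ranks are invented.
refs = pd.DataFrame({
    "verse": [1001001, 1001001, 40006033],
    "rank":  [12, 3, 10],
    "ref":   [43001001, 58011003, 42012031],
})

MIN_RANK = 10  # keep references ranked 10 or higher

# Boolean indexing keeps only the rows where the condition holds.
reliable = refs[refs["rank"] >= MIN_RANK].reset_index(drop=True)
```

Raising or lowering MIN_RANK trades graph readability against coverage, so it is worth trying a few values before settling on one.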
We could also have selected only one section of the Bible, such as the New Testament, which is very easy to do thanks to the verse IDs. After filtering our data, we “translated” the IDs into more readable references (01001001 to Genesis 1:1, for example).
With that done, we exported the processed dataset to a CSV and imported it into Gephi. In the video below, I show how to generate the Complex Network visualization:
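Both ideas can be sketched with integer arithmetic on the IDs. The example assumes the IDs are stored as integers in the 2 + 3 + 3 digit layout and that book numbers follow the usual 66-book Protestant order (so Matthew is book 40); the BOOKS list, the to_reference helper and the exact book labels are our own choices for illustration.

```python
import pandas as pd

# Book names in the usual 66-book order (index 0 is book 1).
BOOKS = [
    "Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy", "Joshua",
    "Judges", "Ruth", "1 Samuel", "2 Samuel", "1 Kings", "2 Kings",
    "1 Chronicles", "2 Chronicles", "Ezra", "Nehemiah", "Esther", "Job",
    "Psalms", "Proverbs", "Ecclesiastes", "Song of Solomon", "Isaiah",
    "Jeremiah", "Lamentations", "Ezekiel", "Daniel", "Hosea", "Joel", "Amos",
    "Obadiah", "Jonah", "Micah", "Nahum", "Habakkuk", "Zephaniah", "Haggai",
    "Zechariah", "Malachi", "Matthew", "Mark", "Luke", "John", "Acts",
    "Romans", "1 Corinthians", "2 Corinthians", "Galatians", "Ephesians",
    "Philippians", "Colossians", "1 Thessalonians", "2 Thessalonians",
    "1 Timothy", "2 Timothy", "Titus", "Philemon", "Hebrews", "James",
    "1 Peter", "2 Peter", "1 John", "2 John", "3 John", "Jude", "Revelation",
]


def to_reference(verse_id):
    """Turn an integer ID such as 1001001 into 'Genesis 1:1'."""
    book, rest = divmod(int(verse_id), 1_000_000)
    chapter, verse = divmod(rest, 1_000)
    return f"{BOOKS[book - 1]} {chapter}:{verse}"


# Toy data; the ranks are invented.
refs = pd.DataFrame({
    "verse": [1001001, 40006033],
    "rank":  [12, 10],
    "ref":   [43001001, 42012031],
})

# New Testament only: both ends of the reference are in books 40 to 66.
is_nt = (refs["verse"] // 1_000_000 >= 40) & (refs["ref"] // 1_000_000 >= 40)
new_testament = refs[is_nt]

# Translate after filtering, since the numeric IDs are easier to compare.
readable = refs.assign(
    verse=refs["verse"].map(to_reference),
    ref=refs["ref"].map(to_reference),
)
```

Doing the section filter before the translation matters: once the IDs become strings like "Matthew 6:33", range comparisons are no longer possible.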
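For the export step itself, a minimal sketch follows. Gephi's spreadsheet importer reads an edge list from columns named Source and Target, so the example renames "verse" and "ref" accordingly; treating the referencing verse as the source is our assumption about edge direction, and the rows are toy data.

```python
import pandas as pd

# Toy rows in the translated (verse, rank, ref) layout.
readable = pd.DataFrame({
    "verse": ["Genesis 1:1", "Matthew 6:33"],
    "rank":  [12, 10],
    "ref":   ["John 1:1", "Luke 12:31"],
})

# Gephi expects "Source" and "Target" headers in an edges table.
# Direction assumed here: the referencing verse points to the referenced one.
edges = readable.rename(columns={"verse": "Source", "ref": "Target"})
edges = edges[["Source", "Target", "rank"]]

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = edges.to_csv(index=False)
```

If you want Gephi to use the rank as an edge weight, renaming that column to Weight before exporting is one option.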
After doing what is shown in the video, we got the following result:
You can export the data created to generate the visualization (weighted in-degree, modularity, etc.) to a CSV or many other formats by clicking on File > Export > Graph file.... This dataset can be imported into Pandas again for further analysis. You can see an example in this project’s IPython Notebook.
The result shown in Gephi is a Complex Network. The size of each node is proportional to the number of times it is referenced. In this graph, we only plotted nodes with a degree of 3 or higher.
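The same quantities can be approximated directly in Pandas, which is handy as a sanity check on what Gephi displays. This is a rough sketch with placeholder node names (v1 to v6), not the computation Gephi performs internally: the in-degree counts how often a verse appears in "ref", and the degree is taken here as in-degree plus out-degree.

```python
import pandas as pd

# Toy edge list with placeholder node names.
edges = pd.DataFrame({
    "verse": ["v1", "v2", "v3", "v1", "v4"],
    "ref":   ["v5", "v5", "v5", "v6", "v1"],
})

# In-degree: how many times each verse is referenced (drives node size).
in_degree = edges["ref"].value_counts()
# Out-degree: how many references each verse makes.
out_degree = edges["verse"].value_counts()

# Total degree; fill_value=0 covers verses that appear on one side only.
degree = in_degree.add(out_degree, fill_value=0).astype(int)

# Keep only nodes with a degree of 3 or higher, as in the plot.
kept = degree[degree >= 3]
```

In the toy data, v5 is kept because it is referenced three times, and v1 is kept because it makes two references and receives one.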
We can see, for instance, that “Matthew 6:33” is one of the most referenced verses. In the New International Version (NIV), it says:
But seek first his kingdom and his righteousness, and all these things will be given to you as well.
We can infer that the verses that reference Matthew 6:33 may talk about “God’s kingdom” and “Seeking God before everything else”. Many verses are related to those themes. Another example is Isaiah 41:10, which says (NIV):
So do not fear, for I am with you; do not be dismayed, for I am your God. I will strengthen you and help you; I will uphold you with my righteous right hand.
This verse has key phrases such as “I am your God” and “Do not fear”, which can be found in many verses throughout the Bible.
ScrollMapper’s Bible Databases