Using a Bible dataset
The Bible Corpus, provided by Oswin Rahadiyan Hartono on Kaggle, offers comprehensive indexing for seven different translations of the Bible.
In this post, I briefly go over how I use visualization during my exploratory data analysis (EDA).
Libraries I import
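The import cell itself is not reproduced in this text. Here is a minimal sketch, assuming only pandas and Matplotlib are needed for the steps in this post (the original notebook may have imported more):

```python
# Assumed minimal imports; the original notebook may have used others as well.
import pandas as pd               # loading and inspecting the CSV files
import matplotlib.pyplot as plt   # plotting
```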
Loading the CSVs using Pandas
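A loading step along these lines could look like the sketch below. The file names and the `load_tables` helper are assumptions for illustration, so adjust them to match the files in your own download of the dataset.

```python
from pathlib import Path

import pandas as pd

# Assumed file names; change these to match the CSVs in your download.
CSV_FILES = {
    "version_keys": "bible_version_key.csv",
    "key_abbrev": "key_abbreviations_english.csv",
}


def load_tables(data_dir, files=CSV_FILES):
    """Read each CSV listed in `files` from `data_dir` into a dict of DataFrames."""
    data_dir = Path(data_dir)
    return {name: pd.read_csv(data_dir / fname) for name, fname in files.items()}


# Usage (the directory name is hypothetical):
# tables = load_tables("bible_corpus")
# version_keys = tables["version_keys"]
# key_abbrev = tables["key_abbrev"]
```

Keeping the DataFrames in a dict makes it easy to add the remaining translation files later without writing one `read_csv` line per file.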
Exploring the keys for the translations: version_keys
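A first look at a table like this usually means checking its shape, columns, and first rows. The sketch below uses two stand-in rows so it runs on its own; the column names are assumptions, and the real version_keys has one row per translation.

```python
import io

import pandas as pd

# Stand-in rows so the sketch runs without the CSV; column names are assumed.
version_keys = pd.read_csv(io.StringIO(
    "abbreviation,version\n"
    "ASV,American Standard Version\n"
    "KJV,King James Version\n"
))

print(version_keys.shape)             # (rows, columns)
print(version_keys.columns.tolist())  # column names
print(version_keys.dtypes)            # data type of each column
print(version_keys.head())            # first few rows
```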
Next up for exploration: key_abbrev
First, I display the first 10 rows of the dataset using key_abbrev.head(10).
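As a runnable sketch of that call, the block below builds a small stand-in for key_abbrev. The rows are illustrative only, and the column meanings are assumptions: 'a' holds an abbreviation and 'b' holds the book ID.

```python
import io

import pandas as pd

# Stand-in rows (illustrative only); the real table is read from the Kaggle CSV.
# Assumed columns: 'a' = abbreviation, 'b' = book ID.
key_abbrev = pd.read_csv(io.StringIO(
    "a,b\n"
    "Gen,1\nGe,1\nGn,1\n"
    "Exo,2\nEx,2\n"
    "Lev,3\nLv,3\n"
    "Num,4\nNm,4\n"
    "Deu,5\nDt,5\n"
    "Hab,35\n"
))

first_rows = key_abbrev.head(10)  # first 10 rows; the default without an argument is 5
print(first_rows)
```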
Plotting key_abbrev
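The original plot is not reproduced here, so the sketch below shows one plausible version: a bar chart of how many abbreviations are recorded for each book ID. The stand-in rows and the column meanings ('a' = abbreviation, 'b' = book ID) are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in rows (illustrative only); assumed columns: 'a' = abbreviation, 'b' = book ID.
key_abbrev = pd.DataFrame({
    "a": ["Gen", "Ge", "Gn", "Exo", "Ex", "Hab"],
    "b": [1, 1, 1, 2, 2, 35],
})

# Number of abbreviations recorded for each book ID, in book order.
counts = key_abbrev["b"].value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(counts.index.astype(str), counts.values)  # one bar per book ID
# In a notebook the figure renders at the end of the cell; in a script, call plt.show().
```

A chart like this makes unusually high or low counts stand out, which is what prompts a closer look at an individual book ID.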
Informed by the graph above, I wanted to see the name and count for Book ID #35 using key_abbrev.
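A lookup like this can be done with a boolean filter. In the sketch below the rows are stand-ins and the column meanings ('a' = abbreviation, 'b' = book ID) are assumptions, so the real table will return different abbreviations and a different count.

```python
import pandas as pd

# Stand-in rows (illustrative only); assumed columns: 'a' = abbreviation, 'b' = book ID.
key_abbrev = pd.DataFrame({
    "a": ["Gen", "Ge", "Hab", "Hb"],
    "b": [1, 1, 35, 35],
})

book_35 = key_abbrev[key_abbrev["b"] == 35]  # keep only the rows for Book ID 35

print(book_35["a"].tolist())  # abbreviations recorded for that book
print(len(book_35))           # how many abbreviations there are
```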
Plotting different features from key_abbrev
Figures 17–19 can get pretty messy, but I included them here to show where I started. After trying out multiple scale sizes for the abbreviation names listed on the y-axis, I realized that I might not even need them, as the count for each book acronym remains consistent.
Notice the stair-shaped graph patterns as we move from the Old Testament books to the New Testament books.
So, after trying out a few ways of plotting features ‘a’ and ‘b’, I decided to dramatically increase the figure size, as shown in Fig 20.
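In Matplotlib, the figure size is set with the `figsize` argument, given in inches as (width, height). The sketch below uses stand-in rows, and the size (10, 30) is an arbitrary illustration rather than the value used in Fig 20.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in rows (illustrative only); assumed columns: 'a' = abbreviation, 'b' = book ID.
key_abbrev = pd.DataFrame({
    "a": ["Gen", "Ge", "Gn", "Exo", "Ex", "Hab"],
    "b": [1, 1, 1, 2, 2, 35],
})

# A tall figure gives each abbreviation on the y-axis its own row of space.
# (10, 30) is an arbitrary example size, not the one used in the post.
fig, ax = plt.subplots(figsize=(10, 30))
ax.scatter(key_abbrev["b"], key_abbrev["a"])  # book ID on x, abbreviation on y
```

With hundreds of abbreviations on one axis, extra height usually does more for legibility than shrinking the tick label font.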
I then clearly labeled the axes to explain the story behind the graph, using the code in Fig 22:
The figures above show how graphs can be used to visualize features of a dataset and supplement EDA.
Stay tuned for Part 2, where I will be using different types of graphs to learn more about the rest of the datasets.
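The code from Fig 22 is not reproduced in this text. Here is a minimal sketch of axis labeling with Matplotlib, where the label wording, the stand-in rows, and the column meanings ('a' = abbreviation, 'b' = book ID) are all assumptions:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in rows (illustrative only); assumed columns: 'a' = abbreviation, 'b' = book ID.
key_abbrev = pd.DataFrame({
    "a": ["Gen", "Ge", "Gn", "Exo", "Ex", "Hab"],
    "b": [1, 1, 1, 2, 2, 35],
})

fig, ax = plt.subplots()
ax.scatter(key_abbrev["b"], key_abbrev["a"])

# Descriptive labels replace the terse column names 'a' and 'b'.
ax.set_xlabel("Book ID (b)")
ax.set_ylabel("Abbreviation (a)")
ax.set_title("Abbreviations recorded for each book")
```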