Unravelling Royal Mortality: How British Monarchs Died

Alexandra Fiona
3 min read · Jul 4, 2024


How I used Python, R and AI to analyze British monarchs’ deaths. Could one ask for more?

Photo by Mark Stuckey on Unsplash

I like data, and I like history. More than that, I like macabre data history. So, inspired by Suzie Edge, I recently set out to analyze the deaths of British monarchs throughout history. Using a combination of web scraping, data processing, and ChatGPT prompting, I built a dataset covering each monarch’s reign, year and cause of death, gender, and royal house. Let me take you through the steps first; the full code is provided below and in my GitHub.

The Data Hunt: Web Scraping Wikipedia

My first step was to gather data. I turned to Wikipedia’s “List of monarchs of the British Isles by cause of death” page. Using Python’s BeautifulSoup library, I scraped the tables containing each monarch’s name, reign, and cause of death. This gave me a solid foundation of raw data to work with.
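Here is a minimal sketch of what such a scraping step can look like. It assumes the tables carry Wikipedia's usual `wikitable` class and that the first row of each table holds the headers. The function name `parse_monarch_tables` and the inline sample HTML are made up for illustration; for the live page you would first download the HTML and pass it in.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Name</th><th>Reign</th><th>Cause of death</th></tr>
  <tr><td>Example Monarch A</td><td>1000-1010</td><td>Illness</td></tr>
  <tr><td>Example Monarch B</td><td>1010-1020</td><td>Killed in battle</td></tr>
</table>
"""


def parse_monarch_tables(html):
    """Return one dict per data row across all 'wikitable' tables."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table", class_="wikitable"):
        table_rows = table.find_all("tr")
        if not table_rows:
            continue
        # Assumption: the first row contains the column headers.
        headers = [th.get_text(strip=True) for th in table_rows[0].find_all("th")]
        for tr in table_rows[1:]:
            cells = [c.get_text(" ", strip=True) for c in tr.find_all(["td", "th"])]
            # Skip rows whose shape does not match the header (e.g. merged cells).
            if len(cells) == len(headers):
                rows.append(dict(zip(headers, cells)))
    return rows


records = parse_monarch_tables(SAMPLE_HTML)
```

Returning a list of plain dicts keeps the scraping step independent of pandas, so the raw rows can be inspected before any cleaning happens.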

Cleaning the Royal Mess: Data Processing

Raw data is rarely perfect, and this case was no exception. I used pandas to organize the scraped information into structured data. I then performed initial cleaning and categorization, ensuring that each piece of information was in its right place.
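A small sketch of this kind of cleaning, under stated assumptions: the column names (`name`, `reign`, `cause`), the sample rows, and the helper `clean_monarchs` are hypothetical, and the specific rules (dropping footnote markers, labelling empty causes as "Unknown") are examples of typical fixes rather than the exact steps in my notebook.

```python
import pandas as pd

# Hypothetical scraped rows; column names are assumptions for illustration.
raw_rows = [
    {"name": "  Example Monarch A ", "reign": "1000-1010", "cause": "Illness[1]"},
    {"name": "Example Monarch B", "reign": "1010-1020", "cause": ""},
]


def clean_monarchs(rows):
    """Build a DataFrame and apply basic text cleaning to every column."""
    df = pd.DataFrame(rows)
    for col in df.columns:
        # Remove footnote markers such as "[1]" and trim stray whitespace.
        df[col] = (
            df[col]
            .astype(str)
            .str.replace(r"\[\d+\]", "", regex=True)
            .str.strip()
        )
    # Label empty causes explicitly so they can be revisited later.
    df["cause"] = df["cause"].replace("", "Unknown")
    return df


df = clean_monarchs(raw_rows)
```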

Decoding Royal Dates: Year Extraction

The death dates came in a variety of formats. To standardize them, I wrote a function that uses regular expressions to extract the year of death from each date string. This let me place each monarch’s death on a common timeline.
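One possible version of such a function is sketched below. The name `extract_year` and the rule it applies (take the last standalone 3- or 4-digit number in the string) are assumptions for illustration; other date formats may need a different rule.

```python
import re

# A 3- or 4-digit number that is not part of a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{3,4})(?!\d)")


def extract_year(date_text):
    """Return the last 3- or 4-digit number in the text as an int, or None."""
    if not date_text:
        return None
    matches = YEAR_PATTERN.findall(str(date_text))
    return int(matches[-1]) if matches else None


print(extract_year("14 October 1066"))  # 1066
print(extract_year("c. 899"))           # 899
print(extract_year("unknown"))          # None
```

Taking the last match works for strings written as day, month, year, because one- and two-digit day numbers never match the pattern. Returning `None` for unparseable strings keeps missing years visible instead of silently guessing.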

Enhancing the Dataset: AI-Powered Enrichment

Here’s where things got really interesting. I decided to use OpenAI’s GPT-4 API to enrich my dataset with information that wasn’t explicitly available in the original data: each monarch’s gender, a more specific cause of death, and royal house.

Gender Classification: First, I used GPT-4 to determine the gender of each monarch based on their name. This added a new dimension to the data, allowing for potential analysis of gender-based mortality trends among royals.

Refining Causes of Death: For cases where the cause of death was listed as “Unknown” or “Other,” I again turned to GPT-4. By analyzing the available notes, the AI attempted to classify these deaths into more specific categories like “Natural Causes,” “Killed,” “Murdered,” etc. This helped to fill in some of the gaps in the dataset.

Dynasty Assignment: Lastly, I used GPT-4 to assign each monarch to a specific dynasty based on their reign dates. This added historical context to the data, allowing for potential analysis of mortality trends across different royal houses.
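The three enrichment steps share one pattern: build a prompt from a row, send it to the model, and validate the answer. Below is a rough sketch of that pattern for the cause-of-death step. Everything named here is an assumption for illustration: the category list, the prompt wording, and the helpers `build_cause_prompt` and `classify_cause`. The model call is passed in as a plain function (`ask_model`) so the sketch runs without an API key.

```python
# Illustrative category list; the real set of labels may differ.
ALLOWED_CAUSES = ["Natural Causes", "Killed", "Murdered", "Executed", "Accident", "Unknown"]


def build_cause_prompt(name, notes):
    """Build a prompt that restricts the model to a fixed set of labels."""
    categories = ", ".join(ALLOWED_CAUSES)
    return (
        f"Classify the cause of death of the monarch '{name}' "
        f"using only these notes: {notes!r}. "
        f"Answer with exactly one of: {categories}."
    )


def classify_cause(name, notes, ask_model):
    """ask_model is any callable that takes a prompt string and returns text."""
    answer = ask_model(build_cause_prompt(name, notes)).strip().strip(".")
    # Accept the answer only if it matches an allowed label (case-insensitive).
    for category in ALLOWED_CAUSES:
        if answer.lower() == category.lower():
            return category
    # Anything else is treated as "Unknown" rather than trusted blindly.
    return "Unknown"


def fake_model(prompt):
    """Stand-in for a real API call, so the example runs offline."""
    return "killed."


label = classify_cause("Example Monarch", "Died in battle", fake_model)
print(label)  # Killed
```

In a real run, `ask_model` would wrap a call to the GPT-4 API and return the text of the reply. Validating against a fixed list matters because the model can return free-form text, and unexpected labels would otherwise leak into the dataset.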

The Final Product: A Rich Dataset for Analysis

The final step was to export these data to a CSV file, ready for further analysis or visualization.

Time to visualise! Inspired by Nathan Yau’s book “Visualize This”, I recreated one of his plots in R for my data. The results can be seen below:

Notable Observations:

  1. Natural Causes: This category dominates across all eras, particularly in recent centuries.
  2. Violent Deaths: The early periods show a higher concentration of monarchs who were killed, murdered, or assassinated, especially during Saxon and Norman times.
  3. Executions: Clustered mainly in the Stuart period, reflecting the tumultuous times of the English Civil War.
  4. Rarer Categories: Accidental deaths and euthanasia are sparse, with only a few instances across the timeline.
  5. Unknown Causes: Scattered throughout, but more common in earlier periods, likely due to limited historical records.

Code:

Python notebook to extract data from Wikipedia and perform data preparation with ChatGPT:

R script for plotting (adapted from the book “Visualize This” by Nathan Yau):

Sources:


Alexandra Fiona

We are all respectable researchers until p=0.051. A Serious Data Scientist and a less Serious Data Writer. linkedin.com/in/shymanskaya-data-science/