Using ChatGPT Advanced Data Analysis for Congress Slaveowners Dataset Analysis

Oleksandr Krakovetskyi
The DevRain Tech Blog
Aug 31, 2023

ChatGPT Advanced Data Analysis (formerly ChatGPT Code Interpreter) lets both technical and non-technical teams analyze data in seconds, whether they are financial researchers crunching market data, marketers analyzing survey results, or data scientists debugging an ETL script. In this article, I will show how to start working with this tool.

Preparing The ChatGPT Environment

First, you need to be a ChatGPT Plus or Enterprise subscriber.

Second, you need to activate Advanced Data Analysis in your settings.

Third, you need to select GPT-4 with Advanced Data Analysis to start working with a dataset.

Congress Slaveowners Dataset

The Washington Post has compiled a database of slaveholding members of Congress by examining thousands of pages of census records and historical documents.

About the data

A database of every person elected to Congress who was born before 1840 was created based on Washington Post research and data from these two datasets:

Repository contents

  • data/congress_slaveowners.csv

Data dictionary

Variables that can be found in the data:

  • bioguide: Member's unique identifier in the Biographical Directory of the United States Congress.
  • name: Name of the member of Congress.
  • positions: Position(s) the member held in Congress.
  • date_of_birth: Member's date of birth.
  • states_served: States where this member served at any point.
  • congresses_served: Sessions of Congress that this member was a part of at any point.
  • start_date: Member's first day in Congress.
  • end_date: Member's last day in Congress.
  • is_slaveholder: Whether this member of Congress was identified as a slaveowner. Possible values:
      ◦ true: This member was a slaveowner.
      ◦ false: This member was not identified as a slaveowner.
      ◦ unknown: There is not enough evidence to determine whether this member was a slaveowner.
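If you want to check the data outside ChatGPT, a minimal sketch of loading it might look like the following. It uses only Python's standard library and assumes the column names listed above and the literal strings true, false and unknown in is_slaveholder. The three sample rows and the load_members helper are hypothetical, made up for illustration; they are not taken from the dataset.

```python
import csv
import io

# Hypothetical three-row sample in the column layout described above;
# the real file is data/congress_slaveowners.csv.
SAMPLE_CSV = """bioguide,name,positions,date_of_birth,states_served,congresses_served,start_date,end_date,is_slaveholder
X000001,Member A,rep,1790-01-01,VA,"10, 11",1807-10-26,1811-03-03,true
X000002,Member B,sen,1801-06-15,NY,12,1811-11-04,1813-03-03,false
X000003,Member C,rep,,KY,13,1813-05-24,1815-03-03,unknown
"""

ALLOWED_STATUS = {"true", "false", "unknown"}


def load_members(text):
    """Parse CSV text into a list of dicts and validate the is_slaveholder codes."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        status = row["is_slaveholder"].strip().lower()
        if status not in ALLOWED_STATUS:
            raise ValueError(f"{row['bioguide']}: unexpected status {status!r}")
        row["is_slaveholder"] = status  # store the normalized value
    return rows


members = load_members(SAMPLE_CSV)
print(len(members), [m["is_slaveholder"] for m in members])  # 3 ['true', 'false', 'unknown']
```

For the real file, you could pass it the contents of `open("data/congress_slaveowners.csv", newline="", encoding="utf-8").read()`. Failing loudly on an unexpected status code is a deliberate choice, so that a typo in the data does not silently fall into the wrong category.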

Data Analysis with ChatGPT

Make a plot by number of slaveholders by congresses_served.

The plot shows the number of slaveholders by the sessions of Congress they served in. Each bar represents a specific Congress session, and the height of the bar indicates the number of slaveholders who served in that session.
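Because congresses_served lists every Congress a member served in, grouping on the raw column would produce one bar per combination of Congresses rather than one bar per Congress. A minimal sketch of the counting behind such a plot is shown below. It assumes the column holds comma-separated Congress numbers such as "10, 11", which you should verify against the real file. The rows and the slaveholders_per_congress helper are hypothetical.

```python
from collections import Counter


def slaveholders_per_congress(members):
    """Count slaveholders in each individual Congress.

    Assumes congresses_served holds comma-separated Congress numbers
    (e.g. "10, 11"); check the real file's format before relying on this.
    """
    counts = Counter()
    for m in members:
        if m["is_slaveholder"] != "true":
            continue
        # Split the list so each Congress is credited separately.
        for part in m["congresses_served"].split(","):
            part = part.strip()
            if part:
                counts[int(part)] += 1
    return dict(sorted(counts.items()))


# Hypothetical rows, not taken from the dataset.
members = [
    {"congresses_served": "10, 11", "is_slaveholder": "true"},
    {"congresses_served": "11", "is_slaveholder": "true"},
    {"congresses_served": "11, 12", "is_slaveholder": "false"},
]
print(slaveholders_per_congress(members))  # {10: 1, 11: 2}
```

The resulting dict could then be drawn by passing its keys and values to matplotlib's `plt.bar`.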

Extract insights.

Insights from the Dataset

  1. Missing Data: The column 'date_of_birth' has 87 missing values. All other columns are fully populated.
  2. Unique Members: There are 5,558 unique members in the dataset, each identified by a unique 'bioguide'.
  3. Positions: The dataset includes members who have held various positions, with the majority being Representatives ('rep').
  4. States Served: Members have served from 81 different states or territories.
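  5. Congresses Served: The 'congresses_served' column contains 1,308 distinct values. Since there have been fewer than 120 Congresses in total, these are distinct combinations of Congresses served by individual members, not 1,308 separate sessions.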
  6. Date Ranges: The dataset spans various start and end dates, with the most common start date being March 4, 1873, and the most common end date being March 3, 1875.
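Figures like the missing-value count, the number of unique values and the most common date come from a simple per-column profile. A minimal sketch of that profile, using a hypothetical profile helper and made-up rows, might look like this:

```python
from collections import Counter


def profile(members, columns):
    """Per column: number of empty values, number of distinct values, most common value."""
    report = {}
    for col in columns:
        values = [m[col] for m in members]
        non_empty = [v for v in values if v != ""]
        top = Counter(non_empty).most_common(1)
        report[col] = {
            "missing": len(values) - len(non_empty),
            "distinct": len(set(non_empty)),
            "most_common": top[0] if top else None,  # (value, count) or None
        }
    return report


# Hypothetical rows, not taken from the dataset.
members = [
    {"date_of_birth": "1790-01-01", "start_date": "1873-03-04"},
    {"date_of_birth": "", "start_date": "1873-03-04"},
    {"date_of_birth": "1801-06-15", "start_date": "1811-11-04"},
]
for col, stats in profile(members, ["date_of_birth", "start_date"]).items():
    print(col, stats)
```

Note that "distinct" counts distinct raw strings. For a list-valued column such as congresses_served, this counts distinct combinations rather than individual Congresses, so it is worth splitting such columns before drawing conclusions.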

Slaveholding Status:

  • About 56.9% of the members are identified as not being slaveholders (false).
  • About 33.7% are identified as being slaveholders (true).
  • For about 9.4%, it is unknown whether they were slaveholders or not.
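The three percentages are simple category shares of is_slaveholder, and they should add up to about 100%. A minimal sketch, with a hypothetical status_shares helper and made-up statuses for eight members:

```python
from collections import Counter


def status_shares(members):
    """Percentage of members in each is_slaveholder category, rounded to 0.1."""
    counts = Counter(m["is_slaveholder"] for m in members)
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


# Hypothetical statuses for eight members, not taken from the dataset.
members = [{"is_slaveholder": s} for s in
           ["true", "true", "false", "false", "false", "false", "false", "unknown"]]
print(status_shares(members))  # {'true': 25.0, 'false': 62.5, 'unknown': 12.5}
```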

Common Names: The most common name in the dataset is 'James Wilson', which appears 5 times. Names are therefore not unique, and members should be identified by 'bioguide' rather than by name.

Date of Birth: The most common year of birth is 1831, appearing 51 times in the dataset.
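Both findings can be checked with a couple of counters. The sketch below assumes dates are stored as ISO strings (YYYY-MM-DD), so the year is the first four characters, and it counts repeated names by distinct bioguide IDs. The helpers and rows are hypothetical.

```python
from collections import Counter


def most_common_birth_year(members):
    """Most frequent birth year as (year, count), skipping empty date_of_birth.

    Assumes ISO dates (YYYY-MM-DD), so the year is the first 4 characters.
    """
    years = Counter(m["date_of_birth"][:4] for m in members if m["date_of_birth"])
    return years.most_common(1)[0]


def repeated_names(members):
    """Names shared by more than one member, counted by distinct bioguide IDs."""
    ids_by_name = {}
    for m in members:
        ids_by_name.setdefault(m["name"], set()).add(m["bioguide"])
    return {name: len(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical rows, not taken from the dataset.
members = [
    {"bioguide": "X1", "name": "Name A", "date_of_birth": "1831-02-01"},
    {"bioguide": "X2", "name": "Name A", "date_of_birth": "1831-09-30"},
    {"bioguide": "X3", "name": "Name B", "date_of_birth": ""},
    {"bioguide": "X4", "name": "Name C", "date_of_birth": "1790-01-01"},
]
print(most_common_birth_year(members))  # ('1831', 2)
print(repeated_names(members))  # {'Name A': 2}
```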

Additional Observations:

  1. The dataset spans a broad range of Congress sessions and states, offering a detailed view of slaveholding among members of Congress born before 1840.
  2. The proportion of slaveholders (33.7%) is substantial, highlighting the historical role that slaveholders played in American politics.

Make a visualization about slaveholder distributions across decades.

The bar chart illustrates the distribution of slaveholders, non-slaveholders, and members with unknown slaveholding status across various decades. Each bar cluster represents a decade, and the height of the bars within each cluster indicates the number of members in each category (Slaveholders, Non-Slaveholders, and Unknown).

Key Insights:

  • Early Years: The number of slaveholders was relatively high in the early decades of the United States, especially from the 1800s to the 1850s.
  • Decline: There is a noticeable decline in the number of slaveholders from the 1860s onwards, which aligns with the end of the American Civil War and the abolition of slavery.
  • Non-Slaveholders: The number of non-slaveholders starts to increase significantly from the mid-19th century onwards, becoming the dominant group in Congress.
  • Unknown Status: The number of members with an "unknown" slaveholding status is generally low across all decades but rises somewhat in the later decades covered by the dataset. This could be due to limitations in historical records.
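The article does not say which date the chart uses to assign a member to a decade. A minimal sketch of the grouping, assuming each member is placed in the decade of their start_date (an ISO YYYY-MM-DD string), could look like the following. The status_by_decade helper and the rows are hypothetical.

```python
from collections import Counter, defaultdict


def status_by_decade(members):
    """Count members per decade and is_slaveholder category.

    Assumption: a member is placed in the decade of their start_date
    (ISO YYYY-MM-DD); the original chart's exact grouping is not stated.
    """
    table = defaultdict(Counter)
    for m in members:
        decade = int(m["start_date"][:4]) // 10 * 10  # e.g. 1867 -> 1860
        table[decade][m["is_slaveholder"]] += 1
    return {d: dict(table[d]) for d in sorted(table)}


# Hypothetical rows, not taken from the dataset.
members = [
    {"start_date": "1803-10-17", "is_slaveholder": "true"},
    {"start_date": "1809-05-22", "is_slaveholder": "false"},
    {"start_date": "1865-12-04", "is_slaveholder": "false"},
    {"start_date": "1867-03-04", "is_slaveholder": "unknown"},
]
print(status_by_decade(members))
# {1800: {'true': 1, 'false': 1}, 1860: {'false': 1, 'unknown': 1}}
```

Grouping by end_date, or counting a member in every decade they served, would give different bars, so it is worth asking ChatGPT which rule it applied.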

Make a plot of slaveholders by US state.

The bar chart displays the number of slaveholders by U.S. state. The states are listed on the y-axis, and the number of slaveholders is represented on the x-axis.

Key Insights:

  1. Southern States: As expected, Southern states like Virginia, South Carolina, and North Carolina have a high number of slaveholders, reflecting the historical prevalence of slavery in these areas.
  2. Northern States: Northern states generally have fewer slaveholders, which is consistent with the historical context of slavery being less prevalent in the North.
  3. Variation: There is a wide variation in the number of slaveholders among states, ranging from states with high counts to those with very few or none.
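Members who represented more than one state raise a counting question. The sketch below credits such a member once to each state they served. It assumes states_served is a comma-separated list of postal abbreviations. The slaveholders_by_state helper and the rows are hypothetical.

```python
from collections import Counter


def slaveholders_by_state(members):
    """Count slaveholders per state, crediting a member once to every state they served.

    Assumes states_served is a comma-separated list of postal abbreviations.
    """
    counts = Counter()
    for m in members:
        if m["is_slaveholder"] == "true":
            # A set avoids counting the same member twice for one state.
            states = {s.strip() for s in m["states_served"].split(",") if s.strip()}
            counts.update(states)
    return counts


# Hypothetical rows, not taken from the dataset.
members = [
    {"states_served": "VA", "is_slaveholder": "true"},
    {"states_served": "VA, KY", "is_slaveholder": "true"},
    {"states_served": "NY", "is_slaveholder": "false"},
]
counts = slaveholders_by_state(members)
print(counts.most_common())  # [('VA', 2), ('KY', 1)]
```

With this rule, the state totals can add up to more than the number of slaveholders. That is expected, but it should be kept in mind when comparing states.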

Show top 10 states. Use labels - full state names.

The bar chart shows the top 10 U.S. states by the number of slaveholders. The full state names are displayed on the y-axis, and the number of slaveholders is represented on the x-axis.

Key Insights:

  1. Virginia: Virginia leads the list with the highest number of slaveholders, significantly outnumbering other states.
  2. Southern States: The list is dominated by Southern states such as South Carolina, North Carolina, and Georgia, reflecting the historical prevalence of slavery in these areas.
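  3. Noticeable Gaps: There are noticeable gaps in the number of slaveholders between some states. These gaps may reflect the prevalence of slavery, but also delegation size and how long each state was represented in Congress, so they are not a direct measure of reliance on slave labor.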
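The "top 10 with full state names" request amounts to taking the ten largest counts and mapping abbreviations to names. In the sketch below, the STATE_NAMES table is a deliberately partial, illustrative lookup, and the counts are made up. Codes missing from the table are kept as-is rather than dropped.

```python
from collections import Counter

# Partial, illustrative lookup table; a full version would cover every
# state and territory code present in states_served.
STATE_NAMES = {
    "VA": "Virginia",
    "SC": "South Carolina",
    "NC": "North Carolina",
    "GA": "Georgia",
    "KY": "Kentucky",
}


def top_states(counts, n=10):
    """Top n states by count, with abbreviations replaced by full names.

    Codes missing from STATE_NAMES are kept as-is rather than dropped.
    """
    return [(STATE_NAMES.get(code, code), k) for code, k in counts.most_common(n)]


# Hypothetical counts, not taken from the dataset.
counts = Counter({"VA": 5, "SC": 3, "GA": 2, "ZZ": 1})
print(top_states(counts, n=3))  # [('Virginia', 5), ('South Carolina', 3), ('Georgia', 2)]
```

The resulting pairs could then be drawn as a horizontal bar chart with matplotlib's `plt.barh`, names on the y-axis and counts on the x-axis.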

Conclusion

As you can see, ChatGPT Advanced Data Analysis is a great tool for data analysts and can change the way we work with data.

Oleksandr Krakovetskyi
CEO DevRain, CTO DonorUA, Microsoft Regional Director, Microsoft Artificial Intelligence Most Valuable Professional, Ph.D. in Computer Science.