Graphing the AST: Blog #1

Data Extraction and Supplementary Table Load-In

Johanna Jones
Apr 19, 2024

Welcome back to the Graphing the Atlantic Slave Trade (AST) blog series, where I document my process and progress in creating a network graph of the Atlantic slave trade. The AST is a dark period in our history with an extensive data record. My aim is to uncover this history through community detection and network analysis, using a combination of tools such as python3 and Neo4j.

In this post I cover my data sources, the extraction of the supplementary tables, and the slow-but-sure creation of the AST database within Neo4j.

Data Download

The downloaded data comes from the SlaveVoyages site and data repository. There are a number of data formats and versions available depending on your software and needs. The data used for this analysis can be found here. I have opted for tastdb_exp_2019.csv and the supplementary table/data dictionary SPSS codebook from November 2023, which can be found here.

SlaveVoyages offers a curated and interactive database, with cleansed fields, appropriate date formats and imputed location names, available for download. However, I have opted for the raw CSV file, as that gives me greater control over which columns to preserve for analysis. The dataset is large, with 276 columns and 36,109 observations. While such rich and diverse data is useful in a written historical context, in a data-analytics one many of these columns carry redundant information. In another blog post I will conduct some EDA and give the rationale for which columns were kept and which discarded.
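
Before deciding what to keep, it can help to peek at the raw file from within Neo4j itself. The snippet below is a minimal sketch using apoc.load.csv (part of the extended APOC setup described later in this post) just to list the column headers of a few sample rows; the filename matches my download, but adjust the path to your own import folder:

// Preview the raw CSV: list its columns for a few sample rows
CALL apoc.load.csv("file:///tastdb_exp_2019.csv", {limit: 3})
YIELD lineNo, map
RETURN lineNo, keys(map) AS columns;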

There is also data available on the Intra-American slave trade for those who are interested.

Supplementary Table Extraction

The supplementary tables map categorical codes to their corresponding values. These tables are listed below, with a sketch of loading one of the flat lookup tables after the list:

  • Voyage Outcome: The primary outcome of a slave voyage. This could include a ship reaching its destination as planned, being abandoned, captured or even shipwrecked.
  • FATE_2: The secondary outcome of a voyage. This generally covers the status of slave embarkation and disembarkation from origin to destination. The values also cover whether slaves perished or were shipped elsewhere.
  • TONTYPE: The definition of tonnage used by each trading country.
  • SHIP_NATION: The country/nation in which the ship was registered.
  • RIG: The rig of the ship/vessel.
  • XMIMPFLAG: The voyage groupings used to estimate the imputed slave numbers.
  • REGION: Geographical regions divided into “Broad Region”, “Specific Region Country or Colony” and “Port or Location”. Broad regions refer to continents such as the Americas, Europe and Asia. Countries refer to specific regions like Brazil, England and South Asia. Ports refer to specific trading ports such as Jamaica, Madagascar and Goa.
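
Most of these tables are flat code-to-value lookups, so each can be loaded as a set of simple labelled nodes. Below is a minimal sketch for the RIG table; note that the sheet name "Rig" and the column headers Rig_code and Rig are my own assumptions about the cleaned workbook, so adjust them to match your file:

// Load a flat lookup table (sheet and column names assumed) as nodes
CALL apoc.periodic.iterate('
CALL apoc.load.xls("file:///Supplementary_tables_clean.xlsx", "Rig") YIELD map AS row
RETURN row.Rig_code AS code, row.Rig AS name
','
MERGE (r:Rig {code: code, name: name})
',
{batchSize: 500, iterateList: true, parallel: false});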

The tables were extracted from the PDF file through Adobe. This, of course, does not foster reproducible research, but all files will be available in my GitHub repo.

Neo4j Supplementary Table Read-In

An important thing to note is that the data is in xlsx format, which will throw errors when imported with the core APOC library. Essentially, apoc.load.xls and apoc.load.csv are covered under the ‘Extended’ documentation of the APOC library. Tomaz Bratanic gives a great explanation in their article. You will need to add additional files to your Neo4j Plugins folder in order for these apoc procedures to work. Follow their tutorial and instructions for the full setup if you are using Neo4j Desktop. Make sure your APOC library version is also compatible with the corresponding files you download; I am using version 5.18.0.
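
Once the extended files are in place, a quick sanity check in the Neo4j Browser confirms everything is registered; both calls below are standard APOC helpers:

// Confirm that apoc.load.xls is now available
CALL apoc.help("load.xls");

// Check that the installed APOC version matches the files you downloaded
RETURN apoc.version();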

As always, files are read in with the Cypher query language. I’ve provided the code necessary for the REGION table import and relationship creation. We want to set up the relationships such that a port is within a country, which is within a broad region.
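
One step I would suggest before running the import, sketched below with my own constraint names: uniqueness constraints on the code properties. These keep the MERGE lookups fast and guard against duplicate nodes (assuming each code always maps to a single name):

// Uniqueness constraints on the code properties used by MERGE (Neo4j 5 syntax)
CREATE CONSTRAINT region_code IF NOT EXISTS
FOR (r:Region) REQUIRE r.code IS UNIQUE;

CREATE CONSTRAINT country_colony_code IF NOT EXISTS
FOR (cc:CountryColony) REQUIRE cc.code IS UNIQUE;

CREATE CONSTRAINT port_code IF NOT EXISTS
FOR (p:Port) REQUIRE p.code IS UNIQUE;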

// Create the broad Region nodes
CALL apoc.periodic.iterate('
CALL apoc.load.xls("file:///Supplementary_tables_clean.xlsx", "Region") YIELD map AS row
RETURN DISTINCT
row.Broad_region_code AS Region_code,
row.Broad_region AS Region
',
'
// Merge one Region node per broad-region code
MERGE (br:Region {code: Region_code, name: Region})
',
// parallel is off: concurrent MERGEs on the same nodes can deadlock
{batchSize: 500, iterateList: true, parallel: false});

// Create the country/colony and port nodes and their inter-relationships
CALL apoc.periodic.iterate('
CALL apoc.load.xls("file:///Supplementary_tables_clean.xlsx", "Region") YIELD map AS row
RETURN
row.Broad_region_code AS Region_code,
row.Specific_region_code AS Country_ColonyCode,
row.Specific_region_country_or_colony AS Country_Colony,
row.Place_code AS Port_code,
row.Places_port_or_location AS Port
',
'
// Match the broad region created above and merge the country/colony and port nodes
// (the port name is stored as `name` for consistency with the other node labels)
MERGE (br:Region {code: Region_code})
MERGE (cc:CountryColony {code: Country_ColonyCode, name: Country_Colony})
MERGE (p:Port {code: Port_code, name: Port})

// Set the hierarchy: a region contains countries/colonies, which contain ports
MERGE (br)-[:COUNTRY_OF]->(cc)
MERGE (cc)-[:PORT_IN]->(p)
',
{batchSize: 500, iterateList: true, parallel: false});
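
To spot-check the hierarchy after the import, a query along these lines walks from a broad region down to its ports; the exact region name string is an assumption, so check how it appears in your data:

// Walk from one broad region through its countries/colonies to their ports
MATCH (br:Region)-[:COUNTRY_OF]->(cc:CountryColony)-[:PORT_IN]->(p:Port)
WHERE br.name CONTAINS "Caribbean"
RETURN br.name AS region, cc.name AS country_colony,
       collect(p.name)[..5] AS sample_ports;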

An example of the graph and its relationships: the broad region is the Caribbean, the country is Jamaica, with its ports and colonies beneath it. It will be interesting to see which ports and countries were major players in the slave trade. From written records we already know the Caribbean, the Americas and Europe were major locations. Less prominent locations such as the Middle East and South Asia are of particular interest to explore and dive into.

In the next post, I’ll cover some EDA and insights on the main dataset, as well as the data import, the schema and the overall database.

Thanks for reading! As always, please feel free to leave any feedback you might have :)
