A Game of Data and GraphQL
Creating a Neo4j Graph Database (and more) based on Game of Thrones (A Song of Ice and Fire) data.
As season 7 progresses, interest in Game of Thrones data is flaring up again. There are plenty of very thorough data sources, like A Wiki of Ice and Fire and the Game of Thrones section of Wikia. Unfortunately, those are not available as plain data APIs.
Thanks to Joakim Skoog, that has changed at least a bit. He scraped and cleaned data from the sources above and made it available through his An API of Ice and Fire, which is a neat .NET project running on Azure. The code (and the data!) is also available in his GitHub repository.
Most recently, the Wall Street Journal wrote about his API, which I found quite unexpected.
As we currently have our 7 weeks of Graph of Thrones challenge running, I thought it would be fun and useful to create a Neo4j graph database out of Joakim's data.
You can find all the scripts and documentation in my game-of-graphs GitHub repository.
Data Source
The data about Westeros is available via several API endpoints, which are detailed in the documentation. For us the house and character data is most interesting.
You can retrieve the data directly on the API homepage or in your browser, e.g. using https://anapioficeandfire.com/api/characters/1303 for “Daenerys Targaryen”.
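As a minimal sketch of doing the same from a script, the Python snippet below builds resource URLs and extracts the numeric ID from a URL that references another resource. The helper names are hypothetical, and `fetch_json` assumes the endpoint returns a JSON document and that network access is available.

```python
import json
import urllib.request

# Base URL taken from the browser example above.
API_ROOT = "https://anapioficeandfire.com/api"


def resource_url(kind: str, resource_id: int) -> str:
    """Build the URL for one resource, e.g. kind="characters", resource_id=1303."""
    return f"{API_ROOT}/{kind}/{resource_id}"


def id_from_url(url: str) -> int:
    """Extract the trailing numeric ID from a resource URL."""
    return int(url.rstrip("/").rsplit("/", 1)[-1])


def fetch_json(url: str) -> dict:
    """Download and decode one JSON document (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_json(resource_url("characters", 1303))` would then retrieve the record for the character mentioned above.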
My initial approach used the API to retrieve the data and build the graph in Neo4j until I saw that the repository contains the original JSON files, so we can use them directly.
I want to make the data available both directly in Neo4j and through a GraphQL endpoint. That's why, using the API documentation, I wrote a short GraphQL schema file that contains people, houses, seats, and regions.
GraphQL Setup
Schema
Using the neo4j-graphql-cli, we can quickly spin up a sandbox instance for the data and push our schema file.
In the Neo4j UI, we can display the GraphQL schema visually with CALL graphql.schema().
We can do the same in GraphQL Voyager.
Data Import
The data import works by loading the JSON files from Joakim's repository with Neo4j's Cypher and creating nodes and relationships to form our graph. Because I didn't want to store superfluous data, I run a few cleanup operations up front. Several of the attributes are turned into relationships, e.g. leadership and followership, or seats and regions.
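The idea of dropping empty values and turning attributes into relationships can be sketched in Python. This is only an illustration, not the Cypher used for the actual import: the record is made up, and the field names (`Id`, `Name`, `Region`, `Seats`, `Founder`, `Words`) and relationship types are assumptions rather than the names used in the JSON files or the import queries.

```python
from typing import Any

# Hypothetical mapping from attribute name to relationship type (illustrative only).
REL_FIELDS = {
    "Region": "IN_REGION",
    "Seats": "SEAT",
    "Founder": "FOUNDED_BY",
}


def split_record(record: dict[str, Any]) -> tuple[dict[str, Any], list[tuple[Any, str, Any]]]:
    """Split a raw record into node properties and (source, type, target) relationships."""
    props: dict[str, Any] = {}
    rels: list[tuple[Any, str, Any]] = []
    for key, value in record.items():
        if value in (None, "", []):
            continue  # cleanup: skip empty values instead of storing them
        if key in REL_FIELDS:
            # Single values and lists are both expanded into one relationship per target.
            targets = value if isinstance(value, list) else [value]
            rels.extend((record["Id"], REL_FIELDS[key], target) for target in targets)
        else:
            props[key] = value
    return props, rels


# Made-up sample record, not taken from the real data set.
house = {
    "Id": 1,
    "Name": "House Example",
    "Region": "The North",
    "Seats": ["Example Keep"],
    "Founder": None,
    "Words": "",
}
props, rels = split_record(house)
```

Here `props` keeps only the plain attributes (`Id` and `Name`), while `Region` and `Seats` become relationship tuples that an import step could turn into graph edges.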
Here are the two queries that you can just paste individually into the hosted Neo4j Browser of your Sandbox instance.
Queries
You can now query the data via GraphQL, e.g. using the GraphiQL UI hosted by the sandbox. The nice thing here is that you get built-in auto-completion and documentation.
Example GraphQL Query
For instance, to find House Stark, its founder, seat, region, allegiance, and the first 10 followers with their name and seat, you'd run this query:
Of course, you can also use the API from your own application or other tools (like graphql-cli).
In the sandbox you can find these instructions:
Your GraphQL endpoint is available at https://<10-0-1-...-.....>.neo4jsandbox.com/graphql/.
We use HTTP Basic Auth, so be sure to set an auth header: Authorization: Basic xYXcXCCXCXCXCXCXCXCXCXCX=
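A minimal sketch of setting that header from your own application, using only the Python standard library. The endpoint, user name, password, and query below are placeholders, and the sketch assumes the endpoint accepts the common convention of a POST request with a JSON body containing a `query` field.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def graphql_request(endpoint: str, user: str, password: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": basic_auth_header(user, password),
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute the endpoint and credentials shown in your sandbox.
request = graphql_request(
    "https://example.invalid/graphql/",
    "neo4j",
    "secret",
    "{ __typename }",
)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the headers before any network call is made.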
Example Cypher queries
In the Neo4j Browser you can run arbitrary graph queries, for instance to visualize family trees.
Missing Data
While looking at the data, I saw that some of it was missing. Here is a query that shows which main characters have no parental relationship:
Which returns:
Walder, The waif, High Septon, Margaery Tyrell, Tywin Lannister, Unella, Aemon Targaryen, Alliser Thorne, Arya Stark, Asha Greyjoy.
Some of these characters clearly have parents or children we know of, so those relationships are missing in the data, and we should help Joakim improve the data quality by sending updates his way.
Other Data Sources
Besides all the visual artists who manually crafted infographics, family networks, and maps of Westeros, there are a number of graph-related articles that discuss the data side of things.
- Andrew Beveridge: Network of Thrones, character interactions
- William Lyon: import and analytics of the above in Neo4j
:play https://guides.neo4j.com/got
- Mark Needham: Wikia data, via his repository
:play https://guides.neo4j.com/got_wwc
- Tomaz Bratanic: battles from Kaggle data
- Chris Willemsen: NLP analytics on the GoT books