Family Matters

Genealogy Knowledge Graphs Made Easy with GRAKN.AI

Michelangelo Bucci
Dec 12, 2016 · 7 min read

Note: This article has not been updated since it was published in December 2016, when it ran against GRAKN.AI 0.8.0. The syntax of Graql has changed since then, so the queries shown here are out of date, although the concepts still apply. Please check out the GRAKN.AI documentation for up-to-date examples.

Highly interconnected data with heterogeneous types are a precious source of knowledge, but challenging to maintain, query, analyse and understand. GRAKN.AI is an open-source knowledge graph data platform that uses the power of machine reasoning to help you overcome these challenges and build intelligent and cognitive systems.

I have a lot of interests. I really like learning and doing new things, and I do not make a secret of it. One of the interesting things about working at Grakn Labs is that, because the software has a very general scope, I keep getting the chance to look at completely different domains of application.

While I was browsing around doing some research on how to build a family-tree-related ontology, I stumbled upon this document from Lenzen Research. It is a piece of genealogy research into the family of a Catherine Niesz Titus. The document shows three generations of Catherine’s maternal lineage, along with the documents that were used to gather the data. On its own, it is a very interesting read, but in this case it has also served as a source of inspiration to build a new dataset to play with using Grakn.

The Data

I have put everything you need to work through this example in the GraknLabs sample-datasets repo. Further information about what is included, and how to work with it, is in the project’s README.md file.

The data that we will load into a graph is based on the Lenzen Research pdf document, with some modifications, which I made for the sake of getting interesting results from the queries that I describe later.

Just to be clear: the dataset is not historically accurate, nor does it pretend to be.

The data you will use has been built into Graql format, which is the declarative language that Grakn understands. The raw data can also be found in the project repository, and comprises a set of CSV files. The step that migrates the raw data into Graql is a straightforward one, but not documented here because you don’t need that information to work through this example. However, we have it documented as an example of CSV migration on the Grakn documentation portal.

Make sure that you have downloaded the latest version of GRAKN.AI (I used version 0.8.0), started the Grakn engine, then loaded the ontology, inference rules and the data. If you’re unsure about any of these steps, please take a look at our documentation or read some of my previous blog posts.

Let’s get started!

Meet the Family

First of all, if you have not already done so, start Grakn (with the command grakn.sh start) and load the visualiser...

…wait a second…you didn’t know Grakn has a visualiser? Well, it has not received much attention in previous posts, but it’s there. You can load it simply by pointing your browser to localhost:4567. If everything is correct, you will find yourself in front of our GUI, looking at the graph loaded into the default keyspace.

Click the Types button in the top right, then the Visualise button in the newly opened drawer and, lo and behold, the ontology will materialise itself in front of your eyes.

The beauty of knowledge

Feel free to play around with it for a while, and use the Help tab if you find yourself at a loss.

The Family Tour

In the rest of this post, I will walk through a small set of queries that will highlight some interesting features of Grakn.

Clear the graph with SHIFT+BACKSPACE and let’s see what documents are in the graph: click the Types button, then select Entities and document in the drawer. This is the exact GUI equivalent of running the following query in the Graql shell:

match $x isa document;

The query should, incidentally, have appeared in the top textbox.

In the graph, the nodes correspond to the documents inserted into the graph. There are only five, but they are more than enough for demonstration purposes.

Click-and-hold one of the document nodes and a popup will appear where you can select which resources are available to be shown on the nodes. I suggest notes and document-type for the documents.

Click on one of the document nodes, for example the “Dated Picture of Catherine Niesz”, and a drawer will appear on the right hand side with the details of the node. Note that the details include a clickable URL that will point your browser to the website where the picture resides (it is the same site where I found the genealogical research on Catherine Titus’ family).

If you double click the document “Dated Picture of Catherine Niesz”, you will see that it is linked to a birth event and a death event. This is because the document is a picture with date of birth and death of Catherine Niesz, so it is a piece of evidence supporting both events.

Double click the death node and a person node will appear, linked to the event via an event-event-protagonist relation, which has a resource clarifying the role of the person in the event. If you double click the person node that just appeared, it will connect to the birth event as well, showing that both events are linked to the same person.
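If it helps to picture what you are clicking through, here is a minimal sketch in plain Python (not Graql, and not the actual ontology) of the shape of the data: documents act as evidence for events, and events are tied to people together with the role each person plays. The class names, the person identifier and the role labels "newborn" and "deceased" are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protagonist:
    person: str    # person identifier (illustrative)
    function: str  # role of the person in the event (hypothetical label)


@dataclass
class Event:
    kind: str                                          # e.g. "birth" or "death"
    protagonists: list = field(default_factory=list)   # who took part
    evidence: list = field(default_factory=list)       # titles of supporting documents


picture = "Dated Picture of Catherine Niesz"
birth = Event("birth", [Protagonist("Catherine Niesz", "newborn")], [picture])
death = Event("death", [Protagonist("Catherine Niesz", "deceased")], [picture])


def events_supported_by(document, events):
    """Return the kinds of events that list `document` as evidence."""
    return [e.kind for e in events if document in e.evidence]


def people_in(events):
    """Return the set of people taking part in any of the events."""
    return {p.person for e in events for p in e.protagonists}


print(events_supported_by(picture, [birth, death]))
print(people_in([birth, death]))
```

One document supports two events, and both events point to the same person, which is the small triangle you see in the visualiser.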

Let’s look at something more interesting. Without clearing the graph, run the query

match (child: $x, parent: $y) isa parentship;

and…

Sad Tuba cue

…absolutely nothing happens. This is because the only data in the graph at the moment is related to events, documents and people. Even the people have no resource apart from their identifiers (which correspond to their complete names for easy reference, although that definitely does not need to be the case).

The Grakn Reasoner

To get to the meaty stuff, we have to fire up the reasoner.

On the left hand side, you will see the Config tab. Click on it and you will find that the very first line contains an Activate inference checkbox. Check it and you are good to go.

Ready to work

From now on you are able to unlock the full potential of Grakn on this genealogy knowledge graph. Here are a few queries you might want to try in the visualiser.

Let us start with the full family tree. Clean the graph and run this query:

match (parent: $x, child: $y);

Without cleaning you can easily add marriages to the graph:

match (husband: $x, wife: $y);

And you are now seeing the whole genealogical tree that is stored in the graph.

Very different results now
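As a rough picture of why these queries return results only when inference is active, here is a sketch in plain Python of the kind of derivation a rule can perform. It assumes that parentships are derived from birth events and the roles of their protagonists; the actual inference rules live in the sample-datasets repo and may well differ. All names and the role labels "newborn" and "parent" are made up for illustration.

```python
from collections import defaultdict

# Stored facts: (event, person, role in the event). Nothing here says
# "X is the parent of Y" explicitly.
protagonists = [
    ("birth-1", "Carol", "newborn"),
    ("birth-1", "Alice", "parent"),
    ("birth-1", "Bob", "parent"),
    ("birth-2", "Dave", "newborn"),
    ("birth-2", "Carol", "parent"),
]


def infer_parentships(rows):
    """Derive (parent, child) pairs from people sharing a birth event."""
    by_event = defaultdict(lambda: {"newborn": [], "parent": []})
    for event, person, role in rows:
        if role in ("newborn", "parent"):
            by_event[event][role].append(person)
    pairs = set()
    for roles in by_event.values():
        for parent in roles["parent"]:
            for child in roles["newborn"]:
                pairs.add((parent, child))
    return pairs


stored_parentships = set()  # without inference there is nothing to match
inferred_parentships = infer_parentships(protagonists)

print(sorted(stored_parentships))
print(sorted(inferred_parentships))
```

The first line printed is empty, just like the query result before the checkbox was ticked; the second contains the derived parent and child pairs.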

Want to know who is married to a cousin? Clean the graph and submit the following query:

match ($x, $y) isa cousins;
(husband: $x, wife: $y) isa marriage;
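The query is a join of two patterns on the same pair of variables. A plain Python sketch of the same idea follows, assuming that cousins are people who share a grandparent but not a parent (the rule in the dataset may define them differently). The names are invented and are not taken from the genealogy data.

```python
# child -> set of parents (made-up family)
parent_of = {
    "Ann": {"Gran", "Gramps"},
    "Ben": {"Gran", "Gramps"},
    "Cara": {"Ann"},
    "Dan": {"Ben"},
    "Eve": {"Zed"},
}

# (husband, wife) pairs
marriages = {("Dan", "Cara"), ("Ben", "Eve")}


def grandparents(person):
    """Parents of the person's parents."""
    return {g for p in parent_of.get(person, set()) for g in parent_of.get(p, set())}


def are_cousins(a, b):
    """Assumed definition: a shared grandparent, but no shared parent."""
    if a == b:
        return False
    shares_grandparent = bool(grandparents(a) & grandparents(b))
    shares_parent = bool(parent_of.get(a, set()) & parent_of.get(b, set()))
    return shares_grandparent and not shares_parent


# Keep only the marriages whose two members also satisfy the cousins pattern.
married_cousins = sorted((h, w) for h, w in marriages if are_cousins(h, w))
print(married_cousins)
```

In Grakn neither relation has to be stored: the reasoner derives both and the query simply asks for pairs that satisfy the two patterns at once.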

Who has the same name as a grandparent? You know the drill: clean and query:

match $g (grandparent: $x, grandchild: $y);
$x has firstname $n; $y has firstname $n;
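Reusing the variable $n in both patterns is what forces the two first names to be equal. Here is the same idea as a plain Python sketch, assuming that a grandparent is simply a parent of a parent; the identifiers and names are made up.

```python
from collections import defaultdict

firstname = {"p1": "Mary", "p2": "Tom", "p3": "Mary", "p4": "Luke"}

# (parent, child) pairs
parentships = [("p1", "p2"), ("p2", "p3"), ("p2", "p4")]


def grandparentships(pairs):
    """Compose parentship with itself to get (grandparent, grandchild) pairs."""
    children = defaultdict(set)
    for parent, child in pairs:
        children[parent].add(child)
    return {
        (grandparent, grandchild)
        for grandparent, kids in children.items()
        for kid in kids
        for grandchild in children.get(kid, ())
    }


# Shared variable $n in the query == equality test on the first names here.
namesakes = sorted(
    (g, c) for g, c in grandparentships(parentships) if firstname[g] == firstname[c]
)
print(namesakes)
```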

The Grakn Analytics Engine

Let us finish this short tour with a query that uses our analytics engine.

First of all, let’s look for the ids of Susan Josephine Dudley and Barbara Herchelroth.

match $x isa person has firstname "Susan" has surname "Dudley";
$y isa person has firstname "Barbara" has surname "Herchelroth";

You will have to copy and paste their ids in order to run the query (the ids are randomly assigned when you load the data and will look like “2137419”).

Once you have obtained the ids, you can check whether (at least from the information we have) Barbara and Susan are blood relatives:

compute path from "1769552" to "876552" in person, parentship;

The query above asks Grakn to find the shortest path, if one exists, from the first node to the second, using only the types of nodes and relationships listed after the keyword in (in this case person and parentship). You will see that there is no such path. Add marriages, and you will find out that Barbara is the great-grandmother of Susan’s husband.

compute path from "1769552" to "876552"
in person, parentship, marriage;
Barbara is the great-grandmother of Susan’s husband
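To see why adding marriage to the list changes the answer, here is a plain Python sketch of a shortest path search that is only allowed to walk along the relation types you name. It uses a breadth-first search, which is an assumption on my side and not a description of how the analytics engine is implemented. Barbara and Susan come from the example above, while the intermediate people (Ida, Joe, Henry) are invented.

```python
from collections import deque

# (person, relation type, person); intermediate names are made up
edges = [
    ("Barbara", "parentship", "Ida"),
    ("Ida", "parentship", "Joe"),
    ("Joe", "parentship", "Henry"),
    ("Henry", "marriage", "Susan"),
]


def shortest_path(start, goal, allowed, links=edges):
    """Breadth-first search that only follows relations of an allowed type."""
    neighbours = {}
    for a, relation, b in links:
        if relation in allowed:
            neighbours.setdefault(a, []).append(b)
            neighbours.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path using only the allowed relation types


print(shortest_path("Susan", "Barbara", {"parentship"}))
print(shortest_path("Susan", "Barbara", {"parentship", "marriage"}))
```

With parentship alone the search never leaves Susan, so there is no path; once marriage is allowed, it reaches Barbara through Susan's husband and three parentship steps.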

Summary

This genealogy graph is a nice way of exploring Grakn, its visualiser and reasoner. You can use it to test your Graql skills or just to play around with the stack without having to load the Moogi movie dataset (which is several orders of magnitude larger).

If you have questions or want to give us your feedback, feel free to join our Community and use our discussion boards and Slack channel. Or just leave a comment below. We look forward to hearing from you!

Stay tuned,

M.

PS: If you liked this post, why not recommend it by clicking the small heart below, or tweet it? It makes us feel cozy and loved :)
