Adding semantics to graph databases with Grakn. Part 2
Hello. I’m Michelangelo and I’m part of the Early Adopters Program at Grakn Labs. We are developing a software stack for structuring, exploring, and adding functionality to graph databases. I have been using the platform for only a few days and yet have managed to produce something interesting. This is the second part of a series of posts recounting my experience. You can find the first part here.
In the first instalment of this series I introduced the problem and described what I am trying to do.
Just to recap: I have a list of cancer researchers with the number of papers they have written together and I want to structure that data in a Grakn graph, which is made up of two layers: a database schema and a data layer. In this post, I will begin to demonstrate how a database schema is built with the Grakn software stack. In the next one, I will complete the schema and show how data is loaded into the graph. All the code and data for this project can be found on my GitHub.
If you have not read the first post in the series, some of the details of this one might be lost on you, so I suggest you read that first. Go on, I’ll wait.
Done? Great! Welcome back!
Just to whet your appetite, here is what the final graph looks like:
Before we can put data into our model, we have to build the schema; that is, we have to set the rules and constraints that regulate the relationships among data. There is more to it, but going back to the example I used in my first post, this is where we can establish that a person can prefer a particular dish, which needs specific ingredients to be prepared. So that’s where we avoid making omelettes with Johns.
There are a couple of ways of building a schema with the Grakn software stack: we can use the Java API, or the Graql language through its shell. I will use the latter, since it lets me start testing immediately without the overhead of a full-blown Java project.
For the time being, I will not go into the details of the syntax: first, because it has not been finalised yet and particular details are bound to change in the future; and second, because I do not mean to make this post a tutorial. Rather, I want to give you a taste of what it is like to work with the Grakn stack.
There are two fundamental objects that we have to insert into our model: the oncologists and the co-authorship relationships that link them. Perhaps counterintuitively, I will start with the co-authorship relationships. It is not strictly necessary, but I have found that it helps when specifying which entities play which roles in which relationship. Going back to the people-recipes-food example, if we define the relationship ‘prefers’ between people and recipes first, it is easier to remember to state that people can prefer things (and that recipes can be preferred).
Besides, if you think about it, it also makes sense conceptually: when you model a network of things (entities, objects), the most important aspect is the connections between them, not the data itself.
Building the schema: the relationship
So let us think for a moment about what exactly a relationship is. At its core, it is something that links other things, not necessarily symmetrically. Every party involved in a relationship plays a particular role, which is dictated by the nature of the relationship itself.
In our case, co-authorship is a relationship that links author X and author Y. We do not yet know who will actually play the roles of author X and author Y, but that does not matter at this stage.
In Graql, everything we insert into our graph must have a unique id, and a relationship must link at least two things; specifically, it must have at least two roles, though it can have as many as we want.
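To make these two rules concrete, here is a minimal sketch in plain Python. It is not Graql and not Grakn's Java API: the classes `RoleType`, `RelationType` and `Schema` are hypothetical, invented only to show a schema that refuses duplicate ids and relationship types with fewer than two roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleType:
    """A role something can play in a relationship, e.g. author_X (hypothetical)."""
    id: str


@dataclass(frozen=True)
class RelationType:
    """A relationship type linking two or more roles (hypothetical)."""
    id: str
    roles: tuple

    def __post_init__(self):
        # A relationship must link at least two things, so it needs at least two roles.
        if len(self.roles) < 2:
            raise ValueError(f"{self.id!r} needs at least two roles, got {len(self.roles)}")


class Schema:
    """Toy schema that rejects any concept whose id is already taken."""

    def __init__(self):
        self.concepts = {}

    def insert(self, concept):
        if concept.id in self.concepts:
            raise ValueError(f"id {concept.id!r} is already in use")
        self.concepts[concept.id] = concept
        return concept


schema = Schema()
author_x = schema.insert(RoleType("author_X"))
author_y = schema.insert(RoleType("author_Y"))
# Roles are declared first, then the relationship type that connects them.
co_authorship = schema.insert(RelationType("co-authorship", (author_x, author_y)))
```

In this sketch, inserting a second concept with the id `author_X`, or declaring a relationship type with a single role, raises `ValueError`.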
So how do we insert a co-authorship relationship that links author X and author Y in Graql? Have a look.
NOTE: The code below was correct for early versions of Grakn. Since it was published, we have introduced some changes to Graql syntax as the platform has matured, and we have yet to update this blog post.
We are effectively telling Graql that we want to insert this thing called “co-authorship”, which is a type of relationship between two things that will play the roles of author_X and author_Y, respectively. It might sound convoluted at first, but it is not complicated. One technical detail is that (at least for the moment) roles have to be declared explicitly; that is, we have to tell Graql that these things we call author_X and author_Y are roles of a relationship. The actual relationship insertion, then, looks like this:
Building the schema: adding resources
We are not done with the co-authorship relationship yet, though. Recall that our data tells us not only who is a co-author of whom, but also how many papers our researchers have co-authored. We could add a third role to our relationship and then add numbers as entities in our graph, but that would be wrong in a number of ways.
Say that George and John have written 42 papers together; the number 42 is something that we attach to the relationship, but it makes no sense to say that either John or George is in any kind of relationship with the number 42. We call these things that we attach to the elements of our Grakn graph resources. You can think of resources as post-it notes we stick on the graph. If you prefer, you can also think of them as attributes.
If you think about it, attaching a resource to a concept (roughly, in Grakn lingo, a concept is anything that appears in the graph) means that there is a relationship between the resource and the thing it is attached to. This relationship has two roles: the thing being attached (the resource value) and the thing it is attached to (the resource target). It works under the hood, so we do not have to insert it into the graph explicitly, but it is good to know it is there.
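The hidden relationship can be pictured with another hypothetical plain-Python sketch (again, not Grakn code). Attaching a resource records a two-role link between the value and its target, so George and John themselves never enter a relationship with the number 42. The record `ResourceLink`, the helpers `attach` and `resources_of`, and the id `george-john` are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLink:
    """Hidden relationship created when a resource is attached to a concept.

    It has exactly two roles: the resource value and the resource target.
    """
    resource_type: str
    resource_value: object
    resource_target: str  # id of the concept the resource is attached to


def attach(graph, target_id, resource_type, value):
    """Attach a resource by recording the implicit two-role relationship."""
    link = ResourceLink(resource_type, value, target_id)
    graph.append(link)
    return link


def resources_of(graph, target_id):
    """Return {resource_type: value} for every resource attached to target_id."""
    return {
        link.resource_type: link.resource_value
        for link in graph
        if link.resource_target == target_id
    }


graph = []
# "george-john" is a hypothetical id for one co-authorship between George and John.
attach(graph, "george-john", "number_of_papers", 42)
print(resources_of(graph, "george-john"))  # {'number_of_papers': 42}
```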
In this example, we then have to tell Graql two things: first, to add a resource called number_of_papers; second, that the added resource is going to be a long integer.
Once we have defined the resource number_of_papers, we just need to tell Graql that it can be attached to the co-authorship relationship. This is done with a has-resource statement.
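Here is a hypothetical plain-Python sketch of those two statements: a resource type that only accepts long integers (assumed here to mean signed 64-bit values, as in Java) and that can only be attached to types explicitly granted it, in the spirit of has-resource. `LongResourceType`, `has_resource` and `validate` are illustrative names, not Grakn's API.

```python
class LongResourceType:
    """Hypothetical stand-in for a resource type whose datatype is long."""

    # Assumption: "long" means a signed 64-bit integer.
    LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1

    def __init__(self, type_id):
        self.type_id = type_id
        self.owners = set()  # ids of types allowed to carry this resource

    def has_resource(self, owner_type_id):
        """Allow owner_type_id to carry this resource (cf. Graql's has-resource)."""
        self.owners.add(owner_type_id)

    def validate(self, owner_type_id, value):
        """Return value if it may be attached to an instance of owner_type_id."""
        if owner_type_id not in self.owners:
            raise TypeError(f"{owner_type_id!r} cannot have {self.type_id!r}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{self.type_id!r} expects a long integer, got {value!r}")
        if not self.LONG_MIN <= value <= self.LONG_MAX:
            raise ValueError(f"{value} does not fit in a 64-bit long")
        return value


number_of_papers = LongResourceType("number_of_papers")
number_of_papers.has_resource("co-authorship")
```

With this set-up, 42 papers can be recorded on a co-authorship, while the string "42", a boolean, or a number_of_papers attached to a type that was never granted it is rejected.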
The schema file we have built so far looks as follows.
With the co-authorship relationship in place, there is one more fundamental object to add to our schema before we can start adding data to our graph: the oncologists (someone has to write those papers, after all!). As you will see in my next post, this will be easy.