Adding semantics to graph databases with Grakn. Part 4

Yo te query

Michelangelo Bucci
Vaticle
Sep 23, 2016

Hello. I’m Michelangelo and I am part of the Grakn Labs team. In this series of posts (Part 1, Part 2, and Part 3), I have described my journey as a newcomer to the Grakn software stack, recounting my experience with it as I learned. In the first three posts I talked about how to structure and load data into a Grakn graph; in this post, the last one of the series, I will briefly show some simple queries we can use to explore the graph we have created.

This post assumes some knowledge covered in the preceding posts of the series, so if you haven’t done so already, please give them a read. I promise I will not run away.

NOTE: The code below was correct for early versions of Grakn. Since it was published, we have introduced some changes to Graql syntax as the platform has matured, and we have yet to update this blog post.

What’s in a query

In Graql (Grakn’s native query language), if we want to explore the data we have inserted in our graph, we need a match query. I will give you a brief introduction to the query syntax but, if you want a more detailed description, I suggest you pay a visit to the Graql documentation.

A match query is composed of three parts: a match statement, a select statement and a few delimiters (or modifiers, if you so prefer). Only the first part of a match query is actually needed: the select and delimiter parts are optional.

Each part of our query can be subdivided into patterns, which are single sets of instructions passed on to Graql. Each pattern must end with a semicolon, so that Graql is able to tell different patterns apart. As you would expect from a query language, you can also use variables. If you are completely new to the concept, variables are basically buckets that Graql fills with the results of the query you have run against the graph. In Graql, variable names are prefixed with a dollar sign.

Time to query.

Shall we?

Mix and Match

Let’s start with something very simple: imagine we want to find all the oncologists in our graph. This is how our query looks:

Although the query is quite natural for anybody with some kind of programming background, let’s examine it in detail.

We are telling Graql to look into the graph (match) for everything of the type oncologist (isa oncologist) and to store the results in a variable ($x).

If you compare it to the query to add an oncologist to the graph

you can see that the insert query tells Graql to put into the graph whatever comes after the insert keyword and assign it to the type “oncologist”. The match query, on the other hand, tells Graql to fetch whatever has its type set to “oncologist” and to store it in the variable that comes after the match keyword.
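If it helps, here is the same contrast as a rough mental model in plain Python. This is only an analogy, not Graql and not Grakn's API: the list-of-dictionaries layout, the `insert` and `match` helpers, the sample ids and the extra "paper" type are all made up for illustration.

```python
# Toy in-memory "graph": a list of instances, each with an id and a type.
# Everything here is hypothetical and only mirrors the idea described above.
graph = []


def insert(instance_id, type_name):
    """Hypothetical helper: put a new instance of the given type into the graph."""
    graph.append({"id": instance_id, "isa": type_name})


def match(type_name):
    """Hypothetical helper: fetch every instance whose type is type_name."""
    return [instance for instance in graph if instance["isa"] == type_name]


insert("oncologist-1", "oncologist")
insert("oncologist-2", "oncologist")
insert("paper-1", "paper")

# x plays the role of the variable $x: the bucket that receives the results.
x = match("oncologist")
print([instance["id"] for instance in x])
```

The insert helper writes into the graph, while the match helper only reads from it and hands back whatever fits the requested type.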

Making a Selection

If you actually try the above match query, you will see that the returned results are probably not exactly what we are looking for.

This happens because we have asked Graql to find all the instances of type oncologist, and it returns them to us as a list of ids (which, as you probably remember, must be unique). Adding the following select statement to our query tells Graql to focus on the parts of the results we are actually interested in (in this case, the oncologist-name resource of the oncologist entities returned by the match statement).

And this is the (more useful) result.

Basically, once you have found the entities and relations you are looking for with the match statement, you can use the select statement to have Graql show only some of the information retrieved.

Furthermore, if you want Graql to show you the values of some resource for the variables defined in the match part of your query, you use the syntax above. To inspect resource resource_name of the matched variable $variable, you add

to your query.
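In the same plain-Python spirit, a select step can be pictured as a projection over the matched results. Again, this is a sketch under assumptions rather than Graql: the `select` helper, the ids and the names are invented, and the `oncologist-name` key simply follows the resource name used in this post.

```python
# Made-up match results: each instance carries its id and one resource.
matched = [
    {"id": "oncologist-1", "oncologist-name": "Alice Example"},
    {"id": "oncologist-2", "oncologist-name": "Bob Example"},
]


def select(results, resource_name):
    """Hypothetical helper: keep only one resource value per matched instance."""
    return [instance[resource_name] for instance in results]


# Without a select step we would be staring at ids only.
print([instance["id"] for instance in matched])

# With it, we see the resource we actually care about.
names = select(matched, "oncologist-name")
print(names)
```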

No limits, just Delimiters

As you might have noticed, our query results are slightly messy and it is a bit hard to tell who the oncologists in our graph are. Wouldn’t it be nice if we could get, say, the results in alphabetical order, maybe just a small bunch at a time?

That is what delimiters are for; let’s add a few of them at the end of our query:

The line is quite easy to understand: we are telling Graql to order our results according to the resource oncologist-name, start from the 15th result (that is what the offset keyword is for) and show us just 10 results (that is the limit part).

Just to be clear: the offset and limit delimiters are not really needed here, but I thought you might like to see something more than just an order by statement :)
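To get a feel for how the three delimiters interact, here is a small plain-Python sketch. It assumes that offset counts how many results to skip, which is the usual convention in query languages but not something this post confirms for Graql. The `apply_delimiters` helper and the sample names are hypothetical.

```python
def apply_delimiters(values, offset=0, limit=None):
    """Sort the values, skip the first `offset` of them, keep at most `limit`."""
    ordered = sorted(values)
    end = None if limit is None else offset + limit
    return ordered[offset:end]


# Thirty made-up names, deliberately generated in reverse order.
names = ["Name %02d" % i for i in range(30, 0, -1)]

# Order alphabetically, skip five results, then show only three.
page = apply_delimiters(names, offset=5, limit=3)
print(page)
```

Ordering happens first, so the offset and limit always cut a window out of the sorted list rather than out of the raw results.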

Here is our complete query with the results:

It’s (slightly more) complicated

Let us conclude with a slightly more complicated query involving relations. Imagine that you want to check for the most prolific collaborations among our oncologists.

The trivial thing to do is to just treat the co-authorships as normal nodes in the graph, which, in fact, they are. Let’s try:

The problem with this approach is that we are not really interested in the relation nodes (except for the value of their number-of-papers resources): we are interested in the entities that are linked by those nodes (these are what we call the role players). To query the concepts we are interested in, we have to give a variable name to each role player so that we can refer to it later. All in all, everything is quite simple:

Once again, the syntax for the match part is basically the same as the one we used when inserting relations (except that we are substituting role players with variables). We then select what we want to be shown. Finally, we give some instructions on how to show the results.
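The three steps (bind the role players to variables, select what to show, order the results) can be mimicked in plain Python as follows. This is an illustrative analogy only: the data layout, the `most_prolific` helper and the sample authors are made up, and sorting from the highest number of papers down is my assumption about what "most prolific" should mean.

```python
# Made-up co-authorship relations: two role players plus one resource.
coauthorships = [
    {"authors": ("Alice Example", "Bob Example"), "number-of-papers": 4},
    {"authors": ("Alice Example", "Carol Example"), "number-of-papers": 9},
    {"authors": ("Bob Example", "Carol Example"), "number-of-papers": 1},
]


def most_prolific(relations, limit):
    """Hypothetical helper: return (x, y, papers) for the top collaborations."""
    ordered = sorted(
        relations,
        key=lambda relation: relation["number-of-papers"],
        reverse=True,
    )
    results = []
    for relation in ordered[:limit]:
        x, y = relation["authors"]  # the two role players, like $x and $y
        results.append((x, y, relation["number-of-papers"]))
    return results


top = most_prolific(coauthorships, limit=2)
for x, y, papers in top:
    print(x, "and", y, "co-authored", papers, "papers")
```

Note that the output talks about the people and the paper count, not about the relation nodes themselves, which is exactly the point of naming the role players.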

It is really quite simple. And that is the beauty of it.

We are at the end of my short introductory exploration of the Grakn software stack. I have only begun to cover the topics of how to build an ontology, load data into a graph and query it, but there is a lot more to explore.

Hopefully, I have stimulated your curiosity enough to go download it and give it a try. Or have a look at the source code on GitHub.

It is new, open source and has a friendly community of developers to help overcome the problems you might encounter.

Stay tuned,

M.

Michelangelo Bucci

Discrete mathematician/Theoretical computer scientist, learner, curious about stuff.