This blog post contains examples of an earlier unsupported version of Grakn. Some links may be unavailable. Please visit the Grakn Documentation for up-to-date examples.
In a previous blog tutorial, we demonstrated how to import some example SQL data into Grakn. In this article, we will work with the same data, which is about countries and cities of the world. Here, we use it to illustrate how to use inference to find information that is stored implicitly within the dataset.
This article will be useful if you are getting started with Grakn and want a simple example of how to write inference rules using Graql Rules. If you haven’t already set up Grakn, please see the previous tutorial, or check out our setup guide.
Introduction to Inference
Consider the following statements:
(If) grass is not an animal.
(If) vegetarians only eat things which are not animals.
(If) sheep only eat grass.
It is possible to infer the following:
(Then) sheep are vegetarians.
The initial statements can be seen as a set of premises. If all the premises are met we can infer a new fact (that sheep are vegetarians). If we hypothesize that sheep are vegetarians then the whole example can be expressed with a particular two-block structure: IF some premises are met, THEN a given hypothesis is true.
This is how reasoning in Graql works. It checks whether a set of Graql statements can be verified and, if they can, makes an inference from a second block of statements. The first set of statements (the IF part or, if you prefer, the antecedent) is called the left hand side (LHS). The second part (also known as the consequent) is, not surprisingly, the right hand side (RHS). Using Graql, both sides of the rule are enclosed in curly braces and preceded by, respectively, the keywords lhs and rhs.
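To make the IF/THEN structure concrete, here is a minimal sketch of the sheep example in plain Python. It is not Graql and says nothing about how the Grakn reasoner is implemented; the fact tuples and the function name `infer_vegetarians` are made up purely for illustration.

```python
# Illustrative only: plain Python, not Graql. All names here are hypothetical.

# Premises stored as (subject, relation, object) facts.
facts = {
    ("grass", "is-not", "animal"),
    ("sheep", "eats-only", "grass"),
}


def infer_vegetarians(known_facts):
    """IF x eats only y AND y is not an animal, THEN x is a vegetarian."""
    inferred = set()
    for subject, relation, obj in known_facts:
        # Left hand side: both premises must hold for the same food.
        if relation == "eats-only" and (obj, "is-not", "animal") in known_facts:
            # Right hand side: the new fact we are allowed to conclude.
            inferred.add((subject, "is-a", "vegetarian"))
    return inferred


print(infer_vegetarians(facts))
```

The conclusion is never stored in `facts`; it is derived only when both premises are present, which is the same split between antecedent and consequent described above.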
Setting Up the Example
Our example can be found on GitHub in the sample-projects repo. We aren’t going to walk through how to migrate SQL data here, since that topic was covered previously, although the scripts to perform migration directly from SQL into Grakn are available in the repo (just consult the readme file).
For simplicity, we are going to load the ontology and data directly into Grakn. If you haven’t already done so, please download and install the latest version of Grakn (I used 0.12), and start the engine from your terminal. If you’re not sure about any of this, please see our setup guide.
Download the example from the sample-projects repo and load the ontology into a graph:
bin/graql.sh -f ontology.gql
Then load the data (this may take a few minutes):
bin/graql.sh -f data.gql
What’s In The Data?
My esteemed colleague Miko has already discussed inference in an earlier blog article. The Graql syntax has moved on a little since he wrote it, but his article included a very nice explanation of how inference works, using Italian cities, provinces and regions to illustrate. It was such a neat example that I’ve found a similar practical one. Let me explain…
The SQL data I migrated into Grakn consisted of a number of tables. In this example, I’m looking at the table of data about countries and a separate table about cities. The countries table contains a number of columns with data about individual countries (such as their name, population, life expectancy, surface area, etc). For simplicity, for each country, I migrated just the name, international country code, world region (e.g. Eastern Africa, Southeast Asia) and the continent it is situated in.
The city table contains a number of columns, but, in this example, I’ve imported the name of the city and the local district it resides in, which seems loosely based on the division of a country into states (e.g. Texas, Iowa, etc for the US) or provinces/territories.
The city table also contains a country code to represent the country the city is located within. I’ve used that to build a relation between cities and countries using the Grakn knowledge model. The has-city relation is shown in the ontology, which is pretty simple, but I’ll explain it further below:
country sub entity
    plays contains-city;
city sub entity
    plays in-country;
has-city sub relation
    relates in-country;
contains-city sub role;
in-country sub role;
name sub resource datatype string;
countrycode sub resource datatype string;
continent sub resource datatype string;
world-region sub resource datatype string;
local-district sub resource datatype string;
inf-local-district sub resource datatype string;
inf-continent sub resource datatype string;
inf-world-region sub resource datatype string;
There are two entities, to represent a city or a country, each having associated resources to reflect the information I’ve imported from the SQL dataset. So, for example, a city has a local-district resource, which is a string.
There is a relation between a city entity and a country entity called has-city, where there are two roles, contains-city (played by the country) and in-country (played by the city). Since the city has a connection to a country, the information about the country can be inferred to apply to the city, and vice versa:
IF a city is in a country, THEN it must be located in the same continent and world region as the country.
IF a country contains a city, THEN it must contain the local district in which the city is located.
This is why, in the ontology, a city entity has a resource called inf-continent. The reasoner uses a set of “rules”, which are a Graql version of what I’ve written above, to work out inf-continent by inspecting the related country, and the continent it resides in. Likewise, the city entity has a resource called inf-world-region, and the country entity has a resource called inf-local-district.
As humans, we understand the concept that a city is in a country, and a country is in a continent, and thus a city is also in the same continent as the country. We have to write rules for a computer to make the same intuitive leaps.
At the bottom of the ontology.gql file, you’ll see the coded Graql rules for reasoning over the dataset. For example, to infer the continent in which a city is located:
$city-in-continent isa inference-rule
lhs {
    (contains-city: $country1, in-country: $city1) isa has-city;
    $country1 has continent $continent1;
}
rhs {
    $city1 has inf-continent $continent1;
};
You can find out more about writing rules from the Grakn Labs documentation.
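As a rough illustration of what the rules compute, here is a small plain-Python sketch that applies both IF/THEN statements to a handful of cities. It is not Graql, it does not reflect how the Grakn reasoner works internally, and the class and function names (`Country`, `City`, `apply_rules`) are invented for this example. The sample values are a tiny hand-typed subset, not the real dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Country:
    name: str
    continent: str
    world_region: str
    # Filled in by the rules below, never entered by hand.
    inf_local_districts: Set[str] = field(default_factory=set)


@dataclass
class City:
    name: str
    local_district: str
    # Filled in by the rules below, never entered by hand.
    inf_continent: Optional[str] = None
    inf_world_region: Optional[str] = None


def apply_rules(has_city: List[Tuple[Country, City]]) -> None:
    """Apply both rules to every (country, city) pair in the has-city relation."""
    for country, city in has_city:
        # IF a city is in a country, THEN it shares the country's
        # continent and world region.
        city.inf_continent = country.continent
        city.inf_world_region = country.world_region
        # IF a country contains a city, THEN it contains that city's district.
        country.inf_local_districts.add(city.local_district)


# A tiny hand-typed sample (hypothetical stand-in for the imported data).
australia = Country("Australia", "Oceania", "Australia and New Zealand")
uk = Country("United Kingdom", "Europe", "British Islands")
melbourne = City("Melbourne", "Victoria")
canberra = City("Canberra", "Capital Region")
cardiff = City("Cardiff", "Wales")

has_city = [(australia, melbourne), (australia, canberra), (uk, cardiff)]
apply_rules(has_city)

print(cardiff.inf_world_region, cardiff.inf_continent)
print(sorted(australia.inf_local_districts))
```

Note that the inferred attributes are kept separate from the stored ones, mirroring the way the ontology distinguishes continent from inf-continent.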
Note that, while we are able to make inferences using city, country and continent information (and similarly for the district and region fields), it isn’t possible or sensible to apply the same model to all the information in the world dataset. For example, we cannot say that because the population of a country is 1 million, and a city is within that country, the population of the city is also 1 million, since a country is usually larger than any one of its cities (Vatican City being a possible exception). We could, however, write a rule that says that if the population of a country is x, the population of any city within that country is less than x. Common sense, from a human brain, is needed before reasoning can take place!
Making Some Queries
Let’s make some queries to get some inferred knowledge from the world database.
Firstly, let’s find out the local district, world region (inferred) and continent (inferred) for a couple of cities, Cardiff and Melbourne:
match $x isa city, has name "Cardiff", has local-district $d, has inf-continent $ic, has inf-world-region $ir;
$d val "Wales" isa local-district; $x id "3260448" isa city; $ir val "British Islands" isa world-region; $ic val "Europe" isa continent;
So Cardiff is in district Wales (that’s in the data), but the reasoner is also telling us that it is in the British Islands (region) which is in Europe (continent).
match $x isa city, has name "Melbourne", has local-district $d, has inf-continent $ic, has inf-world-region $ir;
$d val "Victoria" isa local-district; $x id "3842208" isa city; $ir val "Australia and New Zealand" isa world-region; $ic val "Oceania" isa continent;
Melbourne is in Victoria, which is in Australia and New Zealand (region), which is in Oceania (continent).
Now let’s specify a country and find all the local districts it contains through reasoning.
match $x isa country, has name "Australia", has inf-local-district $d;
$x id "147552" isa country; $d val "New South Wales" isa local-district;
$x id "147552" isa country; $d val "Tasmania" isa local-district;
$x id "147552" isa country; $d val "South Australia" isa local-district;
$x id "147552" isa country; $d val "Queensland" isa local-district;
$x id "147552" isa country; $d val "Capital Region" isa local-district;
$x id "147552" isa country; $d val "Victoria" isa local-district;
$x id "147552" isa country; $d val "West Australia" isa local-district;
Now let’s get a bit more creative, and ask about cities containing the word “Victoria”, and find out where in the world they are.
match $x isa city, has name contains "Victoria", has local-district $d, has inf-continent $ic, has inf-world-region $ir;
$d val "Hongkong" isa local-district; $x id "4878472" isa city; $ir val "Eastern Asia" isa world-region; $ic val "Asia" isa continent;
$d val "Mah\u00E9" isa local-district; $x id "28074144" isa city; $ir val "Eastern Africa" isa world-region; $ic val "Africa" isa continent;
$d val "Tamaulipas" isa local-district; $x id "11374728" isa city; $ir val "Central America" isa world-region; $ic val "North America" isa continent;
$d val "Las Tunas" isa local-district; $x id "21840032" isa city; $ir val "Caribbean" isa world-region; $ic val "North America" isa continent;
Now let’s get 5 cities in Oceania:
match $x isa city, has inf-continent "Oceania", has name $n; limit 5;
$x id "1519784" isa city; $n val "Tafuna" isa name;
$x id "1949832" isa city; $n val "Canberra" isa name;
$x id "3842208" isa city; $n val "Melbourne" isa name;
$x id "3920096" isa city; $n val "Townsville" isa name;
$x id "3838176" isa city; $n val "Adelaide" isa name;
The queries above show how information can be inferred from data, despite not being stored explicitly. It is a trivial example, but the intention is to show the basics of reasoning and how to construct rules in Graql.
Of course, real world models are filled with hierarchies and hyper-relationships, but this makes querying a dataset challenging, not least because traditional query languages are only able to retrieve explicitly stored data, and not implicitly derived information. Grakn is a solution for working with complex, interconnected data, such as that which intelligent applications rely upon. Grakn allows an organisation to grow its competitive advantage by uncovering hidden knowledge that is too complex for human cognition. The data model can evolve as a business “learns” all while reducing engineering time, cost and complexity.
Find out more from https://grakn.ai.
Please get in touch if you’ve any questions or comments via our Community Slack channel. Thank you for reading!