Populating GRAKN.AI with the World

This updated article describes how to move SQL data into a GRAKN.AI knowledge graph

Jo Stichbury
Oct 21, 2016
Image by davecito is licensed under CC BY 2.0

Since this blog post was first written, the process of migrating a dataset into a Grakn knowledge graph has changed. For current migration guidance, please see the examples for the Grakn Java, Node.js and Python clients.

Highly interconnected data from complex domains is challenging to model, maintain and query. GRAKN.AI is an open-source, distributed knowledge base (think of a graph database with extra punch), and it makes building intelligent systems dramatically easier.

If you are thinking of trying GRAKN.AI, the chances are that you already have some data and want to see what our stack can do. This post shows how to move data from a SQL database to GRAKN.AI, and how to use our visualiser to explore the data.

I’m going to use a simple, well-known example dataset about the cities and countries of the world. First, I will go through the basics of setting up a SQL database and making some simple queries. You can skip this part if you’re already a SQL user. Then I will explain how to migrate the contents of the database into GRAKN.AI. Finally, I’ll use Graql, our declarative query language, to make the same queries that I showed in SQL.

I used MySQL for this example, although the Grakn Labs team has also tested with Oracle and PostgreSQL. The version of GRAKN.AI I used was 0.12.1 but, as ever, I recommend that you use the latest version.

Setting up MySQL

If you already have MySQL set up, you may want to skip ahead to the “Hello World” section.

Getting MySQL installed is relatively straightforward. I followed these instructions for Mac OS X (downloading the installation package here). It wasn’t entirely clear how to start the MySQL Server. I eventually worked out that the installation puts an icon in System Preferences. Once the server is running, you need to start the MySQL shell from the terminal.

On first use, change the temporary password that MySQL was installed with. I set my new password to root, the same as the username, so that it was memorable.

“Hello World”

Once I had MySQL set up, it was time to open the example world database from the command line, as described here.

The world database is a classic example dataset published by MySQL. It is easy to understand and is commonly used to introduce beginners to SQL queries, although I cannot guarantee that the data it contains is accurate.

Use the following query to retrieve information about the columns of the city table:

As you can see, there are 5 columns in the table (confusingly shown as rows in the output): ID, Name, CountryCode, District and Population. The ID is the Primary Key for the table, which uniquely identifies each record. This will be important later, when we migrate to Grakn.
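If you’d like to explore the same structure without installing MySQL, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in. This is an assumption of the sketch, because the article itself uses MySQL. SQLite has no SHOW COLUMNS or DESCRIBE statement, so PRAGMA table_info plays that role. Like the MySQL output, it returns one row per column. The column names follow the city table described above, but the column types are simplified assumptions.

```python
import sqlite3

# In-memory stand-in for the world database's city table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE city (
        ID          INTEGER PRIMARY KEY,
        Name        TEXT    NOT NULL,
        CountryCode TEXT    NOT NULL,
        District    TEXT    NOT NULL,
        Population  INTEGER NOT NULL
    )
    """
)

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(city)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<12} {col_type:<8} {'PRI' if pk else ''}")

column_names = [row[1] for row in columns]
# pk is non-zero only for columns that form the primary key
primary_key = [row[1] for row in columns if row[5]]
```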

The following statement gives us a peek at the first 10 rows of city data:

Let’s get the information about a specific city, Sydney, Australia:
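You can sketch the two queries above in the same way with sqlite3. The rows inserted here are a few sample records in the shape of the world dataset. Treat the values as placeholders, and load the real data from your own copy of the world database for actual results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT, "
    "District TEXT, Population INTEGER)"
)
# Placeholder rows shaped like the world dataset's city table.
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kabul", "AFG", "Kabol", 1780000),
        (5, "Amsterdam", "NLD", "Noord-Holland", 731200),
        (130, "Sydney", "AUS", "New South Wales", 3276207),
        (131, "Melbourne", "AUS", "Victoria", 2865329),
    ],
)

# "First 10 rows": ORDER BY makes the notion of "first" well defined.
first_rows = conn.execute("SELECT * FROM city ORDER BY ID LIMIT 10").fetchall()

# A single city. Filtering on CountryCode too avoids same-named cities
# elsewhere; ? placeholders let the driver handle quoting.
sydney = conn.execute(
    "SELECT * FROM city WHERE Name = ? AND CountryCode = ?", ("Sydney", "AUS")
).fetchall()
print(sydney)
```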

You can experiment in a similar way with the country and countrylanguage tables.

Migrating to GRAKN.AI

OK, you should have a reasonable handle on the data, so now let’s migrate it into GRAKN.AI. The first thing, if you’ve not done it already, is to follow the quickstart guide to download GRAKN.AI and start the Grakn engine.

Ontology

SQL has limitations that prevent it from expressing the semantics of the data. By “semantics”, I mean that the meaning of the data cannot easily be encoded alongside the data itself. In contrast, a knowledge graph is self-descriptive: put simply, it provides a single place to find the data and to understand what it’s all about. To get the full benefit of a knowledge graph, we must write an ontology for the dataset.

Writing an ontology that covers every field in the SQL tables would be unwieldy, so we will use only some of the data in the tables, as follows:

We define three entities to represent country, city and language, plus relations between them. One relation links language and country, and another links country and city.
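To make the shape of the ontology concrete, here is a rough plain-Python sketch of it. It has three entity types, plus relations that link language to country and country to city. This is not Graql, and it is not the ontology file used in this post. The attribute and role names (countrycode, spoken-language and so on) are hypothetical. The check_ontology helper hints at one thing an ontology gives you: a relation can only refer to entity types that have been declared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    name: str
    attributes: tuple  # names of the attributes this entity type owns


@dataclass(frozen=True)
class RelationType:
    name: str
    roles: tuple  # (role name, entity type that plays the role) pairs


# Hypothetical attribute names, for illustration only.
ENTITIES = {
    e.name: e
    for e in (
        EntityType("country", ("countrycode", "name")),
        EntityType("city", ("name", "population")),
        EntityType("language", ("name",)),
    )
}

# Hypothetical relation and role names, for illustration only.
RELATIONS = (
    # language <-> country: which languages are spoken where
    RelationType("speaks", (("spoken-language", "language"), ("speaking-country", "country"))),
    # country <-> city: which cities belong to which country
    RelationType("has-city", (("containing-country", "country"), ("contained-city", "city"))),
)


def check_ontology(entities, relations):
    """Return a list of problems: roles played by undeclared entity types."""
    problems = []
    for rel in relations:
        for role, player in rel.roles:
            if player not in entities:
                problems.append(f"{rel.name}.{role}: unknown entity type {player!r}")
    return problems


print(check_ontology(ENTITIES, RELATIONS))
```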

In the terminal, load the ontology as follows:

Here’s a visual representation of the ontology:

Migration Templates

Once we have written and loaded the ontology for the dataset, we use the Graql templating language to tell the SQL migrator how the SQL data maps onto that ontology. The migrator runs an SQL query and applies our template to each row of the results in turn. For each row, it replaces the indicated sections of the template with the corresponding data. The column header is the key, and the content of that column in the current row is the value.
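The templating step can be sketched in a few lines of Python. This is a hypothetical illustration of the mechanism, not Grakn’s migrator. Rows come from a cursor as dictionaries keyed by column header, and each <Column> placeholder in the template is replaced with that row’s value. The <Column> placeholder syntax and the Graql-flavoured output strings are assumptions, and they are not guaranteed to be valid Graql for any particular Grakn version.

```python
import re
import sqlite3

# <Column> marks a spot to fill from the current row (assumed syntax).
PLACEHOLDER = re.compile(r"<(\w+)>")

# Illustrative, Graql-flavoured template; not guaranteed to be valid Graql.
COUNTRY_TEMPLATE = 'insert $c isa country has countrycode "<Code>" has name "<Name>";'


def rows_as_dicts(cursor):
    """Turn each result row into {column header: value}."""
    headers = [col[0] for col in cursor.description]
    return [dict(zip(headers, values)) for values in cursor.fetchall()]


def apply_template(template, row):
    """Replace every <Column> placeholder with that column's value in row."""
    def substitute(match):
        value = str(row[match.group(1)])
        # Escape backslashes and quotes so the value stays inside its string literal.
        return value.replace("\\", "\\\\").replace('"', '\\"')
    return PLACEHOLDER.sub(substitute, template)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Code TEXT PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("AUS", "Australia"), ("NLD", "Netherlands")],
)

# One generated statement per result row, as the migrator does.
rows = rows_as_dicts(conn.execute("SELECT Code, Name FROM country ORDER BY Code"))
statements = [apply_template(COUNTRY_TEMPLATE, row) for row in rows]
for statement in statements:
    print(statement)
```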

To migrate the country data, the template code is as follows:

The language migration template:

Then, to insert a relation between a language and the countries in which it is spoken, we use a match-insert query. The query matches a language and a country, then builds a relation between them:
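A match-insert only creates something when its match succeeds, which is why the language and country templates have to run before this one. The toy in-memory graph below sketches that behaviour. All of its names are hypothetical (Graph, speaks, spoken-language, speaking-country). It inserts one relation for every matching language/country pair, and a row whose country cannot be found inserts nothing.

```python
class Graph:
    """A toy in-memory graph: typed entities with attributes, plus relations."""

    def __init__(self):
        self.entities = {}   # entity id -> (type name, attribute dict)
        self.relations = []  # (relation type, {role name: entity id})

    def insert(self, etype, **attrs):
        eid = len(self.entities)
        self.entities[eid] = (etype, attrs)
        return eid

    def match(self, etype, **attrs):
        """Ids of entities of type etype whose attributes include attrs."""
        return [
            eid
            for eid, (t, a) in self.entities.items()
            if t == etype and all(a.get(k) == v for k, v in attrs.items())
        ]


def match_insert_speaks(graph, row):
    """Insert one 'speaks' relation per matching (language, country) pair."""
    inserted = 0
    for lang_id in graph.match("language", name=row["Language"]):
        for country_id in graph.match("country", countrycode=row["CountryCode"]):
            graph.relations.append(
                ("speaks", {"spoken-language": lang_id, "speaking-country": country_id})
            )
            inserted += 1
    return inserted


g = Graph()
# Entities must already exist, or the match finds nothing.
g.insert("country", countrycode="AUS", name="Australia")
g.insert("country", countrycode="NLD", name="Netherlands")
g.insert("language", name="English")
g.insert("language", name="Dutch")

rows = [
    {"CountryCode": "AUS", "Language": "English"},
    {"CountryCode": "NLD", "Language": "Dutch"},
    {"CountryCode": "XYZ", "Language": "English"},  # no such country: nothing inserted
]
counts = [match_insert_speaks(g, row) for row in rows]
print(counts)
```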

For city migration, the template is as follows:

To determine whether a city is its country’s capital:
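In the world database, the country table records each country’s capital in a Capital column. That column holds the ID of a row in the city table, so this is one place where the city table’s ID primary key comes in. A query that feeds the capital template therefore needs a join on that column. Here is a sqlite3 sketch. The IDs and rows are placeholders, and the inner join simply drops any country with no recorded capital (NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
    CREATE TABLE country (Code TEXT PRIMARY KEY, Name TEXT, Capital INTEGER);
    """
)
# Placeholder data: Capital holds a city ID (or NULL).
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(130, "Sydney", "AUS"), (135, "Canberra", "AUS")],
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [("AUS", "Australia", 135), ("ATA", "Antarctica", None)],
)

# Each (capital city, country) pair; the NULL capital matches nothing.
capitals = conn.execute(
    """
    SELECT city.Name, country.Name
    FROM country JOIN city ON city.ID = country.Capital
    ORDER BY country.Name
    """
).fetchall()

# Per-city flag: 1 if some country names this city as its capital, else 0.
is_capital = dict(
    conn.execute(
        """
        SELECT city.Name,
               EXISTS (SELECT 1 FROM country WHERE country.Capital = city.ID)
        FROM city
        """
    ).fetchall()
)
print(capitals, is_capital)
```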

Migration Tools

There are two ways in which you can migrate SQL data into GRAKN.AI. You can use Java to perform the migration in a few lines of code, which is described further in our SQL migration example.

Alternatively, there is a shell script that you can call to apply the templates above to the SQL data. In effect, the shell script calls a set of Java functions so you don’t have to. I like this option, as Java is not my natural habitat. The migration documentation shows the script options, which are as follows:

Whether you use the shell script or Java code, what you are doing in either case is extracting SQL data using the JDBC API and importing it into a graph.

For this example, running from within the examples/example-sql-migration/shell-migration directory of the GRAKN.AI installation, I called Grakn’s migration.sh script on each of the templates shown above, passing in the appropriate query. For example, for the countries migration:

To make things simpler, you can chain all of these calls together in a batch file. I did this, so now I simply call the batch file:

Querying the Graph

We can now make queries on the graph as follows. Let’s reproduce some of the queries we made previously in SQL.

If the migration was successful, you should see a list of 10 countries in response to the query.

Similarly, query for information about the city of Sydney:

Problems?

The Grakn team was super helpful in sorting things out when they went wrong. If you have any problems, please get in touch for help too. Just make contact via our Community page, or leave a comment below or on our Slack channel.

One issue I hit initially was the JDBC driver. You need to download the MySQL JDBC driver from here and place the .jar file (mysql-connector-java-5.1.40-bin.jar) in the /lib directory of the Grakn environment that you downloaded and set up.

Visualising the Data

The Grakn visualiser is a cool way to look at the resulting graph and explore the data. With the Grakn engine running and the graph loaded, navigate in your browser to http://localhost:4567/, where you can make queries on the graph. It’s a nice way to see how the data is connected. In my view, this fits much better with how I think about cities and countries than a table with rows and columns does. Let’s explore…

We can show 10 countries and their cities:
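Showing countries with their cities amounts to following each country’s relations to its neighbouring city nodes. Here is a rough Python sketch of that traversal over migrated (country, city) pairs. The pairs are hypothetical sample data, and the visualiser does this against the graph, not a Python list.

```python
from collections import defaultdict
from itertools import islice


def cities_by_country(has_city, limit=10):
    """Group (country, city) pairs by country and keep the first `limit` countries."""
    neighbours = defaultdict(list)
    for country, city in has_city:
        neighbours[country].append(city)
    # dicts keep insertion order, so "first" means first country encountered.
    return dict(islice(neighbours.items(), limit))


# Hypothetical migrated pairs, for illustration only.
HAS_CITY = [
    ("Australia", "Sydney"),
    ("Netherlands", "Amsterdam"),
    ("Australia", "Canberra"),
    ("Netherlands", "Rotterdam"),
]
print(cities_by_country(HAS_CITY, limit=10))
```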

In the GRAKN.AI visualiser:

Further guidance on using the visualiser, which is rapidly evolving, can be found in the GRAKN.AI documentation.

Why?

You may be wondering why I’ve bothered moving the data from a relational database into a Grakn graph. After all, isn’t it fine as it is? Well, yes and no. Relational databases have benefits that include simplicity and familiarity, but they also have limitations. If you are just doing basic reads and writes on straightforward data, SQL may well be adequate for your needs. But remember that I chose a simple, familiar example specifically to make this article easy to follow. Your domain may be more complex, with highly interconnected data, which is very likely in today’s information landscape. In that case, you will quickly see significant benefits, since describing the relationships within data is the primary characteristic of a graph database. A relational database has to reconstruct each relationship with a join at query time, so in this respect it struggles to match the speed and flexibility of a graph.

As a knowledge base, GRAKN.AI has an additional benefit over standard graph databases, since it allows complex data modeling, verification, scaling, querying and analysis. A key step is the definition of an ontology, which facilitates the modeling of complex datasets and guarantees information consistency. Inference rules allow the extraction of implicit information from explicit data, to achieve logical reasoning over the represented knowledge.
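To illustrate what an inference rule does, here is a hand-rolled example in plain Python, not Grakn’s rule syntax or reasoner. It applies a hypothetical rule: if a language is spoken in a country, and a city is in that country, then the language is spoken in that city. Applying the rule derives facts that were never stored explicitly.

```python
from collections import defaultdict


def infer_spoken_in_city(speaks, has_city):
    """Apply the hypothetical rule once:
    (language, country) and (country, city) imply (language, city)."""
    cities_of = defaultdict(set)
    for country, city in has_city:
        cities_of[country].add(city)
    return {
        (language, city)
        for language, country in speaks
        for city in cities_of[country]
    }


# Explicit facts (hypothetical sample data).
SPEAKS = {("English", "Australia"), ("Dutch", "Netherlands")}
HAS_CITY = {("Australia", "Sydney"), ("Australia", "Canberra"), ("Netherlands", "Amsterdam")}

inferred = infer_spoken_in_city(SPEAKS, HAS_CITY)
print(sorted(inferred))
```

The rule is itself an assumption about the domain. A language spoken somewhere in a country is not necessarily spoken in every city, and that is exactly the kind of judgement a rule author has to make.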

Conclusion

At the beginning of this article, I introduced GRAKN.AI as a graph database with extra punch. I have hardly scratched the surface of what it can do, but I hope I have at least shown that it is easy to set up a Grakn graph with familiar data, and how to query and visualise it.

There are a number of blog posts on blog.grakn.ai that will give you a flavour of what GRAKN.AI can do, and an FAQ that may answer some of the questions you have! If it does not, please ask away in the comments below!

If you enjoyed this article, please hit the heart button below or leave us a comment. Thank you!

Vaticle

Creators of TypeDB and TypeQL
