There R Pandas in my Graph!

Graph based Data Science with Grakn and Graql

Michelangelo Bucci
Oct 18, 2016

“Connectivity becomes a craving” Sherry Turkle

Graql: now with 100% more Pandas (Photo credit: iStock.com/Hung_Chung_Chih)

Simplicity, thy name is Graql

Last week we were suddenly struck by the realisation that it would be simple to extract data from a Grakn graph and feed it into a data science tool for analysis, all the more so since our own analytics component is still at an early stage of development.

When you send a query to Graql, the results returned are, essentially, a table. Wouldn’t it be cool if you could just send a query and easily store the results in any data-frame-like structure?

Turns out, it’s really easy.

Yes, but how?

graql.sh -e QUERY

If you call the graql script in the shell with the -e option, you can pass it any query, and it will return results that are very easy to parse.

So, to integrate Graql into your favourite data science environment you have to:

  1. Call the graql shell script
  2. Parse the results from the standard output
  3. Store the results into a data frame
  4. Profit!

Believe me, it’s easier than it sounds.

In fact, we were so excited by this realisation, and it is so easy to do, that we immediately put together a couple of quick and dirty scripts, one in R and one in Python, to do exactly that. They are not wrappers or complete drivers, and they certainly are not polished yet (we will work on them, I promise), but they run perfectly fine and let you interface Graql with Pandas or any R package you like.

Just as an example, imagine that you have stored our sample movie dataset in your graph and you want to extract a list of movies with their budget and a few properties related to their Rotten Tomatoes scores, and store the result in a data frame.

If you are working with R, you just need this:

If, on the other hand, you prefer Python and Pandas:

That’s it. In both cases, after having loaded our scripts, you need just a single command, and you are ready to analyse your data.

And you can check, for example, that the budget does not seem to influence a movie's Rotten Tomatoes score much.

On the other hand, there seems to be some correlation between the budget and the number of votes, which makes sense, since a bigger budget means more marketing.
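
Once the results are in a data frame, a check of this kind takes a couple of lines of Pandas. The sketch below is only an illustration: the column names (`budget`, `tomato_score`, `votes`) and the helper `correlation_with` are hypothetical, and the numbers are toy values constructed so that votes follow the budget exactly while the score does not follow it at all. They are not taken from the movie dataset.

```python
import pandas as pd


def correlation_with(frame, column):
    """Return the Pearson correlation of every other column with `column`."""
    return frame.corr()[column].drop(column)


# Toy numbers, chosen so the result is obvious; NOT from the movie dataset.
movies = pd.DataFrame({
    "budget": [1, 2, 3, 4, 5],
    "tomato_score": [3, 1, 4, 1, 3],
    "votes": [10, 20, 30, 40, 50],
})

by_budget = correlation_with(movies, "budget")
print(by_budget)
```

On real data you would of course expect values somewhere in between, and a scatter plot is usually a better first look than a single coefficient.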

Or maybe you want to use an extra network analysis package to visualise and analyse the network of actors and movies in the dataset…

Actors are in red, movies in blue
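
As a rough idea of how such a network could be put together in Python, here is a sketch using the networkx package (one possible choice, not necessarily the one used for the picture). It assumes you have already extracted a list of (actor, movie) pairs from your graph; the helper name `build_cast_graph` and the pairs below are placeholders, not data from the movie dataset.

```python
import networkx as nx


def build_cast_graph(pairs):
    """Build an undirected actor-movie graph from (actor, movie) pairs."""
    graph = nx.Graph()
    for actor, movie in pairs:
        # Tag each node with its kind so an actor and a movie can share a name.
        graph.add_node(("actor", actor), color="red")
        graph.add_node(("movie", movie), color="blue")
        graph.add_edge(("actor", actor), ("movie", movie))
    return graph


# Placeholder pairs, NOT taken from the movie dataset.
pairs = [("Actor A", "Movie 1"), ("Actor A", "Movie 2"), ("Actor B", "Movie 2")]
graph = build_cast_graph(pairs)

# One colour per node, in node order, ready to hand to a drawing function.
colors = [graph.nodes[node]["color"] for node in graph]

# Fraction of the other nodes each node is directly connected to.
centrality = nx.degree_centrality(graph)
```

If matplotlib is installed, `nx.draw(graph, node_color=colors)` renders the graph with actors in red and movies in blue.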

Or maybe you want to do something else entirely; it’s up to you. It only takes one command.

That’s all folks!

If you use something different from R or Pandas, if you want more features added to the interface scripts, if you have questions or if you just want to say “Hi”, please join us on our community site or Slack channel: we are more than happy to help.

Happy data science-ing!
M.

Vaticle

Creators of TypeDB and TypeQL
