Querying RDF with Elixir

Using SPARQL.ex to query over RDF datastores

“worm’s eye view of buildings” by Alex Wong on Unsplash

In my last post I talked about the RDF.ex package by Marcel Otto for RDF processing in Elixir. Now Marcel has also added support for querying RDF datastores with SPARQL. So let’s have a look at that. In fact there are two separate packages: SPARQL.ex for querying in-memory RDF models, and SPARQL.Client for dispatching queries to remote RDF models. So let’s first deal with local (in-memory) models and then go on to remote models. (And again a special thanks to Marcel for reviewing this post and making super-useful comments as well as suggesting improvements.)

1. Create a ‘TestQuery’ project

First off, let’s create a new project, TestQuery (in camel case), using the usual Mix build tool invocation (in snake case): mix new test_query.

We’ll then declare a dependency on sparql_client in the mix.exs file. (This will bring in the sparql and rdf packages too.) And we’ll also use the hackney HTTP client in Erlang, as recommended.

And we add these entries to the deps in the mix.exs file:
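Something like this (the version requirements shown here are indicative only):

```elixir
# mix.exs
defp deps do
  [
    {:sparql_client, "~> 0.2"},
    {:hackney, "~> 1.14"}
  ]
end
```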

We then use Mix to pull in the dependencies with mix deps.get.

Let’s also clear out the boilerplate in lib/test_query.ex and add in a @moduledoc annotation.

See here for the project code.

2. Query in-memory RDF models

We’re going to need some RDF data. To keep things simple we’ll take the RDF description we generated in the last post for a book resource.

For convenience we’ll add this file to the project, placing it under a data directory (within priv/) which we’ll need to create.

Now let’s check that we can access this file by creating ourselves a simple function which will just read the file.

The attribute uses the Erlang :code.priv_dir/1 function to locate the priv directory for the current application (whose main module is named with an alias, TestQuery, in Elixir – an alias which in Erlang is rendered directly as the atom 'Elixir.TestQuery').

The function simply calls an RDF.ex read convenience function for a particular serialization (Turtle): RDF.Turtle.read_file!/1. It uses the bang (!) form, which returns an Elixir term directly or else raises an error. We also use the string concatenation operator <> to append the filename to the path.
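Here’s a sketch of how that might look – the data/0 function name and the file name are illustrative, not necessarily those of the original project:

```elixir
defmodule TestQuery do
  @moduledoc """
  Top-level module used in "Querying RDF with Elixir".
  """

  # locate the application's priv directory (returned as an Erlang charlist)
  @priv_dir "#{:code.priv_dir(:test_query)}"

  # the Turtle file we saved earlier – the file name here is illustrative
  @data_file "book.ttl"

  @doc """
  Reads the default RDF data file into an RDF graph.
  """
  def data do
    RDF.Turtle.read_file!(@priv_dir <> "/data/" <> @data_file)
  end
end
```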

So, we can try this out now with IEx. We’ll also import the TestQuery module so that its functions can be called without any module name qualification (although for later ease of use we will include this command in the IEx configuration file .iex.exs).
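So the project’s .iex.exs can be as simple as:

```elixir
# .iex.exs – evaluated automatically when IEx starts in the project root
import TestQuery
```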

And if we want a pretty-printed version of that we can pipe the output to an RDF write convenience function, RDF.Turtle.write_string!/1, using the pipe operator |>. This function again uses the bang (!) form, which returns an Elixir term directly or else raises an error.
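For example, from IEx (using the data/0 sketch above):

```elixir
data()
|> RDF.Turtle.write_string!()
|> IO.puts()
```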

That’s cool.

Now we also need a SPARQL query. Let’s just create a simple query which returns all the RDF terms bound to the variables s, p and o. And we’ll save that query in a module attribute.
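A minimal select-everything query stored in a module attribute (the attribute name is an assumption):

```elixir
@query """
select *
where { ?s ?p ?o }
"""
```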

Now let’s define a simple function which will just pass off to a second function using the query stored in the module attribute.
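In other words, something like this (the function name is an assumption):

```elixir
def query, do: query(@query)
```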

And now we can define a function as:
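A sketch, reading the data file via the data/0 function from earlier (names again assumed):

```elixir
def query(query_string) do
  data()
  |> SPARQL.execute_query(query_string)
end
```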

This will create an RDF model from our data file and execute the SPARQL query over it. The result is a SPARQL.Query.Result struct.

And we can process this query result using regular Elixir data access:
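A sketch of that IEx step (the result variable name is arbitrary):

```elixir
# v() is the IEx helper returning the last evaluated result
result = v()

# pull out the RDF objects bound to ?o and print their string forms
result.results
|> Enum.each(&IO.puts(&1["o"]))
```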

Here the v() helper in IEx refers to the last result, i.e. the return from our query call. We just match this against a variable for convenience. We can then pull out all the RDF objects (via the SPARQL query variable o) from the list of result maps and print their string representations. We use a partial function application (the capture operator &) to print out the RDF object values.

Note that there is a new (as yet unreleased) convenience function which would simplify this expression further.

Now just to check on which functions we have created we can use the IEx exports/1 helper.

3. Query remote RDF models

To query a remote RDF datastore let’s set up a new module, TestQuery.Client, for our testing. We’ll add a new lib/test_query/ directory and create a client.ex file for the module.

We’re going to be using DBpedia and the DBpedia SPARQL endpoint for our remote querying. Let’s define some module attributes to make things easier.
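A sketch of what those attributes might look like – the attribute names and the choice of DBpedia resource are my own; only the endpoint URL is standard:

```elixir
defmodule TestQuery.Client do
  @moduledoc """
  Client module for querying remote SPARQL endpoints.
  """

  # a test URI – an arbitrary DBpedia resource
  @test_uri "http://dbpedia.org/resource/Hello_World"

  # a test query matching English-language literals for the test URI
  @test_query """
  select *
  where {
    <#{@test_uri}> ?p ?o .
    filter ( langMatches(lang(?o), "en") )
  }
  """

  # the DBpedia SPARQL endpoint
  @test_service "http://dbpedia.org/sparql"
end
```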

What’s here?

  • a test URI, here a DBpedia resource
  • a test SPARQL query, using the test URI and matching English-language strings for RDF object literal values
  • a test SPARQL endpoint, here the DBpedia endpoint

Module attributes are private to the module but we can define a couple of accessor functions:
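For example (accessor names assumed), inside TestQuery.Client:

```elixir
def get_query, do: @test_query
def get_service, do: @test_service
```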

Let’s define a function which will use SPARQL.Client.query/2 to query the test service with the test query.
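A sketch, reusing the assumed attribute names from above (the function name is also mine):

```elixir
def hello do
  # dispatch the test query to the test service
  {:ok, result} = SPARQL.Client.query(@test_query, @test_service)

  # print out the English-language object literals
  result.results
  |> Enum.each(&IO.puts(&1["o"]))
end
```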

As before the query result is a SPARQL.Query.Result struct, or more precisely a tuple of an :ok atom and the struct, which we’ll match against a result variable. We can access the actual results (a list of maps) from the results field of the struct and we can pipe those into Enum.each/2. Again we use a partial function application to print out the RDF object values.

So, let’s try it.

Great. We just queried DBpedia and parsed the result set for English language strings.

So we can now define some functions for remote query which mirror the local query forms we produced earlier.
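Something like the following, where the rquery names (and the exact arities) are illustrative rather than the project’s actual ones:

```elixir
def rquery, do: SPARQL.Client.query(@test_query, @test_service)

def rquery(query), do: SPARQL.Client.query(query, @test_service)

def rquery(query, service), do: SPARQL.Client.query(query, service)
```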

And again let’s check on the functions we have now created with the IEx exports/1 helper.

Now we’re ready to experiment with those functions or move on to something else.

4. Inspect result sets using the Observer

So, at this point let’s try something a little more ambitious.

We’re going to read a bunch of queries from local file storage, apply them against a remote service, and store the results for inspection using one of the really cool Erlang tools that ships with Elixir – the Observer.

For the queries we’ll use one more module attribute which uses the Erlang :code.priv_dir/1 function to locate the query directory, just as we did for the data directory in the main module.
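For example, in TestQuery.Client (the attribute name and the queries subdirectory name are assumptions):

```elixir
# :code.priv_dir/1 returns a charlist, so we interpolate it into a string
@query_dir "#{:code.priv_dir(:test_query)}/queries/"
```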

For this application we’ll just save some simple queries to be applied to a given service. We’ll be querying DBpedia again.

The queries all have the same shape: simple SPARQL select queries, each asking about a particular hurricane in the 2018 Atlantic hurricane season.
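One of the stored query files might look like this (the hurricane and file name are just an illustration):

```sparql
# priv/queries/florence.rq
prefix dbr: <http://dbpedia.org/resource/>

select *
where {
  dbr:Hurricane_Florence ?p ?o
}
```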

So we have the query files sitting in their own directory under priv/, alongside the data directory from earlier.

Before we get to the Observer, let’s talk about the storage for query results. Be warned that we are going to use an advanced facility of the Erlang runtime.

Erlang uses the actor model and implements actors as processes which are one of its main language constructs. These are very lightweight structures and are implemented at the language level – not the OS level. Communication between processes is strictly via message passing and state is private to the process.

Now, Erlang also maintains a powerful storage engine built into the runtime. This is known as Erlang Term Storage (ETS) and is a robust in-memory store for Elixir and Erlang terms. Tables in ETS are created and owned by individual processes. When an owner process terminates, its tables are destroyed.

There are many reasons to be wary of reaching for ETS tables for production applications (shared access, garbage collection, etc.) but for this tutorial we will use ETS tables as a simple cache mechanism to store our query results so that we can inspect these readily with the Observer tool. Note that normally one would use special process types such as a GenServer (or an Agent, which is basically a GenServer under the hood) to hold process private state. But before talking more about the Observer let’s look first at how we will run our queries and save the results sets to ETS tables.

We’ll define a function which will first read the filenames from our query directory and then iterate over those, reading the query from each file, sending it to the service and storing the results. We use File module functions for the file system access, together with the module attribute which supplies the directory. The second part uses a comprehension to iterate over the file list. Note that the processing is handled by private functions which we explicitly label with a leading underscore.
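A sketch of that top-level function (all of the names here are assumptions):

```elixir
def rquery_all do
  # list the stored query files
  {:ok, files} = File.ls(@query_dir)

  # read each query, run it against the service and store the results
  for file <- files do
    file
    |> _read_query()
    |> _run_query()
  end
end
```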

The query-reading function is defined using defp (for a private function). Our plan here is just to slurp the file contents into a variable and to return this in a tuple together with an ETS table name. Here the table name is an atom holding a name compounded of the file name (without file extension) and prefixed with the current module name. So, for example, a query file named florence.rq (to pick one of the hurricane names) would be used to generate an ETS table name of TestQuery.Client.florence. (Note that the Elixir. prefix is implicit in all Elixir module names.)
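A sketch, assuming the query files carry an .rq extension:

```elixir
defp _read_query(file) do
  # compound the current module name and the bare file name into a table name
  table_name = Module.concat(__MODULE__, Path.basename(file, ".rq"))

  # slurp the query text from the file
  query = File.read!(@query_dir <> file)

  {table_name, query}
end
```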

The function return is piped into a helper function which just unpacks the tuple into two arguments and invokes the real function.

The query-running function uses two Erlang functions, :ets.new/2 and :ets.insert/2, to create and populate the ETS table. Each result is read from the list of maps in the results field of the query result struct and each map is repackaged as a tuple by a further helper function.
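A sketch of the tuple-unpacking helper and the query-running function (the :named_table option is my assumption, so that the table can be addressed by name):

```elixir
defp _run_query({table_name, query}), do: _run_query(table_name, query)

defp _run_query(table_name, query) do
  # create a named ETS table owned by the calling (IEx) process
  :ets.new(table_name, [:named_table])

  # send the query to the remote service
  {:ok, result} = SPARQL.Client.query(query, @test_service)

  # repackage each result map as a tuple and insert it into the table
  for triple_map <- result.results do
    :ets.insert(table_name, _build_tuple(triple_map))
  end
end
```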

The tuple-building function just expects three keys in the triple map: "s", "p" and "o". Both RDF subjects and predicates are IRIs so we can just fish out the value field of the RDF.IRI struct. But RDF objects may be either IRIs, literals, or blank nodes. So we’ll need to test for those and use the appropriate field of the matching struct accordingly.

Note that this testing on RDF object type is admittedly a little low-level and we might expect a convenience function to support this in a future release.

We return a tuple for inserting into the ETS table using :ets.insert/2, which will list the subject, predicate and object, as well as the raw triple map that was returned. We want to include a key for each tuple so we simply make use of the Erlang :erlang.unique_integer/1 function to provide a unique integer ID.
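Putting that together, the tuple-building function might look like this (again a sketch; the handling of blank nodes in particular is mine):

```elixir
defp _build_tuple(triple_map) do
  # subjects and predicates are always IRIs
  s = triple_map["s"].value
  p = triple_map["p"].value

  # objects may be IRIs, literals or blank nodes
  o =
    case triple_map["o"] do
      %RDF.IRI{value: value} -> value
      %RDF.Literal{} = literal -> RDF.Literal.value(literal)
      other -> inspect(other)
    end

  # a unique integer key, then subject, predicate, object and the raw map
  {:erlang.unique_integer([:monotonic]), s, p, o, triple_map}
end
```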

And that’s it!

So, let’s try it.

So, something happened. Let’s see. For this we’ll reach for the Observer. The Observer is a graphical tool for observing the characteristics of Erlang systems. It displays system information, application supervisor trees, process information, ETS tables, Mnesia tables, and contains a front end for Erlang tracing. Just a lot of things.

We invoke the Observer from IEx as :observer.start().

The Observer UI will pop up in a new window. (And to close this down we can just use the :observer.stop() function.)

Now, there’s an awful lot going on here. But for the purposes of this tutorial we’re just going to look at the Table Viewer tab.

Now, we’re going to inspect some tables. We might need to click on the Table Name header to sort the tables. For our purposes let’s open one of the tables we created and take a look at the rows it holds.

Now each row can be separately inspected just by clicking on it.

And just by way of showing that what we can write into an ETS table we can also read out, let’s add a function that trivially prints out one of the terms (the RDF object value) stored in a table. Interesting here is the pattern matching on the tuple to very simply get at one of the terms.
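A sketch of such a function (the name is mine):

```elixir
def read_table(table_name) do
  # pattern match on each stored tuple to pick out the object value
  for {_key, _s, _p, o, _triple_map} <- :ets.tab2list(table_name) do
    IO.puts(o)
  end
end
```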

We can just run this as follows:
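For example (the table name is illustrative, following the florence.rq example above):

```elixir
read_table(:"Elixir.TestQuery.Client.florence")
```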

Note the quoting on the ETS table name: because the atom contains dots (and the implicit Elixir. prefix) it needs to be written in quoted form.

Summary

I’ve shown here in this post how the SPARQL.ex and SPARQL.Client packages can be used for querying RDF datastores in Elixir.

Specifically we’ve used SPARQL.ex to query local (in-memory) RDF models, and provided some convenience functions for further exploration. We then used SPARQL.Client to query remote RDF datastores and again provided some convenience functions for further testing.

We then proceeded to develop a small demo which read stored SPARQL queries, applied them to a remote SPARQL endpoint and then saved the results in the Erlang runtime as ETS tables to inspect the result sets using the wonderful Observer tool.

Introducing the Observer brings us that little bit closer to the Erlang system in flight with its process tree. It is this very granular process model which allows us to think about new solutions using a distributed compute paradigm for semantic web applications. I hope to be able to follow up on some of this promise in future posts.

See here for the project code.

This is the second in a series of posts. See my previous post at ‘Early steps in Elixir and RDF’.

You can also follow me on Twitter as @tonyhammond.
