Jupyter Notebooks with Elixir and RDF

Using IElixir in JupyterLab with the SPARQL.Client package

“white bridge with lights” by Tyler Easton on Unsplash

Now one can’t really claim to be a bona fide data scientist these days if one isn’t brandishing some flavour of notebook to save and run one’s calculations. And arguably one of the best known names in this breed of application is the Jupyter Notebook which is an excellent vehicle for data processing with Python. And Python as a language has become almost synonymous with data wrangling.

But these days there’s much more to Jupyter than the original name-begetting trio of languages – Julia, Python, and R. And notebooks themselves are useful for much more than data processing alone, e.g. iterative code development and exploration, and multimedia code description and sharing.

So in this post we’ll take a quick look at the IElixir Jupyter kernel from Piotr Przetacznik which will allow us to explore Elixir from within a rich media, interactive compute framework. And following on from some recent posts we’ll use the SPARQL.Client package for Elixir for querying RDF datastores.

Jupyter Notebooks

Jupyter Notebooks (formerly IPython Notebooks – hence the usual file extension .ipynb) are computational notebooks which integrate live code, text, math and media, and are the result of a long line of development. Since Project Jupyter was started in 2014 the use of Jupyter Notebooks has all but exploded. And just as recently as yesterday, Nature magazine published an article ‘Why Jupyter is data scientists’ computational notebook of choice’ in which it reports that there were ‘more than 2.5 million public Jupyter notebooks in September 2018, up from around 200,000 or so in 2015’.

The notebook pedigree goes at least as far back as the literate programming notions first introduced by Knuth with his WEB system in which he famously programmed TeX. This approach pairs off a document formatting language with a programming language so that one source file may be unbundled either as a document for reading or as a program for executing.

Later developments led to the nascent notebook format itself by which Mathematica, Maple and MATLAB supported interactive numerical and mathematical computations.

More recently the genre has been picked up and expanded by the data science community using IPython and data manipulation with packages such as NumPy. IPython has subsequently been generalized into Jupyter to support multiple compute engine backends, initially with support for the core languages of Julia, Python and R. There are now dozens of Jupyter kernels available with most major programming languages represented.

And just earlier this year the Jupyter Notebook interface itself has been significantly elaborated and upgraded into the much more powerful and extensible JupyterLab.

In essence, with notebooks we have something akin to a webified command loop (or REPL, as it’s sometimes known) which can be annotated with rich text, math and media, and which can be replayed on a cell-by-cell basis or as a whole. The inputs and outputs are stored as JSON files which can be edited and replayed, shared directly with other users, or hosted for wider community access. The kernels, or compute engines, can be either locally or remotely hosted. The schematic below gives a general idea of the overall architecture.

From Jupyter documentation.

IElixir and installation

IElixir is a Jupyter kernel specifically built for running Elixir programs, or program fragments.

See the IElixir project for information and help on installation. You would normally need to have both JupyterLab and Elixir pre-installed.

As an alternative, note that there is a docker image available. I have not tried this myself but there is additional information here on the IElixir project.

Walkthrough

Now for this walkthrough we will use the TestQuery project developed earlier in this series.

And we will also be walking through the notebook test_query.ipynb which accompanies this post.

For this walkthrough we will be using JupyterLab as our notebook environment. So, we’ll assume now that we already have JupyterLab installed. And we’ll also either download the project locally or just follow along with the GitHub rendition.

We can invoke JupyterLab from the command line as:

% jupyter lab

This is our main window on opening JupyterLab and navigating to the project home page using the file explorer, from which we can then open up the test_query.ipynb notebook in priv/notebooks/.

Let’s hide the file explorer by toggling the Files tab and just focus on the notebook pane itself:

We’ll follow the plan outlined in the notebook here.

1. Creating the project

I’ve first raised a couple of caveats relating to my own specific lack of knowledge in how best to interact with an Elixir Mix project. (And hopefully somebody can set me straight on this.) We’ll start by copying the test_query project developed in the earlier post into a new test_ipynb project so that we can make any adjustments necessary.

2. Setting up the environment

IElixir uses the concept of virtual environments for managing packages. It uses Boyle as its package manager.

We use Boyle to install the SPARQL.Client package and then check the module’s exported functions with the IEx.Helpers.exports/1 function.
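The setup cells might look something like the following sketch (the environment name and the package version constraint are illustrative):

```elixir
# Create and activate a new virtual environment for this notebook
Boyle.mk("test_ipynb_env")
Boyle.activate("test_ipynb_env")

# Install the SPARQL.Client package into the active environment
# (version constraint shown here is illustrative)
Boyle.install({:sparql_client, "~> 0.2"})

# Check which functions the SPARQL.Client module exports
IEx.Helpers.exports(SPARQL.Client)
```

Boyle keeps each environment’s packages separate, so a notebook can pin its own dependencies without touching any other project.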

3. Running basic SPARQL.Client queries

Before going further we’ll just try out a basic SPARQL query to retrieve one RDF triple. And this indicates that there is a problem for this service using the default POST method with URL escaping.

Instead, we need to call the SPARQL.Client.query/3 function with the GET method. This time around it works fine. And if we run it again and capture the function return we can inspect the SPARQL query result set.
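A minimal version of that query cell might be sketched as follows (the endpoint URL and the exact option values are as used against DBpedia in this walkthrough; the result-set shape assumes the SPARQL.Client/RDF.ex data structures):

```elixir
# A basic query to retrieve a single RDF triple
query = "select * where { ?s ?p ?o } limit 1"
service = "http://dbpedia.org/sparql"

# The default POST method with URL escaping fails for this service,
# so we pass options to force a GET request with the SPARQL 1.1 protocol
{:ok, result} =
  SPARQL.Client.query(query, service,
    request_method: :get,
    protocol_version: "1.1")

# Inspect the query result set
result.results
```

Capturing the `{:ok, result}` tuple (rather than just letting the cell print it) is what lets us poke at the result set in later cells.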

Well, OK, that works now. So let’s try a more meaningful query.

4. Installing our TestQuery modules

Let’s first load our TestQuery module after confirming with the IEx.Helpers.exports/1 function that none of its functions are available. We use the IEx.Helpers.import_file/1 function to load the module directly.

This fails, and if we look into the source in test_query.ex we can see that the call :code.priv_dir(:test_query) is failing since the application test_query is unknown. (See the caveats raised earlier.)

Replacing that with the relative path to the priv/ directory in our test_ipynb project we can try importing again. And this time it works as confirmed by checking the exported functions.
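The import step might be sketched like this (the path is relative to where the notebook kernel is running and is illustrative):

```elixir
# Confirm that no TestQuery functions are visible yet
IEx.Helpers.exports(TestQuery)

# Load the module source directly into the running kernel
IEx.Helpers.import_file("lib/test_query.ex")

# Check that the module's functions are now exported
IEx.Helpers.exports(TestQuery)
```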

The same thing happens when we attempt to import the TestQuery.Client module. This time the module attribute @query_dir needs to be replaced.

Replacing that with an absolute path to the priv/queries/ directory in our test_ipynb project we can try importing again. And again this time it succeeds as confirmed by checking the exported functions.

Now we’ve successfully imported our modules but as we noted earlier we are going to have to update our query functions to use a GET method with the right SPARQL protocol. We’ll do this by introducing a new module attribute @query_opts which sets the keywords :request_method and :protocol_version, and we’ll add this module attribute to all our SPARQL.Client.query/3 calls.
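The change might be sketched as follows (a fragment rather than the full TestQuery.Client module, with the service URL shown for context):

```elixir
defmodule TestQuery.Client do
  @service "http://dbpedia.org/sparql"

  # Options to force a GET request with the SPARQL 1.1 protocol,
  # as required by this service
  @query_opts [request_method: :get, protocol_version: "1.1"]

  def query(query_string) do
    # Pass the shared options on every SPARQL.Client.query/3 call
    SPARQL.Client.query(query_string, @service, @query_opts)
  end
end
```

Keeping the options in a single module attribute means all of the query functions pick up the protocol fix in one place.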

We’re good to go.

5. Testing it out

To test out our SPARQL client functions let’s first import the modules for ease of use. (Strictly we only want the TestQuery.Client module, but no matter.)

We’ll first try out our hello/0 function which queries DBpedia for the ‘Hello World’ resource and returns the English language label. This is returned as a list containing an RDF.ex literal struct but is easily unpacked by accessing the struct value field.
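Unpacking that return value might look like this (assuming, as described above, that hello/0 returns a one-element list holding an RDF.ex literal):

```elixir
# hello/0 returns a list with a single RDF.ex literal struct
[literal] = TestQuery.Client.hello()

# Access the plain string via the struct's value field
literal.value
```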

And as that works, we’ll follow up with one of our stored queries and use the rquery/1 function on that. Again we can return a list containing an RDF.ex literal struct.

And finally we can try out the rquery_all/0 demo function which performs multiple queries for some 2018 Atlantic hurricanes and stores the results in separate ETS tables. These tables can be inspected using the Observer tool, as described in the earlier post. (In short, the :observer.start call opens up the Observer interface in a standalone window, and by selecting the ‘Table Viewer’ tab one can see the ETS tables we created and inspect them by double-clicking the respective rows.)

Summary

I’ve shown here in this post how we can use the IElixir kernel to run a Jupyter Notebook with Elixir as our language of choice.

Specifically we have taken the TestQuery module developed in an earlier post and shown how this can be used within a Jupyter Notebook context. We are able to run SPARQL queries using the SPARQL.Client package directly from the notebook and to process the result sets. Moreover, since we are using a notebook we can better experiment with and annotate our findings.

The tooling around Elixir is improving all the time, and with its proven support for managing distributed compute Elixir stands out as a most intriguing language for building semantic web applications.



See here for the (modified) project TestQuery code with test_query.ipynb notebook.

This is the fifth in a series of posts. See my previous post ‘Robust compute with RDF queries’.

You can also follow me on Twitter as @tonyhammond.