Jupyter Notebooks with Elixir and RDF
Now one can’t really claim to be a bona fide data scientist these days if one isn’t brandishing some flavour of notebook to save and run one’s calculations. And arguably one of the best known names in this breed of application is the Jupyter Notebook which is an excellent vehicle for data processing with Python. And Python as a language has become almost synonymous with data wrangling.
But these days there’s much more to Jupyter than the original name-begetting trio of languages – Julia, Python, and R. And notebooks themselves are useful for much more than just data processing alone, e.g. iterative code development and exploration, and multimedia code description and sharing.
So in this post we’ll take a quick look at the IElixir Jupyter kernel from Piotr Przetacznik, which will allow us to explore Elixir from within a rich media, interactive compute framework. And following on from some recent posts we’ll use the SPARQL.Client package for Elixir for querying RDF datastores.
Jupyter Notebooks (formerly IPython Notebooks – hence the usual file extension .ipynb) are computational notebooks which integrate live code, text, math and media, and are the result of a long line of development. Since Project Jupyter was started up in 2014 the use of Jupyter Notebooks has all but exploded. And just as recently as yesterday, Nature magazine published an article ‘Why Jupyter is data scientists’ computational notebook of choice’ in which it reports that there were ‘more than 2.5 million public Jupyter notebooks in September 2018, up from around 200,000 or so in 2015’.
The notebook pedigree goes at least as far back as the literate programming notions first introduced by Knuth with his WEB system in which he famously programmed TeX. This approach pairs off a document formatting language with a programming language so that one source file may be unbundled either as a document for reading or as a program for executing.
More recently the genre has been picked up and expanded by the data science community using IPython and data manipulation packages such as NumPy. And IPython has subsequently been generalized into Jupyter to support multiple compute engine backends, initially with support for the core languages of Julia, Python and R. There are now dozens of Jupyter kernels available, with most major programming languages represented.
And just earlier this year the Jupyter Notebook interface itself has been significantly elaborated and upgraded into the much more powerful and extensible JupyterLab.
In essence with notebooks we have something akin to a webified command loop (or, as it’s sometimes known, a REPL) which can be annotated with rich text, math and media and which can be replayed on a cell-by-cell basis or as a whole. The inputs (and outputs) are saved as JSON files which can be edited and replayed, shared directly with other users, or hosted for wider community access. The kernels, or compute engines, can be either locally or remotely hosted. The schematic below gives a general idea of the overall architecture.
IElixir and installation
IElixir is a Jupyter kernel specifically built for running Elixir programs, or program fragments.
See the IElixir project for information and help on installation. You would normally need to have both JupyterLab and Elixir pre-installed.
Now for this walkthrough we will use the TestQuery project developed earlier in this series, and we will also be walking through the notebook test_query.ipynb which accompanies this post.
For this walkthrough we will be using JupyterLab as our notebook environment. So, we’ll assume now that we already have JupyterLab installed. And we’ll also either download the project locally or just follow along with the GitHub rendition.
We can invoke JupyterLab from the command line as:
% jupyter lab
This is our main window on opening JupyterLab and navigating to the project home page using the file explorer, from which we can then open up the test_query.ipynb notebook. Let’s hide the file explorer by toggling the Files tab and just focus on the notebook pane itself:
We’ll follow the plan outlined in the notebook here.
1. Creating the project
I’ve first raised a couple of caveats relating to my own specific lack of knowledge of how best to interact with an Elixir Mix project. (And hopefully somebody can set me straight on this.) We’ll start by copying the test_query project developed in the earlier post into a new test_ipynb project so that we can make any adjustments necessary.
2. Setting up the environment
IElixir uses the concept of virtual environments for managing packages, with Boyle as its package manager. We use Boyle to install the SPARQL.Client package and then check the installed modules.
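As a rough sketch, setting up a Boyle environment from a notebook cell looks something like the following. (The environment name here is purely illustrative, and the version constraint on the package is an assumption – check Hex for the current release.)

```elixir
# Create and activate a virtual environment (name is illustrative)
Boyle.mk("test_ipynb_env")
Boyle.activate("test_ipynb_env")

# Install SPARQL.Client into the active environment
# (the version constraint shown is an assumption)
Boyle.install({:sparql_client, "~> 0.2"})

# List environments to confirm where we are
Boyle.list()
```

Once the environment is active, installed packages are available to subsequent cells in the notebook.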
3. Running basic SPARQL.Client queries
Before going further we’ll just try out a basic SPARQL query to retrieve a single RDF triple. And this indicates that there is a problem with this service when using the default POST method with URL escaping.
Instead, we need to call the SPARQL.Client.query/3 function with the GET method. This time around it works fine. And if we run it again and capture the function return, we can inspect the SPARQL query result set.
Well, OK, that works now. So let’s try a more meaningful query.
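A minimal sketch of such a query follows. The endpoint URL and option values are the ones used elsewhere in this series, but treat the details as assumptions rather than a definitive recipe.

```elixir
# A basic SPARQL query against the DBpedia endpoint, forced to use GET
query = "select * where {?s ?p ?o} limit 1"
service = "http://dbpedia.org/sparql"

{:ok, result} =
  SPARQL.Client.query(query, service,
    request_method: :get,
    protocol_version: "1.1")

# result is a SPARQL query result struct; its :results field
# holds the list of solutions
IO.inspect(result.results)
```

The `:request_method` and `:protocol_version` options together select a plain GET request under the SPARQL 1.1 protocol, which is what this particular service expects.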
4. Installing our TestQuery modules
Let’s first load our TestQuery module, after confirming with the IEx.Helpers.exports/1 function that none of its functions are yet available. We use the IEx.Helpers.import_file/1 function to directly load the module.
This fails, and if we look into the source in test_query.ex we can see that the call :code.priv_dir(:test_query) is failing since the application test_query is unknown. (See the caveats raised earlier.)
Replacing that with the relative path to the priv/ directory in our test_ipynb project, we can try importing again. And this time it works, as confirmed by checking the exported functions.
The same thing happens when we attempt to import the TestQuery.Client module. This time the module attribute @query_dir needs to be replaced.
Replacing that with an absolute path to the priv/queries/ directory in our test_ipynb project, we can try importing again. And again this time it succeeds, as confirmed by checking the exported functions.
Now we’ve successfully imported our modules, but as we noted earlier we are going to have to update our query functions to use a GET method with the right SPARQL protocol version. We’ll do this by introducing a new module attribute @query_opts which sets the keywords :request_method and :protocol_version, and we’ll add this module attribute to all our query calls.
We’re good to go.
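Sketched out, the change to TestQuery.Client looks something like this. The module body is heavily abbreviated, and the option values are assumptions carried over from the earlier experiment:

```elixir
defmodule TestQuery.Client do
  @service "http://dbpedia.org/sparql"

  # Force GET with the SPARQL 1.1 protocol for this service
  @query_opts [request_method: :get, protocol_version: "1.1"]

  # Every query call now passes the shared options through
  def query(query_string) do
    SPARQL.Client.query(query_string, @service, @query_opts)
  end
end
```

Keeping the options in a single module attribute means that if the service’s protocol requirements change again, only one line needs to be edited.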
5. Testing it out
To test out our SPARQL client functions, let’s first import the modules for ease of use. (Strictly we only want the TestQuery.Client module, but no matter.)
We’ll first try out our hello/0 function, which queries DBpedia for the ‘Hello World’ resource and returns the English-language label. This is returned as a list containing an RDF.ex literal struct, but is easily unpacked by accessing the struct’s value field.
And as that works, we’ll follow up with one of our stored queries and use the rquery/1 function on that. Again this returns a list containing an RDF.ex literal struct.
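In notebook cells that might look like the following sketch (the exact shape of the return values is as described above, but the results depend on the live DBpedia endpoint):

```elixir
import TestQuery
import TestQuery.Client

# Query DBpedia for the English label of the "Hello World" resource;
# the result is a one-element list holding an RDF.ex literal struct
[literal] = hello()

# Unpack the literal by reading its value field
literal.value
# e.g. "Hello World"
```

The same unpacking pattern applies to the rquery/1 results, since they come back in the same list-of-literals shape.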
And finally we can try out the rquery_all/0 demo function, which performs multiple queries for some 2018 Atlantic hurricanes and stores the results in separate ETS tables. These tables can be inspected using the Observer tool, as described in the earlier post. (In short, the :observer.start call opens up the Observer interface in a standalone window, and by selecting the ‘Table Viewer’ tab one can see the ETS tables we created and inspect them by double-clicking the respective rows.)
I’ve shown here in this post how we can use the IElixir kernel to run a Jupyter Notebook with Elixir as our language of choice.
Specifically, we have taken the TestQuery module developed in an earlier post and shown how this can be used within a Jupyter Notebook context. We are able to run SPARQL queries using the SPARQL.Client package directly from the notebook and to process the result sets. Moreover, since we are using a notebook, we can better experiment with and annotate our findings.
The tooling around Elixir is improving all the time, and with its proven support for managing distributed computation Elixir stands out as a fascinating and intriguing language for building semantic web applications.