Native graph data in Elixir

Using libgraph for graph data structures

Tony Hammond
Apr 26 · 19 min read
Photo by Francisco Moreno on Unsplash

I’ve previously looked at how we can query graph databases using Elixir. But now if we want to generate graph data structures natively, how can we do that? Well, turns out that there is the excellent libgraph package by Paul Schoenfelder that allows us to do just that.

The libgraph project page provides a rationale for why this package was developed. Basically it addresses a number of shortcomings in the earlier Erlang module digraph, both as regards performance and extensibility.

We’re going to give a quick introduction here to libgraph, talk about serializing and visualizing libgraph graphs, import a libgraph graph from CSV data, explore some graph structures, and finally do some naive conversions from graph database formats (both property and RDF graphs) into native libgraph formats. That’s quite some agenda, so let’s get started.

Without further ado we’re going to create ourselves a new project TestMatch with the same overall project structure as in the previous post. (See the Graph to graph with Elixir post for details.)

The main differences are as follows:

  1. We have added a module TestMatch.Import in test_match/import.ex.
  2. We have added a module TestMatch.Lib in test_match/libgraph.ex.
  3. We have introduced a dependencies on csv and libgraph.
  4. We have moved the NeoSemantics modules out into their own project neo_semantics and now use a :dep_from_git dependency in the mix.exs file.

So here’s the project layout now – with new files import.ex and libgraph.ex:

We have also introduced into the TestMatch.Graph module a new directory for saving our libgraph graphs

And we have updated the TestMatch.Graph constructor function to take a new graph_type option :lib for libgraph:

And here’s the priv/ tree now:

Note that we have an images/ directory under lib/graphs/ for storing image renditions. (And we also snuck in a csv/ directory which we will be making use of later.)

Finally, a quick word on graphs. We risk creating some confusion here. We now have the following Graph modules in play:

We will always make clear which kind of graph we are working with at any time and tend to use letters for Graph structures, e.g. g, and words for TestMatch.Graph serializations, e.g. graph. (The RDF.Graph datasets will typically be invoked though wrapper functions.)

So, let’s get started. And an apology up front if I tend to use the term ‘vertexes’ here rather than ‘vertices’. Just seems to roll off more naturally. Watch out too for spellings in libgraph. Sometimes the AmE form is used (‘neighbors’) and sometimes the BrE form is used (‘labelled’).

We’ll create a new graph and inspect it.

So let’s add a couple nodes, :a and :b say:

Or we can do this in one line by piping as:

And we can also add labels (:foo, :bar, :baz) to the vertexes again by piping:

By the way note that vertex labels are not shown in graph string forms, even if we can access them with the vertex_labels/2 function. But we can still see them if we want by inspecting the %Graph{} struct as a map:

And we’ll create a labeled edge (:XXX) between those two nodes:

And let’s just look at at that %Graph{} struct again:

Now we can get at the vertexes and edges:

We can also get counts of the vertexes and edges:

There’s a whole bunch more to the Graph module. For a quick overview use the help/1 function, and then inspect a given function with the h/1 IEx helper.

The libgraph package includes two serialization functions: Graph.to_dot/1 and Graph.to_edgelist/1. The former renders the graph using the DOT format from the Graphviz distribution, while the latter is a plaintext rendering of graph edges. The DOT format provides a text-based language specification for laying out graphs. We shall be mainly working with the DOT format as our serialization.

Graphviz is an open-source graph visualization software developed by AT&T Labs which is distributed with a number of layout programs. Graphviz also includes an (older) Libgraph library (not to be confused with the Elixir libgraph package).

Libgraph embodies a common attributed graph data language for graph manipulation tools. … (The Libgraph language is conventionally known as the Dot format, after its best-known application.)
Emden R. Gansner and Stephen C. North

The layout tools that can read the DOT format include the eponymous dot program, as well as neato and a bunch of others – see the documentation for further information.

The write_lib_graph/2 function uses Graph.to_dot/1 to render a graph in DOT format. We can read this back using the read_lib_graph/1 function and pipe that into a write_to_png/2 function to convert the DOT format using one of the Graphviz layout tools. By default the dot tool is used, but other layout tools can be selected.

Here we go with the default layout tool:

Or we can explicitly select the dot tool with the atom :dot:

This yields the following image:

We can choose instead the neato tool with the atom :neato:

This yields the following layout:

The differences are due to dot being a directed graph layout tool, and neato an undirected graph layout tool. For more information on the different layout tools, see here.

Now, other drawing tools can also import DOT format files. We shall also be using OmniGraffle for more control over the presentation form.

The diagram below shows the graph serialization functions available in our TestMatch project.

Elixir functions used by the :test_match app to serialize 'libgraph’ graphs.

We’re now going to look at importing a graph model from CSV files. For a worked example we’ll use the software dependency graph from the community detection algorithms chapter of this book:

Graph Algorithms by Amy E. Hodler and Mark Needham (O’Reilly). Copyright 2019 Amy E. Hodler and Mark Needham, 978-1-492-05781-9.”

You can register for a free copy of the ebook here. Project resources are available here. (The CSV files for the the software dependency graph have been copied into this project for convenience.)

A graph is a mathematical structure comprising a vertex set and an edge set, with a defined relation between the two sets. So, graph data is commonly loaded via two separate files, one for the nodes which lists node IDs, and one for the relationships which lists source and destination node IDs for the edges.

Let’s set up a new TestMatch.Import module and copy the example files into our priv_dir() under a new csv/ folder:

We use the CSV package for parsing the CSV files. And this also requires us to add the dependency to our mix.exs file:

We’ll now define a function for adding vertexes to a graph from a nodes file, defaulting to the example @nodes_file:

Likewise we define a corresponding function for adding edges to a graph from a relationships file, from a nodes file, defaulting to the example @relationships_file:

Now that we’ve got those data import functions in place we can define a new graph as:

And let’s import our TestMatch.Import functions:

We can now add vertexes to our graph by reading the nodes file as:

And we can add edges to our graph by reading the relationships file as:

So, we’re done. We have a populated graph from CSV data.

Next step is to save a serialization of this graph. And this means to save in DOT format:

And we can view this graph by converting to a .png file as:

This creates an sw.png file in the lib graph images directory which looks like this:

Basic graph image rendering using ‘dot’.

Alternatively we can open the file directly in a drawing package like OmniGraffle and prettify the graph as:

Elaborated graph image rendering using OmniGraffle.

We’ll look at some of the libgraph functions for exploring graphs.

And we’ll use the small graph we just imported above from CSV. This will be helpful because of the small size and because we have a simple visualization of the complete graph.

OK, so we have a small graph with 15 modes and 18 edges.

What are the neighbours of the node "numpy", say?

What are the out-edges from the node "numpy"?

So, none. As confirmed by the out-degree:

And what are the in-edges to the node "numpy", then?

So, three. As confirmed by the in-degree:

Let’s look at some paths. How many paths between the node "matplotlib" and the node "six":

Well, that’s two. And of course the shortest path (using Dijkstra’s algorithm) is:

And there are no paths between the node "matplotlib" and the node "six" with the edges reversed:

What else? How about we look at the overall structure?

So, libgraph clearly identifies the three islands we see in the graph image.

And we can extract one of those islands from graph g as subgraph g1:

And we can compare graph and subgraph:

We can also ask some basic graph questions, such as:

There’s more. But that should at least give some idea of what can be done.

At this point I thought it would be helpful to include this wonderful taxonomy of graph types from the ‘Constructions from Dots and Lines’ paper by Marko Rodriguez and Peter Neubauer.

It gives a sense of how simple graphs are related to directed graphs, property graphs, RDF graphs, etc. Previously we looked at a labeled, directed graph.

The various graph types and the morphisms that yield one graph type from another. (Reproduced from Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society for Information Science and Technology, volume 36, number 6, pages 35–41, ISSN:1550–8366, 2010.)

We’ll now go on to see how we might begin to import property graphs and RDF graphs into a native graph format.

We’re going to look at some naive graph conversions.

a. Cypher – basic (without properties)
We’ll first flush our database and load it with the Movies graph.

Now let’s recall what a Cypher query returns for a relationship query.

Here we’re asking for a single relationship with the limit = 1 restriction. The result is a Person who DIRECTED a Movie, where Person and Movie details are recorded in their respective properties fields. Both nodes and the relationship between them have unique database IDs.

So, we can add a function from_cypher/1 in our TestMatch.Lib module to map a number of relationships as:

This function takes a query as argument (or uses a default query). It creates a new graph g and executes the query saving the Cypher result set into the variable results. It then uses the reduce/3 function from the Enum module to process our enumerable (here the results list) with an accumulator. Each result in the results list is the result map for one relationship. The accumulator is our graph g which will be successively augmented with each new result.

We use pattern matching on the %Node{} and %Relationship{} structs from Bolt.Sips.Types to parse out the releavant fields.

We can then build up the graph by adding the two vertexes and the edge between them. We convert database IDs from integers to atoms. And we add a label to the edge.

Let’s try this using the toplevel delegate lib_graph_from_cypher/1:

Well, we’ve got a graph with both vertexes and the labeled edge.

And, of course, one major critique of the above approach is that we haven’t accounted for query variable names. These are hardwired as n, r, and o.

But we’re missing some things. We’re missing vertex labels. We’re missing the edge ID. And especially we’re missing properties.

Let’s talk about vertex labels. In the bolded lines in the fragment below we see that we could easily have added node labels using the add_vertex/3 function instead of add_vertex/2 (and removing the underscore from the _nl variable):

So, why didn’t we?

It seems we have a problem with vertex labels in the DOT serialization. See the documentation for add_vertex/3:

You can provide optional labels for the vertex, aside from the variety of uses this has for working with graphs, labels will also be used when exporting a graph in DOT format.”

What this means in practice is that nodes get to be named with labels rather than attributed with labels. So property graph node labels which typically function as category labels cannot be reliably used as node identifiers.

Let’s see this. We’ll get three relationships from our Movies graph:

And we can serialize to this to a DOT string:

Now, if we had added in node labels we would have got this:


What we really wanted was something more like this:

But for that we would need to fix the DOT serializer. So that’s why support for the vertex labels have not been added yet.

We’re also not pulling out the edge ID as this isn’t used in the DOT serialization, only the node IDs at either end of the relationship are used. But we will save the edge ID below when we also save node and relationship properties.

b. Cypher – improved (with properties)

Now libgraph has no support for vertex or edge properties. So what can we do? Well one solution that occurs is to stow away the properties in ETS tables and to key off the ID.

We had a first look at ETS tables in the earlier post ‘Querying RDF with Elixir’ so won’t spend any time describing the Observer.

The strategy we’re going to follow here is to create one ETS table for nodes and one for relationships.

Let’s create some ETS tables by adding this to the TestMatch.Lib module:

And for now we can just call this manually:

Of course, it would be better to hook it in somewhere so it’s started automatically by the app.

Now here’s our revised function from_cypher_with_properties/1 in the TestMatch.Lib module:

In bold are the changes from the from_cypher/1 function. We’ve removed underscores on pattern match variables as we will be using them. And we’ve added in a new section # store properties in ETS, where we write node properties (np and op) to the @node_table and relationship properties (rp) to the @edge_table. We key off the IDs: n, o and r. Additionally we save the relationship start (rs) and end (re) node IDs.

So, let’s try this out using the toplevel delegate lib_graph_from_cypher_with_properties/1:

Good. We’ve got our libgraph graph.

And now let’s start the Observer.

The Observer will open in a new window. We want to select the Table Viewer tab.

Here’s the Table Viewer tab in the Observer which lists both our ETS tables:

  • Elixir.TestMatch.Lib.node_properties
  • Elixir.TestMatch.Lib.edge_properties
Window showing Table Viewer tab listing ETS node and edge property tables.

Let’s inspect the @node_table. This is a simple key/value store with the node ID as key and the properties map as value. Double clicking a row will open a viewable copy of the record in an edit window.

Window showing ETS table for node properties.

Let’s inspect the @edge_table. These records have the edge ID as key and the two node IDs and properties map as values. Again, double clicking a row will open a viewable copy of the record in an edit window.

Window showing ETS table for edge properties.

So we have our native graph structure (albeit without properties) and we have node and edge properties aligned by ID in respective ETS tables. That’s a small step forward in supporting property graphs natively within Elixir.

OK, let’s turn now to RDF graphs.

Let’s also recall what a SPARQL query returns for a relationship query:

To be honest this is just one type of SPARQL query, although it is by far the most prevalent. There are four types of SPARQL query, two of them (SELECT and ASK) return tabular results using a %SPARQL.Query.Result{} struct, and the other two (CONSTRUCT and DESCRIBE) return RDF data as results using the %RDF.Graph{} struct.

Let’s keep this simple and just restrict to SELECT queries.

We can set up a very simple conversion from RDF as:

Here we test the result set and restrict to a %SPARQL.Query.Result{} struct returned from a SELECT query.

We can then build up a set of edges using the reduce/3 function from Enum, similar to before.

Applying this to a repo in our GraphDB instance which is loaded with the Books RDF graph we get this with the toplevel delegate lib_graph_from_sparql/1:

And, again, one major critique of the above approach is that we haven’t accounted for query variable names. These are hardwired as s, p, and o.

There’s obviously a whole bunch of improvements one could add. For example, distinguishing RDF datatype properties from object properties. Adding support for dataypes and language tags, etc. Maybe one could even make some limited use of ETS tables in a similar fashion to the property graph conversion we showed before.

Anyway, I could go on but I’ve already gone on long enough.

I’ve shown here in this post how we can use Elixir to build native graph data structures using the libgraph package.

After a brief introduction to building graphs using libgraph, I proceeded to look at how to serialize and visualize libgraph graphs, then imported a libgraph graph from CSV data, explored some graph structures, and finally showed some naive conversions from graph database formats (both property and RDF graphs) into native libgraph formats.

So it does begin to look like Elixir can be used to some effect in building and manipulating graph data structures of various hues.

See here for the project TestMatch code. (And note that the project TestMatch code also includes documentation which can be browsed here.)

This is the eighth in a series of posts. See my previous post ‘Graph to graph with Elixir’.

You can also follow me on Twitter as @tonyhammond.

Tony Hammond

Written by

Distributed data, distributed compute – the graph! | #writing, #workseeking