# Native graph data in Elixir

## Using libgraph for graph data structures

I’ve previously looked at how we can query graph databases using Elixir. But what if we want to generate graph data structures natively? How can we do that? Well, it turns out that there is the excellent `libgraph` package by Paul Schoenfelder that allows us to do just that.

The `libgraph` project page provides a rationale for why this package was developed. Basically it addresses a number of shortcomings in the earlier Erlang module `digraph`, both as regards performance and extensibility.

We’re going to give a quick introduction here to `libgraph`, talk about serializing and visualizing `libgraph` graphs, import a `libgraph` graph from CSV data, explore some graph structures, and finally do some naive conversions from graph database formats (both property and RDF graphs) into native `libgraph` formats. That’s quite some agenda, so let’s get started.

## 1. Project setup

Without further ado we’re going to create ourselves a new project `TestMatch` with the same overall project structure as in the previous post. (See the Graph to graph with Elixir post for details.)

The main differences are as follows:

- We have added a module `TestMatch.Import` in `test_match/import.ex`.
- We have added a module `TestMatch.Lib` in `test_match/libgraph.ex`.
- We have introduced dependencies on `csv` and `libgraph`.
- We have moved the `NeoSemantics` modules out into their own project `neo_semantics` and now use a `:dep_from_git` dependency in the `mix.exs` file.

The `deps/0` function in `mix.exs` now looks like this:

    defp deps do
      [
        {:ex_doc, "~> 0.20", only: :dev, runtime: false},
        # data import
        {:csv, "~> 2.3"},
        # library graphs
        {:libgraph, "~> 0.13"},
        # property graphs
        {:bolt_sips, "~> 1.5"},
        {:dep_from_git, git: "https://github.com/tonyhammond/neo_semantics.git", tag: "0.1.2", app: false},
        # rdf graphs
        {:sparql_client, "~> 0.2"},
        {:hackney, "~> 1.15"}
      ]
    end

So here’s the project layout now – with new files `import.ex` and `libgraph.ex`:

    % tree lib/
    lib/
    ├── test_match
    │   ├── application.ex
    │   ├── graph.ex
    │   ├── import.ex
    │   ├── libgraph.ex
    │   ├── lpg
    │   │   └── cypher
    │   │       └── client.ex
    │   ├── lpg.ex
    │   ├── query.ex
    │   ├── rdf
    │   │   └── sparql
    │   │       └── client.ex
    │   ├── rdf.ex
    │   └── utils.ex
    └── test_match.ex

    5 directories, 11 files

We have also introduced into the `TestMatch.Graph` module a new directory for saving our `libgraph` graphs:

    @lib_dir @priv_dir <> "/lib"

And we have updated the `TestMatch.Graph` constructor function to take a new `graph_type` option `:lib` for `libgraph`:

    def new(graph_data, graph_file, graph_type) do
      graphs_dir =
        case graph_type do
          :lib -> @lib_dir <> "/graphs/"
          :lpg -> @lpg_dir <> "/graphs/"
          :rdf -> @rdf_dir <> "/graphs/"
          # graph_type is an atom, so inspect it before concatenating
          _ -> raise "! Unknown graph_type: " <> inspect(graph_type)
        end
      ...
    end

And here’s the `priv/` tree now:

    % tree -d priv
    priv
    ├── csv
    ├── lib
    │   └── graphs
    │       └── images
    ├── lpg
    │   ├── graphgists
    │   ├── graphs
    │   └── queries
    └── rdf
        ├── graphs
        └── queries

    11 directories

Note that we have an `images/` directory under `lib/graphs/` for storing image renditions. (And we also snuck in a `csv/` directory which we will be making use of later.)

Finally, a quick word on graphs. We risk creating some confusion here. We now have the following `Graph` modules in play:

    Graph            # a native graph data structure
    RDF.Graph        # a set of RDF triples with an optional name
    TestMatch.Graph  # a struct for serialized graph data access

We will always make clear which kind of graph we are working with at any time and tend to use letters for `Graph` *structures*, e.g. `g`, and words for `TestMatch.Graph` *serializations*, e.g. `graph`. (The `RDF.Graph` datasets will typically be invoked through wrapper functions.)

## 2. Graph basics

So, let’s get started. And an apology up front if I tend to use the term ‘vertexes’ here rather than ‘vertices’. It just seems to roll off more naturally. Watch out too for spellings in `libgraph`. Sometimes the AmE form is used (‘neighbors’) and sometimes the BrE form is used (‘labelled’).

We’ll create a new graph and inspect it.

    iex> g = Graph.new
    #Graph<type: directed, vertices: [], edges: []>

    iex> Graph.info(g)
    %{num_edges: 0, num_vertices: 0, size_in_bytes: 312, type: :directed}

So let’s add a couple of nodes, `:a` and `:b` say:

    iex> g = Graph.add_vertex(g, :a)
    #Graph<type: directed, vertices: [:a], edges: []>

    iex> g = Graph.add_vertex(g, :b)
    #Graph<type: directed, vertices: [:a, :b], edges: []>

Or we can do this in one line by piping as:

    iex> g = Graph.new |> Graph.add_vertices([:a, :b])
    #Graph<type: directed, vertices: [:a, :b], edges: []>

And we can also add labels (`:foo`, `:bar`, `:baz`) to the vertexes, again by piping:

    iex> g = (g |> Graph.label_vertex(:a, [:foo]) |> Graph.label_vertex(:b, [:bar, :baz]))
    #Graph<type: directed, vertices: [:a, :b], edges: []>

    iex> g |> Graph.vertex_labels(:a)
    [:foo]

    iex> g |> Graph.vertex_labels(:b)
    [:bar, :baz]

By the way, note that vertex labels are not shown in graph string forms, even if we can access them with the `vertex_labels/2` function. But we can still see them if we want by inspecting the `%Graph{}` struct as a map:

    iex> IO.inspect g, structs: false
    %{
      __struct__: Graph,
      edges: %{},
      in_edges: %{},
      out_edges: %{},
      type: :directed,
      vertex_labels: %{97 => [:foo], 98 => [:bar, :baz]},
      vertices: %{97 => :a, 98 => :b}
    }
    #Graph<type: directed, vertices: [:a, :b], edges: []>

And we’ll create a labeled edge (`:XXX`) between those two nodes:

    iex> g = Graph.add_edge(g, :a, :b, label: :XXX)
    #Graph<type: directed, vertices: [:a, :b], edges: [:a -[XXX]-> :b]>

    iex> Graph.info(g)
    %{num_edges: 1, num_vertices: 2, size_in_bytes: 800, type: :directed}

And let’s just look at that `%Graph{}` struct again:

    iex> IO.inspect g, structs: false
    %{
      __struct__: Graph,
      edges: %{{97, 98} => %{XXX: 1}},
      in_edges: %{98 => %{__struct__: MapSet, map: %{97 => []}, version: 2}},
      out_edges: %{97 => %{__struct__: MapSet, map: %{98 => []}, version: 2}},
      type: :directed,
      vertex_labels: %{97 => [:foo], 98 => [:bar, :baz]},
      vertices: %{97 => :a, 98 => :b}
    }
    #Graph<type: directed, vertices: [:a, :b], edges: [:a -[XXX]-> :b]>

Now we can get at the vertexes and edges:

    iex> Graph.vertices(g)
    [:a, :b]

    iex> Graph.edges(g)
    [%Graph.Edge{label: :XXX, v1: :a, v2: :b, weight: 1}]

We can also get counts of the vertexes and edges:

    iex> Graph.num_vertices(g)
    2

    iex> Graph.num_edges(g)
    1

There’s a whole bunch more to the `Graph` module. For a quick overview use the `help/1` function, and then inspect a given function with the `h/1` IEx helper.

    iex> help Graph
    [
      __struct__: 0,
      __struct__: 1,
      a_star: 4,
      add_edge: 2,
      ...
      update_edge: 4,
      update_labelled_edge: 5,
      vertex_labels: 2,
      vertices: 1
    ]
    :ok

    iex> h Graph.edge

    def edge(g, v1, v2)

    @spec edge(t(), vertex(), vertex()) :: Graph.Edge.t() | nil

    Get an Edge struct for a specific vertex pair, or vertex pair + label.

    ## Example
    ...

## 3. Visualizing graphs with Graphviz

The `libgraph` package includes two serialization functions: `Graph.to_dot/1` and `Graph.to_edgelist/1`. The former renders the graph using the `DOT` format from the Graphviz distribution, while the latter is a plaintext rendering of graph edges. The `DOT` format provides a text-based language specification for laying out graphs. We shall be mainly working with the `DOT` format as our serialization.
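As a minimal sketch of the two serializers, here is a standalone script that rebuilds the two-node graph from section 2 and prints both renderings. It assumes Elixir 1.12 or later with network access, so that `Mix.install/1` can fetch `libgraph`; the exact text each serializer emits may vary between `libgraph` versions.

```elixir
# Assumption: Elixir 1.12+ with network access, so Mix.install/1 can fetch libgraph.
Mix.install([{:libgraph, "~> 0.13"}])

g =
  Graph.new()
  |> Graph.add_vertices([:a, :b])
  |> Graph.add_edge(:a, :b, label: :XXX)

# Both serializers return an {:ok, string} tuple.
{:ok, dot} = Graph.to_dot(g)
{:ok, edgelist} = Graph.to_edgelist(g)

IO.puts(dot)
IO.puts(edgelist)
```

Matching on `{:ok, _}` makes a serialization failure surface immediately rather than passing an error tuple on to a file write.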

Graphviz is open-source graph visualization software developed by AT&T Labs which is distributed with a number of layout programs. Graphviz also includes an (older) `Libgraph` library (not to be confused with the Elixir `libgraph` package).

“Libgraph embodies a common attributed graph data language for graph manipulation tools. … (The Libgraph language is conventionally known as the Dot format, after its best-known application.)”

– Emden R. Gansner and Stephen C. North

The layout tools that can read the `DOT` format include the eponymous `dot` program, as well as `neato` and a bunch of others – see the documentation for further information.

The `write_lib_graph/2` function uses `Graph.to_dot/1` to render a graph in `DOT` format. We can read this back using the `read_lib_graph/1` function and pipe that into a `write_to_png/2` function to convert the `DOT` format using one of the Graphviz layout tools. By default the `dot` tool is used, but other layout tools can be selected.
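To make the conversion step concrete, here is a rough sketch of how a `DOT` file might be handed to a Graphviz layout tool from Elixir. The module `GraphvizSketch` and its functions are hypothetical and are not the project’s `write_to_png/2`; the sketch assumes only that the Graphviz binaries are on the `PATH`.

```elixir
defmodule GraphvizSketch do
  # Hypothetical helper, not the project's write_to_png/2.
  # These are layout programs shipped with Graphviz.
  @tools [:dot, :neato, :fdp, :sfdp, :twopi, :circo]

  # Build the command line `<tool> -Tpng in.dot -o out.png`.
  def command(dot_file, png_file, tool \\ :dot) when tool in @tools do
    {Atom.to_string(tool), ["-Tpng", dot_file, "-o", png_file]}
  end

  # Run the layout tool; System.cmd/2 returns {output, exit_status}.
  def render(dot_file, png_file, tool \\ :dot) do
    {cmd, args} = command(dot_file, png_file, tool)
    System.cmd(cmd, args)
  end
end
```

Because `System.cmd/2` returns the collected output together with the exit status, a successful run with no output comes back as `{"", 0}`, which is the shape of the return values shown in the sessions in this section.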

Here we go with the default layout tool:

    iex> list_lib_graphs
    ["tony.dot", "foo.dot", "foo1.dot"]

    iex> read_lib_graph("tony.dot") |> write_to_png
    {"", 0}

Or we can explicitly select the `dot` tool with the atom `:dot`:

    iex> read_lib_graph("tony.dot") |> write_to_png(:dot)
    {"", 0}

This yields the following image:

We can choose instead the `neato` tool with the atom `:neato`:

    iex> read_lib_graph("tony.dot") |> write_to_png(:neato)
    {"", 0}

This yields the following layout:

The differences are due to `dot` being a directed graph layout tool, and `neato` an undirected graph layout tool. For more information on the different layout tools, see here.

Now, other drawing tools can also import `DOT` format files. We shall also be using OmniGraffle for more control over the presentation form.

The diagram below shows the graph serialization functions available in our `TestMatch` project.

## 4. Importing graphs from CSV data

We’re now going to look at importing a graph model from CSV files. For a worked example we’ll use the software dependency graph from the community detection algorithms chapter of this book:

“Graph Algorithms by Amy E. Hodler and Mark Needham (O’Reilly). Copyright 2019 Amy E. Hodler and Mark Needham, 978-1-492-05781-9.”

You can register for a free copy of the ebook here. Project resources are available here. (The CSV files for the software dependency graph have been copied into this project for convenience.)

A graph is a mathematical structure comprising a vertex set and an edge set, with a defined relation between the two sets. So, graph data is commonly loaded via two separate files, one for the nodes which lists node IDs, and one for the relationships which lists source and destination node IDs for the edges.
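The two-file layout can be sketched with a few in-memory rows. The column names (`id`, and `src`, `dst`, `relationship`) are the headers that the import functions in this section match on; the three rows are only an illustrative slice of the dependency graph, and the comma splitting is a naive stand-in for a proper CSV parser. The script assumes Elixir 1.12 or later so that `Mix.install/1` can fetch `libgraph`.

```elixir
# Assumption: Elixir 1.12+ with network access, so Mix.install/1 can fetch libgraph.
Mix.install([{:libgraph, "~> 0.13"}])

# Illustrative rows only; one file for nodes, one for relationships.
nodes_csv = """
id
matplotlib
python-dateutil
six
"""

relationships_csv = """
src,dst,relationship
matplotlib,python-dateutil,DEPENDS_ON
matplotlib,six,DEPENDS_ON
python-dateutil,six,DEPENDS_ON
"""

# Naive parsing: skip the header line and split on commas (no quoting support).
rows = fn csv ->
  csv
  |> String.split("\n", trim: true)
  |> Enum.drop(1)
  |> Enum.map(&String.split(&1, ","))
end

# First pass: the vertex set.
g =
  Enum.reduce(rows.(nodes_csv), Graph.new(), fn [id], g ->
    Graph.add_vertex(g, id)
  end)

# Second pass: the edge set, labeled with the relationship type.
g =
  Enum.reduce(rows.(relationships_csv), g, fn [src, dst, rel], g ->
    Graph.add_edge(g, src, dst, label: rel)
  end)

IO.inspect(Graph.info(g))
```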

Let’s set up a new `TestMatch.Import` module and copy the example files into our `priv_dir()` under a new `csv/` folder:

    defmodule TestMatch.Import do
      @priv_dir "#{:code.priv_dir(:test_match)}"
      @csv_dir @priv_dir <> "/csv/"
      ...
    end

We use the `CSV` package for parsing the CSV files. And this also requires us to add the dependency to our `mix.exs` file:

    # data import
    {:csv, "~> 2.3"},

We’ll now define a function for adding vertexes to a graph from a nodes file, defaulting to the example `@nodes_file`:

    def read_nodes_file(g, nodes_file \\ @nodes_file) do
      File.stream!(@csv_dir <> nodes_file)
      |> CSV.decode(separator: ?,, headers: true)
      |> Enum.reduce(
        g,
        fn row, g ->
          # each decoded row is an {:ok, map} tuple keyed by header name
          {:ok, %{"id" => id}} = row
          Graph.add_vertex(g, id)
        end
      )
    end

Likewise we define a corresponding function for adding edges to a graph from a relationships file, defaulting to the example `@relationships_file`:

    def read_relationships_file(g, relationships_file \\ @relationships_file) do
      File.stream!(@csv_dir <> relationships_file)
      |> CSV.decode(separator: ?,, headers: true)
      |> Enum.reduce(
        g,
        fn row, g ->
          {:ok, %{"dst" => dst, "relationship" => relationship, "src" => src}} = row
          Graph.add_edge(g, src, dst, label: relationship)
        end
      )
    end

Now that we’ve got those data import functions in place we can define a new graph as:

    iex> g = Graph.new()
    #Graph<type: directed, vertices: [], edges: []>

And let’s import our `TestMatch.Import` functions:

    iex> import TestMatch.Import
    TestMatch.Import

We can now add vertexes to our graph by reading the nodes file as:

    iex> g = read_nodes_file(g, "sw-nodes.csv")
    #Graph<type: directed, vertices: ["py4j", "jpy-client", "matplotlib", "nbconvert", "python-dateutil", "six", "jpy-console", "pyspark", "pytz", "spacy", "jupyter", "pandas", "ipykernel", "jpy-core", "numpy"], edges: []>

And we can add edges to our graph by reading the relationships file as:

    iex> g = read_relationships_file(g, "sw-relationships.csv")
    #Graph<type: directed, vertices: ["py4j", "jpy-client", "matplotlib", "nbconvert", "python-dateutil", "six", "jpy-console", "pyspark", "pytz", "spacy", "jupyter", "pandas", "ipykernel", "jpy-core", "numpy"], edges: ["jpy-client" -[DEPENDS_ON]-> "jpy-core", "matplotlib" -[DEPENDS_ON]-> "python-dateutil", "matplotlib" -[DEPENDS_ON]-> "six", "matplotlib" -[DEPENDS_ON]-> "pytz", "matplotlib" -[DEPENDS_ON]-> "numpy", ...]>

So, we’re done. We have a populated graph from CSV data.

The next step is to save a serialization of this graph. And this means saving it in `DOT` format:

    iex> g |> write_lib_graph("sw.dot")

And we can view this graph by converting to a `.png` file as:

    iex> read_lib_graph("sw.dot") |> write_to_png()

This creates an `sw.png` file in the `lib` graph images directory which looks like this:

Alternatively we can open the `sw.dot` file directly in a drawing package like OmniGraffle and prettify the graph as:

## 5. Exploring graph structures

We’ll look at some of the `libgraph` functions for exploring graphs.

And we’ll use the small graph we just imported above from CSV. This will be helpful because of the small size and because we have a simple visualization of the complete graph.

    iex> g = Graph.new |> read_nodes_file("sw-nodes.csv") |> read_relationships_file("sw-relationships.csv")
    #Graph<type: directed, vertices: ["py4j", "jpy-client", "matplotlib", "nbconvert", "python-dateutil", "six", "jpy-console", "pyspark", "pytz", "spacy", "jupyter", "pandas", "ipykernel", "jpy-core", "numpy"], edges: ["jpy-client" -[DEPENDS_ON]-> "jpy-core", "matplotlib" -[DEPENDS_ON]-> "python-dateutil", "matplotlib" -[DEPENDS_ON]-> "six", "matplotlib" -[DEPENDS_ON]-> "pytz", "matplotlib" -[DEPENDS_ON]-> "numpy"...]>

    iex> Graph.info(g)
    %{num_edges: 18, num_vertices: 15, size_in_bytes: 6968, type: :directed}

OK, so we have a small graph with 15 nodes and 18 edges.

What are the neighbours of the node `"numpy"`, say?

    iex> g |> Graph.neighbors("numpy")
    ["matplotlib", "spacy", "pandas"]

What are the out-edges from the node `"numpy"`?

    iex> g |> Graph.out_edges("numpy")
    []

So, none. As confirmed by the out-degree:

    iex> g |> Graph.out_degree("numpy")
    0

And what are the in-edges to the node `"numpy"`, then?

    iex> g |> Graph.in_edges("numpy")
    [
      %Graph.Edge{label: "DEPENDS_ON", v1: "matplotlib", v2: "numpy", weight: 1},
      %Graph.Edge{label: "DEPENDS_ON", v1: "spacy", v2: "numpy", weight: 1},
      %Graph.Edge{label: "DEPENDS_ON", v1: "pandas", v2: "numpy", weight: 1}
    ]

So, three. As confirmed by the in-degree:

    iex> g |> Graph.in_degree("numpy")
    3

Let’s look at some paths. How many paths are there between the node `"matplotlib"` and the node `"six"`?

    iex> g |> Graph.get_paths("matplotlib", "six")
    [["matplotlib", "python-dateutil", "six"], ["matplotlib", "six"]]

Well, that’s two. And of course the shortest path (using Dijkstra’s algorithm) is:

    iex> g |> Graph.get_shortest_path("matplotlib", "six")
    ["matplotlib", "six"]

And there are no paths between the node `"matplotlib"` and the node `"six"` with the edges reversed:

    iex> g |> Graph.transpose |> Graph.get_paths("matplotlib", "six")
    []

What else? How about we look at the overall structure?

    iex> g |> Graph.components
    [
      ["six", "spacy", "numpy", "pytz", "pandas", "python-dateutil", "matplotlib"],
      ["ipykernel", "jpy-core", "nbconvert", "jupyter", "jpy-console", "jpy-client"],
      ["pyspark", "py4j"]
    ]

So, `libgraph` clearly identifies the three islands we see in the graph image.

And we can extract one of those islands from graph `g` as subgraph `g1`:

    iex> nodes = (g |> Graph.components |> List.first)
    ["six", "spacy", "numpy", "pytz", "pandas", "python-dateutil", "matplotlib"]

    iex> g1 = (g |> Graph.subgraph(nodes))
    #Graph<type: directed, vertices: ["matplotlib", "python-dateutil", "six", "pytz", "spacy", "pandas", "numpy"], edges: ["matplotlib" -[DEPENDS_ON]-> "python-dateutil", "matplotlib" -[DEPENDS_ON]-> "six", "matplotlib" -[DEPENDS_ON]-> "pytz", "matplotlib" -[DEPENDS_ON]-> "numpy", "python-dateutil" -[DEPENDS_ON]-> "six", "spacy" -[DEPENDS_ON]-> "six", "spacy" -[DEPENDS_ON]-> "numpy", "pandas" -[DEPENDS_ON]-> "python-dateutil", "pandas" -[DEPENDS_ON]-> "pytz", "pandas" -[DEPENDS_ON]-> "numpy"]>

And we can compare graph and subgraph:

    iex> g |> Graph.info
    %{num_edges: 18, num_vertices: 15, size_in_bytes: 6968, type: :directed}

    iex> g1 |> Graph.info
    %{num_edges: 10, num_vertices: 7, size_in_bytes: 3576, type: :directed}

We can also ask some basic graph questions, such as:

    iex> g |> Graph.is_tree?
    false

    iex> g |> Graph.is_arborescence?
    false

    iex> g |> Graph.is_acyclic?
    true

There’s more. But that should at least give some idea of what can be done.
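As one further illustration, an acyclic dependency graph can be topologically sorted. The sketch below uses `Graph.topsort/1` on a three-package slice of the graph (hand-built here rather than imported from the CSV files) and assumes Elixir 1.12 or later so that `Mix.install/1` can fetch `libgraph`.

```elixir
# Assumption: Elixir 1.12+ with network access, so Mix.install/1 can fetch libgraph.
Mix.install([{:libgraph, "~> 0.13"}])

# A three-package slice of the dependency graph; add_edge/4 adds missing vertices.
g =
  Graph.new()
  |> Graph.add_edge("matplotlib", "python-dateutil", label: "DEPENDS_ON")
  |> Graph.add_edge("matplotlib", "six", label: "DEPENDS_ON")
  |> Graph.add_edge("python-dateutil", "six", label: "DEPENDS_ON")

# topsort/1 returns a list of vertices, or false if the graph contains a cycle.
order = Graph.topsort(g)

# Edges point from dependent to dependency, so reverse to list dependencies first.
install_order = Enum.reverse(order)
IO.inspect(install_order)
```

Since `topsort/1` returns `false` for a cyclic graph, real code would want to check the result before reversing it.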

## 6. Taxonomy of graph types

At this point I thought it would be helpful to include this wonderful taxonomy of graph types from the ‘Constructions from Dots and Lines’ paper by Marko Rodriguez and Peter Neubauer.

It gives a sense of how simple graphs are related to directed graphs, property graphs, RDF graphs, etc. Previously we looked at a labeled, directed graph.

We’ll now go on to see how we might begin to import property graphs and RDF graphs into a native graph format.

## 7. Graph conversions – from Cypher

We’re going to look at some naive graph conversions.

*a. Cypher – basic (without properties)*

We’ll first flush our database and load it with the Movies graph.

    iex> Cypher_Client.clear
    %{stats: %{"nodes-deleted" => 171, "relationships-deleted" => 253}, type: "w"}

    iex> cypher! lpg_movies().data
    %{
      stats: %{
        "labels-added" => 171,
        "nodes-created" => 171,
        "properties-set" => 564,
        "relationships-created" => 253
      },
      type: "w"
    }

Now let’s recall what a Cypher query returns for a relationship query.

    iex> cypher! "match (n)-[r]->(o) return n,r,o limit 1"
    [
      %{
        "n" => %Bolt.Sips.Types.Node{
          id: 89333,
          labels: ["Person"],
          properties: %{"born" => 1965, "name" => "Lana Wachowski"}
        },
        "o" => %Bolt.Sips.Types.Node{
          id: 89327,
          labels: ["Movie"],
          properties: %{
            "released" => 1999,
            "tagline" => "Welcome to the Real World",
            "title" => "The Matrix"
          }
        },
        "r" => %Bolt.Sips.Types.Relationship{
          end: 89327,
          id: 224476,
          properties: %{},
          start: 89333,
          type: "DIRECTED"
        }
      }
    ]

Here we’re asking for a single relationship with the `limit 1` restriction. The result is a `Person` who `DIRECTED` a `Movie`, where `Person` and `Movie` details are recorded in their respective `properties` fields. Both nodes and the relationship between them have unique database IDs.

So, we can add a function `from_cypher/1` in our `TestMatch.Lib` module to map a number of relationships as:

    def from_cypher(cypher_query \\ cypher_query()) do
      alias Bolt.Sips.Types.{Node, Relationship}

      g = Graph.new()
      results = TestMatch.cypher!(cypher_query)

      results
      |> Enum.reduce(
        g,
        fn result, g ->
          # match nodes
          %Node{
            id: n,
            labels: _nl,
            properties: _np
          } = result["n"]

          %Node{
            id: o,
            labels: _ol,
            properties: _op
          } = result["o"]

          # match relationship
          %Relationship{
            end: re,
            id: _r,
            properties: _rp,
            start: rs,
            type: rl
          } = result["r"]

          # build graph
          g
          |> Graph.add_vertex(
            # String.to_atom(Integer.to_string(n)), _nl
            String.to_atom(Integer.to_string(n))
          )
          |> Graph.add_vertex(
            # String.to_atom(Integer.to_string(o)), _ol
            String.to_atom(Integer.to_string(o))
          )
          |> Graph.add_edge(
            String.to_atom(Integer.to_string(rs)),
            String.to_atom(Integer.to_string(re)),
            label: String.to_atom(rl)
          )
        end
      )
    end

This function takes a query as argument (or uses a default query). It creates a new graph `g` and executes the query, saving the Cypher result set into the variable `results`. It then uses the `reduce/3` function from the `Enum` module to process our enumerable (here the `results` list) with an accumulator. Each `result` in the `results` list is the result map for one relationship. The accumulator is our graph `g` which will be successively augmented with each new result.

We use pattern matching on the `%Node{}` and `%Relationship{}` structs from `Bolt.Sips.Types` to parse out the relevant fields.

We can then build up the graph by adding the two vertexes and the edge between them. We convert database IDs from integers to atoms. And we add a label to the edge.

Let’s try this using the toplevel delegate `lib_graph_from_cypher/1`:

    iex> g = lib_graph_from_cypher "match (n)-[r]->(o) return n,r,o limit 1"
    #Graph<type: directed, vertices: [:"89327", :"89333"], edges: [:"89333" -[DIRECTED]-> :"89327"]>

Well, we’ve got a graph with both vertexes and the labeled edge.

And, of course, one major critique of the above approach is that we haven’t accounted for query variable names. These are hardwired as `n`, `r`, and `o`.

But we’re missing some things. We’re missing vertex labels. We’re missing the edge ID. And especially we’re missing properties.

Let’s talk about vertex labels. In the fragment below, repeated from `from_cypher/1`, we see that we could easily have added node labels using the `add_vertex/3` function instead of `add_vertex/2` (by switching to the commented-out argument line and removing the underscore from the `_nl` variable):

    ...
    # match nodes
    %Node{
      id: n,
      labels: _nl,
      properties: _np
    } = result["n"]
    ...
    # build graph
    g
    |> Graph.add_vertex(
      # String.to_atom(Integer.to_string(n)), _nl
      String.to_atom(Integer.to_string(n))
    )
    ...

So, why didn’t we?

It seems we have a problem with vertex labels in the `DOT` serialization. See the documentation for `add_vertex/3`:

“You can provide optional labels for the vertex, aside from the variety of uses this has for working with graphs, labels will also be used when exporting a graph in DOT format.”

What this means in practice is that nodes get to be named with labels rather than attributed with labels. So property graph node labels which typically function as category labels cannot be reliably used as node identifiers.

Let’s see this. We’ll get three relationships from our Movies graph:

    iex> g = lib_graph_from_cypher "match (n)-[r]->(o) return n,r,o limit 3"
    #Graph<type: directed, vertices: [:"67783", :"67784", :"67788", :"67791"], edges: [:"67784" -[ACTED_IN]-> :"67783", :"67788" -[DIRECTED]-> :"67783", :"67791" -[ACTED_IN]-> :"67783"]>

And we can serialize this to a `DOT` string:

    iex> g |> Graph.to_dot |> Tuple.to_list |> List.last |> IO.puts
    strict digraph {
        "67783"
        "67784"
        "67788"
        "67791"
        "67784" -> "67783" [label="ACTED_IN", weight=1]
        "67788" -> "67783" [label="DIRECTED", weight=1]
        "67791" -> "67783" [label="ACTED_IN", weight=1]
    }

Now, if we had added in node labels we would have got this:

    iex> g |> Graph.to_dot |> Tuple.to_list |> List.last |> IO.puts
    strict digraph {
        "Movie"
        "Person"
        "Person"
        "Person"
        "Person" -> "Movie" [label="ACTED_IN", weight=1]
        "Person" -> "Movie" [label="DIRECTED", weight=1]
        "Person" -> "Movie" [label="ACTED_IN", weight=1]
    }

Confusion.

What we really wanted was something more like this:

    iex> g |> Graph.to_dot |> Tuple.to_list |> List.last |> IO.puts
    strict digraph {
        "67783" [label="Movie"]
        "67784" [label="Person"]
        "67788" [label="Person"]
        "67791" [label="Person"]
        "67784" -> "67783" [label="ACTED_IN", weight=1]
        "67788" -> "67783" [label="DIRECTED", weight=1]
        "67791" -> "67783" [label="ACTED_IN", weight=1]
    }

But for that we would need to fix the `DOT` serializer. So that’s why support for vertex labels has not been added yet.
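One way around this, without touching the library, would be a small hand-rolled serializer. The sketch below is hypothetical (the module `DotSketch` is not part of `libgraph` or of the `TestMatch` project): it keeps the vertex term as the node ID and writes any vertex labels into a `DOT` `label` attribute. It assumes Elixir 1.12 or later so that `Mix.install/1` can fetch `libgraph`, and it does no escaping of quotes inside IDs or labels.

```elixir
# Assumption: Elixir 1.12+ with network access, so Mix.install/1 can fetch libgraph.
Mix.install([{:libgraph, "~> 0.13"}])

defmodule DotSketch do
  # Hypothetical serializer: vertex terms become node IDs, and any vertex
  # labels are joined into a DOT `label` attribute.
  def to_dot(%Graph{} = g) do
    nodes =
      for v <- Graph.vertices(g) do
        case Graph.vertex_labels(g, v) do
          [] ->
            "    \"#{v}\"\n"

          labels ->
            label = Enum.join(labels, ",")
            "    \"#{v}\" [label=\"#{label}\"]\n"
        end
      end

    edges =
      for %Graph.Edge{v1: v1, v2: v2, label: label, weight: w} <- Graph.edges(g) do
        "    \"#{v1}\" -> \"#{v2}\" [label=\"#{label}\", weight=#{w}]\n"
      end

    IO.iodata_to_binary(["strict digraph {\n", nodes, edges, "}\n"])
  end
end

# Two of the Movies graph nodes, this time with their labels attached.
g =
  Graph.new()
  |> Graph.add_vertex(:"67783", ["Movie"])
  |> Graph.add_vertex(:"67784", ["Person"])
  |> Graph.add_edge(:"67784", :"67783", label: :ACTED_IN)

IO.puts(DotSketch.to_dot(g))
```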

We’re also not pulling out the edge ID as this isn’t used in the `DOT` serialization. Only the node IDs at either end of the relationship are used. But we will save the edge ID below when we also save node and relationship properties.

*b. Cypher – improved (with properties)*

Now `libgraph` has no support for vertex or edge properties. So what can we do? Well, one solution that occurs is to stow away the properties in ETS tables and to key off the ID.

We had a first look at ETS tables in the earlier post ‘Querying RDF with Elixir’ so won’t spend any time describing the Observer.

The strategy we’re going to follow here is to create one ETS table for nodes and one for relationships.

Let’s create some ETS tables by adding this to the `TestMatch.Lib` module:

    @node_table Module.concat(__MODULE__, "node_properties")
    @edge_table Module.concat(__MODULE__, "edge_properties")

    def create_ets_tables do
      :ets.new(@node_table, [:named_table])
      :ets.new(@edge_table, [:named_table])
    end

And for now we can just call this manually:

    iex> TestMatch.Lib.create_ets_tables()
    :"Elixir.TestMatch.Lib.edge_properties"

Of course, it would be better to hook it in somewhere so it’s started automatically by the app.
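One possible way to do that, sketched here under assumptions, is to let a small `GenServer` own the tables and to start it under the application’s supervisor. The module name `PropertyTables` and the table names are placeholders, not code from the `TestMatch` project.

```elixir
defmodule PropertyTables do
  # Hypothetical table owner; the module and table names are placeholders.
  use GenServer

  @node_table :node_properties
  @edge_table :edge_properties

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  @impl true
  def init(:ok) do
    # :public lets other processes write to the tables. An ETS table is
    # destroyed when its owner exits, so the tables live as long as this process.
    :ets.new(@node_table, [:named_table, :public])
    :ets.new(@edge_table, [:named_table, :public])
    {:ok, %{}}
  end
end
```

Listing `PropertyTables` among the children of the supervisor started in `application.ex` would then create both tables at boot, and recreate them (empty) if the owner process is ever restarted.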

Now here’s our revised function `from_cypher_with_properties/1` in the `TestMatch.Lib` module:

    def from_cypher_with_properties(cypher_query \\ cypher_query()) do
      alias Bolt.Sips.Types.{Node, Relationship}

      g = Graph.new()
      results = TestMatch.cypher!(cypher_query)

      results
      |> Enum.reduce(
        g,
        fn result, g ->
          # match nodes
          %Node{
            id: n,
            labels: _nl,
            properties: np
          } = result["n"]

          %Node{
            id: o,
            labels: _ol,
            properties: op
          } = result["o"]

          # match relationship
          %Relationship{
            end: re,
            id: r,
            properties: rp,
            start: rs,
            type: rl
          } = result["r"]

          # store properties in ETS
          :ets.insert(@node_table, {n, np})
          :ets.insert(@node_table, {o, op})
          :ets.insert(@edge_table, {r, rs, re, rp})

          # build graph
          ...
        end
      )
    end

The changes from the `from_cypher/1` function are as follows. We’ve removed the underscores on the pattern match variables that we will now be using. And we’ve added in a new section `# store properties in ETS`, where we write node properties (`np` and `op`) to the `@node_table` and relationship properties (`rp`) to the `@edge_table`. We key off the IDs: `n`, `o` and `r`. Additionally we save the relationship start (`rs`) and end (`re`) node IDs.

So, let’s try this out using the toplevel delegate `lib_graph_from_cypher_with_properties/1`:

    iex> g = lib_graph_from_cypher_with_properties "match (n)-[r]->(o) return n,r,o"
    #Graph<type: directed, num_vertices: 171, num_edges: 253>

Good. We’ve got our `libgraph` graph.

And now let’s start the Observer.

    iex> :observer.start
    :ok

The Observer will open in a new window. We want to select the `Table Viewer` tab.

Here’s the `Table Viewer` tab in the Observer which lists both our ETS tables:

- `Elixir.TestMatch.Lib.node_properties`
- `Elixir.TestMatch.Lib.edge_properties`

Let’s inspect the `@node_table`. This is a simple key/value store with the node ID as key and the properties map as value. Double clicking a row will open a viewable copy of the record in an edit window.

Let’s inspect the `@edge_table`. These records have the edge ID as key and the two node IDs and properties map as values. Again, double clicking a row will open a viewable copy of the record in an edit window.

So we have our native graph structure (albeit without properties) and we have node and edge properties aligned by ID in respective ETS tables. That’s a small step forward in supporting property graphs natively within Elixir.
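To show how the two halves might be joined back up, here is a small sketch that looks up properties for a graph vertex and finds the edge records leaving a node. The table names and helper functions are illustrative stand-ins, and the sample records reuse the values from the single-relationship query result shown earlier.

```elixir
# Stand-ins for the two property tables; the names are illustrative.
nodes = :ets.new(:node_properties_demo, [:named_table])
edges = :ets.new(:edge_properties_demo, [:named_table])

# Node records are {id, properties}; edge records are {id, start, end, properties}.
:ets.insert(nodes, {89333, %{"born" => 1965, "name" => "Lana Wachowski"}})
:ets.insert(nodes, {89327, %{"released" => 1999, "title" => "The Matrix"}})
:ets.insert(edges, {224476, 89333, 89327, %{}})

# Graph vertices are atoms such as :"89333", so convert back to the integer key.
node_properties = fn vertex ->
  id = vertex |> Atom.to_string() |> String.to_integer()

  case :ets.lookup(nodes, id) do
    [{^id, properties}] -> properties
    [] -> %{}
  end
end

# Find the edge records that start at a given node ID.
edges_from = fn id -> :ets.match_object(edges, {:_, id, :_, :_}) end

IO.inspect(node_properties.(:"89333"))
IO.inspect(edges_from.(89333))
```

The atom-to-integer round trip is the price of converting database IDs to atoms for the vertexes; keeping the integer IDs as vertexes would make the lookup direct.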

## 8. Graph conversions – from SPARQL

OK, let’s turn now to RDF graphs.

*a. SPARQL*

Let’s also recall what a SPARQL query returns for a relationship query:

    iex> sparql! "select * where {?s ?p ?o} limit 1"
    %SPARQL.Query.Result{
      results: [
        %{
          "o" => ~I<http://purl.org/ontology/bibo/Book>,
          "p" => ~I<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,
          "s" => ~L"urn:isbn:978-1-68050-252-7"
        }
      ],
      variables: ["s", "p", "o"]
    }

To be honest this is just one type of SPARQL query, although it is by far the most prevalent. There are four types of SPARQL query: two of them (`SELECT` and `ASK`) return tabular results using a `%SPARQL.Query.Result{}` struct, and the other two (`CONSTRUCT` and `DESCRIBE`) return RDF data as results using the `%RDF.Graph{}` struct.

Let’s keep this simple and just restrict to `SELECT` queries.

We can set up a very simple conversion from RDF as:

    def from_sparql(sparql_query \\ sparql_query()) do
      alias SPARQL.Query.Result

      g = Graph.new()

      results =
        case TestMatch.sparql!(sparql_query) do
          %RDF.Graph{descriptions: _descriptions} ->
            raise "! SPARQL 'CONSTRUCT', 'DESCRIBE' queries not supported"
          # an ASK query yields a boolean (true or false) rather than a list
          %Result{results: results} when is_boolean(results) ->
            raise "! SPARQL 'ASK' queries not supported"
          %Result{results: results} -> results
          _ -> raise "! Unrecognized result format"
        end

      results
      |> Enum.reduce(
        g,
        fn result, g ->
          Graph.add_edge(
            g,
            String.to_atom(result["s"].value),
            String.to_atom(result["o"].value),
            label: String.to_atom(result["p"].value)
          )
        end
      )
    end

Here we test the result set and restrict to a `%SPARQL.Query.Result{}` struct returned from a `SELECT` query.

We can then build up a set of edges using the `reduce/3` function from `Enum`, similar to before.

Applying this to a repo in our GraphDB instance which is loaded with the Books RDF graph we get this with the toplevel delegate `lib_graph_from_sparql/1`:

    iex> g = lib_graph_from_sparql "select * where {?s ?p ?o}"
    #Graph<type: directed, vertices: [:Paper, :"Adopting Elixir",
     :"http://purl.org/ontology/bibo/Book", :"https://twitter.com/bgmarx",
     :"https://pragprog.com/", :"https://twitter.com/josevalim", :"2018-03-14",
     :"https://twitter.com/redrapids",
     :"urn:isbn:978-1-68050-252-7"], edges: [:"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/format]-> :Paper, :"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/title]-> :"Adopting Elixir", :"urn:isbn:978-1-68050-252-7" -[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]-> :"http://purl.org/ontology/bibo/Book", :"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/creator]-> :"https://twitter.com/bgmarx", :"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/publisher]-> :"https://pragprog.com/", :"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/creator]-> :"https://twitter.com/josevalim", :"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/date]-> :"2018-03-14", :"urn:isbn:978-1-68050-252-7" -[http://purl.org/dc/elements/1.1/creator]-> :"https://twitter.com/redrapids"]>

And, again, one major critique of the above approach is that we haven’t accounted for query variable names. These are hardwired as `s`, `p`, and `o`.
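A first step towards lifting that restriction might be to pass the variable names in as arguments. The sketch below is hypothetical: `RowsSketch.to_graph/4` is not part of the project, and the rows are plain maps of strings (with abbreviated predicate names) standing in for the RDF terms a real result set would carry. It assumes Elixir 1.12 or later so that `Mix.install/1` can fetch `libgraph`.

```elixir
# Assumption: Elixir 1.12+ with network access, so Mix.install/1 can fetch libgraph.
Mix.install([{:libgraph, "~> 0.13"}])

defmodule RowsSketch do
  # Hypothetical helper: `rows` is a list of maps keyed by query variable name,
  # and the three variable names are passed in rather than hardwired.
  def to_graph(rows, subject \\ "s", predicate \\ "p", object \\ "o") do
    Enum.reduce(rows, Graph.new(), fn row, g ->
      Graph.add_edge(g, row[subject], row[object], label: row[predicate])
    end)
  end
end

# Illustrative rows using variable names other than s, p and o.
rows = [
  %{"book" => "urn:isbn:978-1-68050-252-7", "rel" => "dc:title", "val" => "Adopting Elixir"},
  %{"book" => "urn:isbn:978-1-68050-252-7", "rel" => "dc:format", "val" => "Paper"}
]

g = RowsSketch.to_graph(rows, "book", "rel", "val")
IO.inspect(Graph.edges(g))
```

The sketch also keeps the values as strings: `libgraph` accepts any term as a vertex, and skipping `String.to_atom/1` avoids growing the atom table with arbitrary query results.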

There’s obviously a whole bunch of improvements one could add. For example, distinguishing RDF datatype properties from object properties, or adding support for datatypes and language tags, etc. Maybe one could even make some limited use of ETS tables in a similar fashion to the property graph conversion we showed before.

Anyway, I could go on but I’ve already gone on long enough.

## Summary

I’ve shown here in this post how we can use Elixir to build native graph data structures using the `libgraph` package.

After a brief introduction to building graphs using `libgraph`, I proceeded to look at how to serialize and visualize `libgraph` graphs, then imported a `libgraph` graph from CSV data, explored some graph structures, and finally showed some naive conversions from graph database formats (both property and RDF graphs) into native `libgraph` formats.

So it does begin to look like Elixir can be used to some effect in building and manipulating graph data structures of various hues.

## First post in this series

See here for the project `TestMatch` code. (And note that the project `TestMatch` code also includes documentation which can be browsed here.)

This is the eighth in a series of posts. See my previous post ‘Graph to graph with Elixir’.

You can also follow me on Twitter as @tonyhammond.