Neo4j Graph Data Science with Verifiable Credential Data

Nis Jespersen
Transmute
Sep 26, 2022


Verifiable Credentials (VCs) are built on Linked Data, which can easily be imported into graph databases. Here I explore the potential of combining cryptographically verifiable data with Neo4j’s Graph Data Science (GDS) algorithms.

These are simple first steps. I made sure to keep a real business case in focus, albeit a simple one: clustering organizations by the types of VCs they are connected to via the standard issuer relationship.

Sample data and commands are available at https://github.com/nissimsan/trust-graphs if you feel like following along.

Figure: Our very simple sample data set, consisting of various types of Verifiable Credentials and Organizations.

Above is a picture of our starting point: a knowledge graph made up of various types of Verifiable Credential (orange) and Organization (blue) nodes.

Between them are issuer relationships as well as several other business relationships (manufacturer, consignee, shipper, etc.), all taken from the credentialSubject. For simplicity, we will focus for now on the insights we can gain from the issuer relationship alone.

Figure: The data model in focus. Verifiable Credentials are issued by an Organization.
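
To make the data model concrete, here is a minimal Python sketch of how a single credential could be reduced to the one fact we care about: which organization issued which type of VC. The sample credential, its identifiers and the `issuer_edge` helper are made up for illustration and are not taken from the repository.

```python
import json

# Hypothetical, heavily trimmed credential. Real VCs carry more fields
# (proof, issuance date, a richer credentialSubject, ...).
SAMPLE_VC = json.loads("""
{
  "@context": ["https://www.w3.org/2018/credentials/v1"],
  "id": "urn:uuid:vc-1",
  "type": ["VerifiableCredential", "BillOfLading"],
  "issuer": {"id": "did:example:carrier"},
  "credentialSubject": {"shipper": {"id": "did:example:seller"}}
}
""")


def issuer_edge(vc):
    """Return (credential id, business-level type, issuer id) for one credential."""
    issuer = vc["issuer"]
    # The issuer may be a plain URI string or an object with an "id".
    issuer_id = issuer if isinstance(issuer, str) else issuer["id"]

    types = vc["type"]
    if isinstance(types, str):
        types = [types]
    # Drop the generic base type so only the business-level type remains.
    specific = [t for t in types if t != "VerifiableCredential"]
    vc_type = specific[0] if specific else "VerifiableCredential"

    return vc["id"], vc_type, issuer_id


edge = issuer_edge(SAMPLE_VC)
print(edge)
```

Everything under credentialSubject is deliberately ignored here, mirroring the decision to look at the issuer only.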

The first thing we must always do when working in Neo4j GDS is to project an in-memory graph for the algorithms to run against. Below is a projection which “carves out” just the issuer relationship.

Figure: Basic projection of VC and Organization nodes and issuer relationships.

This is a much more focused picture already. It is visually clear that we have removed clutter which is not relevant to our analysis.
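
Conceptually, the projection is just a filter on relationship type. The plain-Python sketch below illustrates that idea on made-up data; it is not how GDS does it internally (there, the in-memory graph is created through the graph catalog, for example with `gds.graph.project`), and the node names and the `project` helper are assumptions for illustration.

```python
# Hypothetical relationships as (source, relationship type, target) triples.
RELATIONSHIPS = [
    ("vc-1", "issuer", "org-a"),
    ("vc-1", "shipper", "org-b"),
    ("vc-1", "consignee", "org-c"),
    ("vc-2", "issuer", "org-b"),
    ("vc-2", "manufacturer", "org-a"),
]


def project(relationships, keep_type):
    """Keep only relationships of one type, plus the nodes they touch."""
    edges = [(src, dst) for src, rel, dst in relationships if rel == keep_type]
    nodes = {node for edge in edges for node in edge}
    return nodes, edges


nodes, edges = project(RELATIONSHIPS, "issuer")
print(edges)
```

Note that `org-c` disappears from the result entirely: it only appeared as a consignee, so it has no place in an issuer-only view.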

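However, the organizations no longer have anything in common topologically: each credential is its own node with its own issuer, so nothing links one organization to another. That’s not good for a graph algorithm. The answer is to merge the VCs by their types: Bill of Lading, Commercial Invoice, etc.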

Figure: Verifiable Credentials merged by type: Bill of Lading, Commercial Invoice, etc.
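
The merge can be sketched as replacing every credential node with a node for its type, so that organizations issuing the same kind of document end up sharing a neighbor. This is a simplified illustration in plain Python; the identifiers and the `merge_by_type` helper are hypothetical.

```python
# Hypothetical lookup of each credential's business-level type.
VC_TYPES = {
    "vc-1": "BillOfLading",
    "vc-2": "BillOfLading",
    "vc-3": "CommercialInvoice",
}

# Hypothetical issuer edges as (credential, organization) pairs.
ISSUER_EDGES = [("vc-1", "org-a"), ("vc-2", "org-b"), ("vc-3", "org-c")]


def merge_by_type(vc_types, issuer_edges):
    """Swap each credential for its type node and drop duplicate edges."""
    return sorted({(vc_types[vc], org) for vc, org in issuer_edges})


merged = merge_by_type(VC_TYPES, ISSUER_EDGES)
print(merged)
```

After the merge, `org-a` and `org-b` are both connected to the single `BillOfLading` node, which is exactly the shared topology a community detection algorithm needs.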

Now, on a new projection, we can run the Label Propagation graph algorithm. It is a community detection algorithm: each node repeatedly adopts the label most common among its neighbors, so densely connected groups of nodes converge on a shared label. The result shows four types of organizations (communities) in our data.

Figure: Four detected Organization communities, based on the types of Verifiable Credentials they have issuance relationships with.
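
To show the idea behind the algorithm, below is a deliberately simplified label propagation in plain Python. It is a sketch for intuition, not Neo4j GDS's implementation: it treats edges as undirected, visits nodes in a fixed order and breaks ties by picking the smallest label, all of which are assumptions made here to keep the example deterministic. The sample edges are made up.

```python
from collections import Counter, defaultdict

# Hypothetical merged graph: (credential type, issuing organization) pairs.
EDGES = [
    ("BillOfLading", "org-a"),
    ("BillOfLading", "org-b"),
    ("CommercialInvoice", "org-c"),
    ("CommercialInvoice", "org-d"),
]


def label_propagation(edges, max_iterations=10):
    """Give every node the label that is most common among its neighbors."""
    neighbours = defaultdict(set)
    for a, b in edges:  # treat every edge as undirected
        neighbours[a].add(b)
        neighbours[b].add(a)

    nodes = sorted(neighbours)
    labels = {node: i for i, node in enumerate(nodes)}  # unique start labels

    for _ in range(max_iterations):
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in neighbours[node])
            top = max(counts.values())
            # Tie-break on the smallest label to keep the run deterministic.
            best = min(label for label, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:  # converged: no node changed its label
            break
    return labels


def organization_communities(labels, prefix="org-"):
    """Group organization nodes by their final label."""
    groups = defaultdict(list)
    for node, label in labels.items():
        if node.startswith(prefix):
            groups[label].append(node)
    return sorted(sorted(group) for group in groups.values())


labels = label_propagation(EDGES)
clusters = organization_communities(labels)
print(clusters)
```

On this toy input the organizations fall into two groups, one per credential type they issue. In GDS itself the algorithm is run against the projected graph, for example in stream mode via `gds.labelPropagation.stream`.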

By writing this result back to the database, we can also make nice visuals of the same result, for example representing the Organization community clusters by color.

Figure: Visualization of Organization communities by color.

And so we have reached our goal: deriving business meaning from a knowledge graph produced from a set of Verifiable Credentials. We used only a single relationship type, a single algorithm from Neo4j GDS’s catalog, and a very small data set. Yet it was enough to detect a meaningful pattern.
