Transform MongoDB collections automagically into Graphs

Using the `apoc.graph.fromDocument` procedure

Published in the Neo4j Developer Blog, Nov 29, 2019

The apoc.graph.* procedures transform custom data formats into graphs. The most recent addition is apoc.graph.fromDocument, which creates graphs from maps, JSON objects and JSON strings. In this article, we’ll talk about why you should use it and what the benefits are when you’re dealing with JSON-like formats.

From JSON to Graph?

A JSON document is basically a tree-graph

So the following JSON:

{
   "id": 1,
   "type": "artist",
   "name": "Genesis",
   "albums": [{
      "type": "album",
      "id": 1,
      "producer": "Jonathan King",
      "title": "From Genesis to Revelation"
   }]
}

can be turned into a graph like this:

The Graph representation of the JSON above

which is composed of:

2 nodes (Artist) and (Album)
and 1 relationship ALBUMS between them

And that’s what apoc.graph.fromDocument does!
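To make the tree-to-graph idea concrete, here is a minimal Python sketch that walks the artist document above and collects nodes and relationships. It is an illustration of the concept only, not APOC's implementation: the function name `document_to_graph`, the capitalized labels and the upper-cased relationship names are assumptions chosen to match the picture described above.

```python
import json


def document_to_graph(doc, id_field="id", label_field="type"):
    """Walk a JSON-like dict and collect nodes plus parent-to-child relationships.

    Illustrative sketch only: every nested object is assumed to carry
    both an id field and a label field.
    """
    nodes, relationships = [], []

    def visit(obj):
        label = str(obj[label_field]).capitalize()
        key = (label, obj[id_field])
        # Scalar values become node properties; nested objects become child nodes.
        # Lists of scalars are ignored in this sketch.
        props = {
            name: value
            for name, value in obj.items()
            if name != label_field and not isinstance(value, (dict, list))
        }
        nodes.append({"label": label, "key": key, "properties": props})
        for field, value in obj.items():
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    child_key = visit(child)
                    # The field name becomes the relationship type.
                    relationships.append((key, field.upper(), child_key))
        return key

    visit(doc)
    return nodes, relationships


artist = json.loads("""
{
   "id": 1,
   "type": "artist",
   "name": "Genesis",
   "albums": [{
      "type": "album",
      "id": 1,
      "producer": "Jonathan King",
      "title": "From Genesis to Revelation"
   }]
}
""")

nodes, relationships = document_to_graph(artist)
print([node["label"] for node in nodes])  # ['Artist', 'Album']
print(relationships)  # [(('Artist', 1), 'ALBUMS', ('Album', 1))]
```

The sketch yields the same shape as the picture: two nodes and one ALBUMS relationship between them.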

How does it work?

The procedure signature is quite simple:

apoc.graph.fromDocument({json},{config})

json, type Object: the JSON that must be transformed. Every entry must have an `id` and a `type` (the label attached to the node); the names of both fields are configurable via the config params;
config, type Map: the configuration params.

So given the following JSON as input:

{
 "id": 1,
 "type": "Person",
 "name": "Andrea",
 "sizes": {
  "weight": {
   "value": 70,
   "um": "Kg"
  },
  "height": {
   "value": 174,
   "um": "cm"
  }
 },
 "books": [{
  "title": "Flow My Tears, the Policeman Said",
  "released": 1974
 }, {
  "title": "The man in the High Castle",
  "released": 1962
 }]
}

And the following configuration:

{
    write: false,
    idField: "id",
    mappings: {
      `$`: 'Person:Reader{*,@sizes}',
      `$.books`: 'Book{!title, released}'
    }
}

The first two params are quite straightforward, while mappings needs a closer look:

write, type boolean: if true, persists the graph; otherwise returns a virtual graph, default false
idField, type String: the document field name that will become the id field of the created nodes (used for node resolution when you create relationships between nodes), default id
mappings, type Map: a JSON-path-like syntax that selects which properties to include, marks document properties as value objects (by prepending @ to the property name) and defines custom/composite keys per label (by prepending ! to the property name).

Let’s describe the mappings fields:

`$`: 'Person:Reader{*,@sizes}': the root object gets two labels, Person and Reader; all properties are included, and the sizes property is treated as a value object, which means it is flattened, so the node will have properties like sizes.weight.value, sizes.weight.um and so on. Since no id is specified, the property named by idField is used as the id;
`$.books`: 'Book{!title, released}': each entry of the root object’s books property is transformed into a node with label Book and two properties, title (considered as id, since it’s marked with !) and released; moreover, each Book node is connected to the parent Person:Reader node via the BOOKS relationship.
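The flattening applied to a value object such as sizes can be sketched in a few lines of Python. This is only an illustration of the dotted-name convention described above, not the procedure's actual code, and the helper name `flatten` is hypothetical.

```python
def flatten(prefix, value):
    """Flatten nested dicts into dotted property names (illustrative helper)."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        flat.update(flatten(f"{prefix}.{key}", child))
    return flat


sizes = {
    "weight": {"value": 70, "um": "Kg"},
    "height": {"value": 174, "um": "cm"},
}

# Properties of the Person:Reader node once `sizes` is treated as a value object.
person_props = {"id": 1, "name": "Andrea", **flatten("sizes", sizes)}
for name, value in person_props.items():
    print(name, "=", value)
```

With this input the node ends up with sizes.weight.value, sizes.weight.um, sizes.height.value and sizes.height.um next to id and name, instead of a separate node for sizes.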

The output will be the following:

The Graph representation of the provided JSON

There are other config params:

labelField, type String: the field whose value becomes the label of the node, default type
generateId, type boolean: if the id-field value is missing, a UUID is generated for it, default true
defaultLabel, type String: if the label-field value is missing, the provided default label is used, default is empty
skipValidation, type boolean: set it to true to skip the validation process of the `apoc.graph.fromDocument` procedure, default false
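One way to picture how idField, labelField, generateId and defaultLabel could interact is the Python sketch below. It is an assumption-laden illustration, not the procedure's code: the function `resolve_identity` is hypothetical, and returning `None` when no id is available is a simplification made for this example.

```python
import uuid


def resolve_identity(document, id_field="id", label_field="type",
                     generate_id=True, default_label=""):
    """Pick the id and the label of a node from a document (illustrative only)."""
    node_id = document.get(id_field)
    if node_id is None and generate_id:
        # Missing id-field value: fall back to a generated UUID.
        node_id = str(uuid.uuid4())
    # Missing label-field value: fall back to the default label.
    label = document.get(label_field) or default_label
    return node_id, label


# A book entry from the example document: it has neither an id nor a type.
book = {"title": "Flow My Tears, the Policeman Said", "released": 1974}
book_id, book_label = resolve_identity(book, default_label="Book")
print(book_label)  # Book
```

Here the book gets a generated UUID as its id and the Book label, while a document that already carries both fields keeps its own values.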

Why should you use it?

You can leverage this procedure when you are dealing with document-based data, like:

Web-APIs
Document-based databases like Couchbase, MongoDB, Elasticsearch, etc…

So you can turn your JSON data into a graph in a very easy way.

A Real Scenario: Transform MongoDB documents into Graphs

Transforming MongoDB JSON documents into graphs in Neo4j

The only prerequisite is to have Docker and Docker Compose. You can download the example from GitHub.

Neo4j just launched Aura, its graph database as a service, which simplifies deploying and scaling your database and lets you focus on your application. You can also use it in combination with MongoDB Atlas to test a full-cloud scenario.

To spin up the whole stack, execute the following command:

$ docker-compose up -d

This will start 3 services:

MongoDB with a Twitter dataset
Mongo Express: a GUI for MongoDB
an empty Neo4j instance

The goal is to leverage the APOC procedures and the JSON tree-graph structure in order to “automatically” transform MongoDB documents into a Graph structure.

The first step is to sample the MongoDB dataset, with the apoc.mongodb.first procedure:

CALL apoc.mongodb.first('mongodb://mongo:neo4j@mongo:27017', 'test', 'tweets', {}) YIELD value
RETURN value

The output should be something like this:

If you dive into the document you’ll find some fields like:

text : the tweet content
user : the user that published the tweet

So let's leverage these two fields in order to create a graph like this:

In order to do that we can use the mappings configuration field to extract exactly this structure:

{...
   mappings: {
      `$`: 'Tweet{!id_str,text}',
      `$.user`: 'User{!id_str,screen_name,description}'
   }
...}

Let’s describe the fields:

`$`: 'Tweet{!id_str,text}': transforms the root object into a Tweet node with two properties, id_str (considered as id) and text;
`$.user`: 'User{!id_str,screen_name,description}': transforms the user field into a User node with three properties, id_str (considered as id), screen_name and description.
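To show how such a mapping string breaks down, the following Python sketch parses it into labels, key properties, value objects and plain properties, and then applies it to a document. It only mirrors the syntax described in this article and is not how APOC parses mappings; `parse_mapping`, `project` and the sample tweet are hypothetical and made up for illustration.

```python
import re

# Labels before the braces, a comma-separated field list inside them.
MAPPING_PATTERN = re.compile(r"^(?P<labels>[^{]+)\{(?P<fields>[^}]*)\}$")


def parse_mapping(spec):
    """Split a mapping such as 'Tweet{!id_str,text}' into its parts."""
    match = MAPPING_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"Unrecognized mapping: {spec!r}")
    parsed = {
        "labels": match["labels"].split(":"),
        "keys": [],           # fields marked with !
        "value_objects": [],  # fields marked with @
        "properties": [],     # fields copied onto the node
        "include_all": False, # True when * is present
    }
    for field in (part.strip() for part in match["fields"].split(",")):
        if not field:
            continue
        if field == "*":
            parsed["include_all"] = True
        elif field.startswith("!"):
            parsed["keys"].append(field[1:])
            parsed["properties"].append(field[1:])
        elif field.startswith("@"):
            parsed["value_objects"].append(field[1:])
        else:
            parsed["properties"].append(field)
    return parsed


def project(document, mapping):
    """Keep only the properties selected by a parsed mapping."""
    if mapping["include_all"]:
        names = [k for k, v in document.items() if not isinstance(v, (dict, list))]
    else:
        names = mapping["properties"]
    return {name: document[name] for name in names if name in document}


# Hypothetical sample document, not taken from the real dataset.
tweet = {"id_str": "1", "text": "hello graph", "lang": "en",
         "user": {"id_str": "42", "screen_name": "someone"}}

tweet_mapping = parse_mapping("Tweet{!id_str,text}")
tweet_node = project(tweet, tweet_mapping)
print(tweet_mapping["labels"], tweet_mapping["keys"])  # ['Tweet'] ['id_str']
print(tweet_node)  # {'id_str': '1', 'text': 'hello graph'}
```

Fields that are not listed in the mapping, such as lang in the sample, are simply left out of the resulting node.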

At this point we can chain the mongo procedure with the apoc.graph.fromDocument procedure in order to import our documents as a graph with the following query:

CALL apoc.mongodb.get('mongodb://mongo:neo4j@mongo:27017', 'test', 'tweets', {}, true) YIELD value
CALL apoc.graph.fromDocument(value, {write: true, skipValidation: true, mappings: {
  `$`: 'Tweet{!id_str,text}',
  `$.user`: 'User{!id_str,screen_name,description}'
}}) YIELD graph AS g1
RETURN g1

The output should be something like this:

The imported MongoDB collection into Neo4j

Final thoughts

We saw how to leverage the apoc.graph.fromDocument procedure to transform any JSON data into a graph, and in particular how to chain it with the MongoDB procedures to import MongoDB collections into Neo4j by automatically transforming each document into a graph.

Please try it yourself by cloning the demo repository, and feel free to file an issue in the APOC repository if you want new features!