Neo4j Output Transform in Apache Hop

Sam L
6 min read · Nov 1, 2022

__Introduction

Let’s talk about the Apache Hop “Neo4j Output” transform.

The transform can do several different types of operations:

  • create nodes only
  • create relationships only
  • create both nodes and their joining relationships

In this article, I’ll walk through those different types of operations to help clarify the transform.

Before you try to do any of this work yourself, you should, as always, consult the latest documentation (RTFM!).

__Creating Nodes and Relationships all in the same transform step using Neo4j Output

The goal is to create a simple graph containing “Author” nodes and “Book” nodes, connected by the “WROTE” relationship. Here is my simple model:

I’m going to use a single data file, like the one below.

I created a new pipeline and then added a “CSV file input” transform to my pipeline canvas.

> Browse to your file and set the delimiter and other values to match it.

> Click the “Get Fields” button to populate the fields grid below

Note: the name in row 1 has some garbage characters at the beginning. I see this sometimes, and I believe it depends on which OS or encoding was used to create the file (a byte order mark at the start of the file is a common cause).
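If the stray characters are a UTF-8 byte order mark (that is my assumption here, I have not confirmed it for this file), the effect is easy to reproduce outside of Hop. The sketch below is plain Python, not anything Hop runs, and the column names are made up for illustration:

```python
import csv
import io

# Bytes of a tiny CSV file that starts with a UTF-8 byte order mark (BOM).
# The column names are hypothetical stand-ins for the real file.
raw = b"\xef\xbb\xbfpersonid,name\n1,Ada\n"

# Decoding as plain UTF-8 keeps the BOM as an invisible character
# glued to the front of the first header name.
plain_header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))

# Decoding as "utf-8-sig" strips the BOM when it is present.
clean_header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(repr(plain_header[0]))
print(repr(clean_header[0]))
```

The first header only looks wrong when something displays or compares the raw name, which is why it tends to show up in the first row of the fields grid and nowhere else.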

Feel free to edit the Name to clean it up like below:

> Click Preview

Complete. Now we can add the Neo4j Output transform.

> Connect it to the CSV file input. By the way, it’s recommended to rename the transforms so they are descriptive.

> Fill out the top of the Neo4j Output transform.

> Now fill out the 3 different bottom tabs: From Node, To Node, and Relationship.

I’m going to use the “From Node” tab to create my Author nodes. In the top section, I’m typing (hard-coding) the “From Label” value as “Author”. In the bottom section, click “Get Fields” to grab the fields from the input transform.

Make sure you know which field guarantees unique records, then set “Primary” to Y for that field. For me, that is “personid”.

> I’m going to use the “To Node” tab to create my Book nodes. Fill out the “To Node” section to create “Book” nodes.

> Fill out the “Relationship” section to create the “WROTE” relationships between Authors and Books. I am hardcoding this value, but you can use the “Relationship Field” box to softcode the value based on what is in your data.

You can also populate relationship properties, but I opted to keep this example simple.

That is it. You should be able to run this pipeline.
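If it helps to think in Cypher, here is a rough sketch of the kind of statement this setup corresponds to. To be clear, this is my mental model and not necessarily the exact Cypher that Hop generates. The helper function is hypothetical, and “bookid” is an assumed primary field for Book (only “personid” comes from my file):

```python
def build_merge_statement(from_label, from_key, to_label, to_key, rel_type):
    """Return Cypher that merges both nodes and the relationship between them."""
    lines = [
        # Merge the "from" node on its primary property, then set the other properties.
        f"MERGE (f:{from_label} {{{from_key}: $from_key}})",
        "SET f += $from_props",
        # Same idea for the "to" node.
        f"MERGE (t:{to_label} {{{to_key}: $to_key}})",
        "SET t += $to_props",
        # Finally connect the two nodes.
        f"MERGE (f)-[:{rel_type}]->(t)",
    ]
    return "\n".join(lines)


# "bookid" is a hypothetical field name used only for this example.
statement = build_merge_statement("Author", "personid", "Book", "bookid", "WROTE")
print(statement)
```

The “Primary” field is the only property inside the MERGE pattern, which is why picking a field that really is unique matters: anything else would create duplicate nodes.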

__Creating Nodes Only using Neo4j Output transform

This example is going to be almost identical to the first one. Here is the use case I have in mind: what if your data comes from three separate files, or for some reason you prefer to split the creation of Node1, Node2, and the relationship into three separate steps? To accomplish that, you use three distinct Neo4j Output transforms:

  1. create Node1 only
  2. create Node2 only
  3. create Relationship only

My example model will be simple, because it’s always easiest to build and test new functionality by starting simply and adding on as you go. This time our model will contain Persons that have Passports.

My data files look like this:

Create Node1 (“Person”) only

> Use the “From Node” tab to create the Person nodes. I’m hard-coding the node label as “Person”. I used “Get Fields” to populate the “From Property” grid. “UniqueId” is selected as my “Primary” because it guarantees uniqueness for my person data.

Create Node2 (“Passport”) only

> Use the “From Node” tab to create the Passport nodes.
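For the two node-only steps, the idea can be sketched as one small function that splits a row into the primary key and everything else. This is an illustration in plain Python, not what Hop runs internally. The function name, the “passportNumber” key, and the sample values are all assumptions, and the row keys are written the way I want the properties to appear in the database:

```python
def node_merge(label, primary_field, row):
    """Build (cypher, params) that merges one node keyed on its primary field."""
    # Every field except the primary one becomes a regular property.
    props = {k: v for k, v in row.items() if k != primary_field}
    cypher = f"MERGE (n:{label} {{{primary_field}: $key}}) SET n += $props"
    params = {"key": row[primary_field], "props": props}
    return cypher, params


# Hypothetical sample rows; the real files have their own columns.
person_cypher, person_params = node_merge(
    "Person", "uniqueId", {"uniqueId": "p-1", "name": "Ada"}
)
passport_cypher, passport_params = node_merge(
    "Passport", "passportNumber", {"passportNumber": "X123", "country": "UK"}
)

print(person_cypher, person_params)
print(passport_cypher, passport_params)
```

Neither statement mentions a relationship, which mirrors leaving the “To Node” and “Relationship” tabs empty in these two transforms.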

Create relationship (“HAS_PASSPORT”) only

> Filling out the top section is the same as before. But I need to fill out all three tabs to tell Apache Hop how to identify and join the “From Node” and the “To Node”.

> Filling out the “From Node” tab:

> Filling out the “To Node” tab:

> Filling out the “Relationship” tab:

I hard-coded the relationship type as “HAS_PASSPORT”. I also added a test field called “testrelationshipproperty” to test and confirm that Hop will allow me to write properties onto the HAS_PASSPORT relationship.
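One plausible Cypher equivalent of this relationship-only step looks like the sketch below. I am assuming the nodes already exist and are looked up by their primary properties; the exact statement Hop builds may differ. The helper and the “passportNumber” key are hypothetical, while “HAS_PASSPORT” and “testrelationshipproperty” come from my pipeline:

```python
def relationship_merge(from_label, from_key, to_label, to_key, rel_type):
    """Return Cypher that finds two existing nodes and merges a relationship."""
    return "\n".join([
        # Look up both end nodes by their primary properties.
        f"MATCH (f:{from_label} {{{from_key}: $from_key}})",
        f"MATCH (t:{to_label} {{{to_key}: $to_key}})",
        # Create the relationship once, then write its properties.
        f"MERGE (f)-[r:{rel_type}]->(t)",
        "SET r += $rel_props",
    ])


rel_cypher = relationship_merge(
    "Person", "uniqueId", "Passport", "passportNumber", "HAS_PASSPORT"
)
# Sample parameter values are made up for illustration.
rel_params = {
    "from_key": "p-1",
    "to_key": "X123",
    "rel_props": {"testrelationshipproperty": "test"},
}
print(rel_cypher)
```

This also shows why all three tabs are needed here: the “From Node” and “To Node” tabs supply the two lookups, and the “Relationship” tab supplies the type and properties.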

TIP: If you are having a tough time getting your relationships created, take a look at your capitalization. Neo4j property names are case-sensitive. My CSV file has a field called “UniqueId”, but my database property is called “uniqueId”. If you are at all unsure, copy and paste the name.
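A quick way to catch this kind of mismatch before running the pipeline is to compare the two lists of names while ignoring case. This is a small standalone sketch, and the function name is my own invention:

```python
def find_case_mismatches(csv_fields, db_properties):
    """Return (csv_field, db_property) pairs that differ only by letter case."""
    by_lower = {prop.lower(): prop for prop in db_properties}
    mismatches = []
    for field in csv_fields:
        match = by_lower.get(field.lower())
        # Same name when lowercased, but not an exact match.
        if match is not None and match != field:
            mismatches.append((field, match))
    return mismatches


print(find_case_mismatches(["UniqueId", "name"], ["uniqueId", "name"]))
# prints [('UniqueId', 'uniqueId')]
```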

__Sharing these examples on GitHub

If you are curious to try out these transforms for yourself, please feel free to download the repo. My goal is to build out samples for each of the key Neo4j transforms as well as some other transforms that are of interest to my prospects and myself.

Samyouell/hop-neo4j-examples: Sample Apache Hop workflows, pipelines, transforms, that demonstrate how to use Apache Hop to ETL data in and out of Neo4j database (github.com)

I’ll try to keep the Hop files organized to make it easy to get right to the specific example you are interested in. For this blog, please check the “Transform-Neo4j-Output” folder.

Great job! Since yesterday was Halloween, I think you should definitely binge on some of that candy that’s lying around your house.

Cute Mummy Trick or Treat Bowl by Tammy Mitchell, 2015

Sam L

I’m a sales engineer working at Neo4j. I’m hoping to use Medium to make life/work less frustrating.