Better Neo4j plugins for Kettle

Matt Casters
Neo4j Developer Blog
3 min read · Sep 6, 2019

Dear Neo4j and Kettle friends,

Last week we released version 4.1.0 of the Neo4j plugins for Kettle. “It’s just a point release!”, you say. “Does it really warrant a whole story?”, you ask. I think that in this case we need to tell the story…

The thing is, after the many changes in 4.0.0, life in Kettle land was great for Neo4j users. Loading data was working and bugs were few and far between.

However, we can always do better, and most notably some fundamental changes needed to happen to the Neo4j Output and Cypher steps.

The Neo4j Output step

This step is used a lot, but in a couple of scenarios, most notably where MERGE statements were generated and executed, performance simply wasn't optimal. So major changes had to happen in the code of the step to increase performance. UNWIND statements are now used everywhere to speed up operations, even in cases where generating them isn't straightforward. I hope you'll find that your existing transformations simply run faster now.
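To give a rough idea of what UNWIND-based batching looks like, here is a minimal sketch against the Neo4j Java driver (4.x API). The connection details, labels and properties are made up for illustration, and this is not the exact Cypher the step generates:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.Values;

    public class UnwindBatchExample {
      public static void main(String[] args) {
        // Hypothetical connection details.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
            AuthTokens.basic("neo4j", "secret"));
            Session session = driver.session()) {

          // A batch of rows as a step might collect them (illustrative data).
          Map<String, Object> row1 = Map.of("id", 1L, "name", "Alice");
          Map<String, Object> row2 = Map.of("id", 2L, "name", "Bob");
          List<Map<String, Object>> rows = Arrays.asList(row1, row2);

          // One round trip for the whole batch: UNWIND iterates server-side
          // instead of executing a separate MERGE per row.
          String cypher = "UNWIND $rows AS row "
              + "MERGE (p:Person { id: row.id }) "
              + "SET p.name = row.name";

          session.writeTransaction(tx -> {
            tx.run(cypher, Values.parameters("rows", rows));
            return null;
          });
        }
      }
    }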

Icon of the Neo4j Output step

The Neo4j Cypher step

This step needed a few key changes to make sure that it plays nicely with more complex clustering scenarios while loading data in transactions. It also prepares the code to handle the upcoming API changes for Neo4j server 4.0.
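In driver terms, playing nicely with a cluster mostly comes down to using a routing connection and transaction functions, which the driver routes to the right cluster member and retries on transient errors. A minimal sketch, with a made-up routing URI and credentials:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;

    public class ClusterFriendlyWrite {
      public static void main(String[] args) {
        // The neo4j:// scheme (bolt+routing:// on older drivers) enables routing:
        // writes go to the cluster leader, reads can go to followers.
        try (Driver driver = GraphDatabase.driver("neo4j://cluster.example.com:7687",
            AuthTokens.basic("neo4j", "secret"));
            Session session = driver.session()) {

          // A transaction function is retried on transient cluster errors
          // (for example during a leader switch), so a transactional load
          // survives cluster topology changes.
          session.writeTransaction(tx -> {
            tx.run("MERGE (m:LoadMarker { id: 1 }) SET m.loadedAt = timestamp()");
            return null;
          });
        }
      }
    }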

When returning data from Neo4j, the Cypher step also needed extra information about the source property data type so that data conversions can be done in a more consistent and safe way.

Note the extra (optional) “Source Type” column for the return values. It allows for more accurate data conversion.
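As a rough illustration of why that source type matters, the sketch below (not the plugin's actual conversion code) picks the matching driver accessor based on a declared type instead of guessing from the value itself:

    import org.neo4j.driver.Value;

    public class SourceTypeConversion {
      // A minimal sketch: convert a value returned by Cypher to a Java object,
      // guided by a declared source type.
      public static Object convert(Value value, String sourceType) {
        if (value == null || value.isNull()) {
          return null;
        }
        switch (sourceType) {
          case "Integer": return value.asLong();      // Neo4j integers are 64-bit
          case "Float":   return value.asDouble();
          case "Boolean": return value.asBoolean();
          case "Date":    return value.asLocalDate();
          case "String":
          default:        return value.asString();
        }
      }
    }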

Hunting bugs!

Every good programmer knows that if you make any serious changes to a code base, you introduce errors or, even worse, compatibility issues.

To make this easier to deal with from release to release, and to protect our users from nasty issues, we created a new project called kettle-neo4j-integration with the sole purpose of validating that the Neo4j steps work correctly.

The way we do this is as follows:

  • First load data into Neo4j with a particular step.
  • Read the data back from Neo4j.
  • Verify that the data is exactly as expected.
A sample integration test job
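The real tests are of course Kettle transformations and jobs, but the round trip itself boils down to something like this little Java sketch, with a made-up label, property and credentials:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.Values;

    public class RoundTripCheck {
      public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
            AuthTokens.basic("neo4j", "secret"));
            Session session = driver.session()) {

          // 1. Load a known value into Neo4j.
          session.writeTransaction(tx -> {
            tx.run("MERGE (t:TestNode { id: 1 }) SET t.name = $name",
                Values.parameters("name", "expected"));
            return null;
          });

          // 2. Read the data back.
          String actual = session.readTransaction(tx ->
              tx.run("MATCH (t:TestNode { id: 1 }) RETURN t.name AS name")
                .single().get("name").asString());

          // 3. Verify that it is exactly what we wrote.
          if (!"expected".equals(actual)) {
            throw new IllegalStateException("Round trip failed, got: " + actual);
          }
          System.out.println("Round trip OK");
        }
      }
    }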

This project obviously uses Kettle Unit Tests, part of the data sets plugin, which makes it easy to write more tests in the future. While writing the various integration tests we already found and fixed a number of issues, making 4.1.0 a better, more stable release. From now on, every time we find more issues we will write more integration tests to make sure we don't repeat mistakes.

Given the many benefits, I would actually encourage you to write your own integration tests for the critical parts of your own Kettle projects. It allows you to protect your investment and requirements against inadvertent future changes. It doesn't take much work and it pays off in the long term. Writing unit and integration tests is simply a good idea. You can fairly easily run the integration tests against Neo4j and Kettle, each running in a Docker container. In a follow-up story I'll explain how you can do that.

Get the goodies

For your convenience I've set up my old website, kettle.be, from which you can easily download pre-built releases of Kettle with all the latest plugins. In fact, the Kettle Neo4j Remix downloads have everything you need to work with Neo4j, unit and integration tests, and much more.

Links to the various plugins and projects can be found on kettle.be as well.

Have fun with Neo4j and Kettle!

Cheers,
Matt

Matt Casters
Neo4j Chief Solutions Architect, Kettle Project Founder