Getting Set Up with Kettle and Neo4j

Matt Casters
Dec 4, 2018 · 3 min read

Neo4j is an awesome piece of technology. Running cutting-edge graph algorithms such as shortest path, community detection and centrality alongside transactional operations is only possible in a meaningful and performant way on a fully native graph database like Neo4j.

So how do you get started? Well, you need to load data into Neo4j first and this is where a data integration tool like Kettle might come in.

What is Kettle?

Kettle is an open source data integration (ETL) platform that lets you design your work visually. It has been around for over 17 years and in that time it has become mature, stable, high performing and feature rich.

From Kafka to Neo4j

For example, here is a visual representation of a Kettle “transformation” which does the following:

  • Receive messages from Kafka
  • Do look-ups in MongoDB
  • Write nodes and relationships to Neo4j
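For readers who think in code, the three steps above amount roughly to the logic below. This is only a sketch in plain Python with in-memory stand-ins: the message shape, the CUSTOMERS lookup table and the Person/Order/PLACED graph model are all hypothetical, and a real Kettle transformation does this with visual steps rather than code.

```python
import json

# Hypothetical stand-in for a MongoDB collection, keyed by customer id.
CUSTOMERS = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}


def enrich(message: str, lookup: dict) -> dict:
    """Parse one JSON message and add the looked-up customer name."""
    record = json.loads(message)
    customer = lookup.get(record["customer_id"], {})
    record["customer_name"] = customer.get("name", "unknown")
    return record


def to_cypher(record: dict) -> tuple:
    """Build a parameterized Cypher statement for one enriched record."""
    statement = (
        "MERGE (p:Person {id: $customer_id}) "
        "SET p.name = $customer_name "
        "MERGE (o:Order {id: $order_id}) "
        "MERGE (p)-[:PLACED]->(o)"
    )
    params = {
        "customer_id": record["customer_id"],
        "customer_name": record["customer_name"],
        "order_id": record["order_id"],
    }
    return statement, params


# Stand-in for messages received from a Kafka topic.
messages = ['{"order_id": "o1", "customer_id": "c1"}']
for msg in messages:
    statement, params = to_cypher(enrich(msg, CUSTOMERS))
    print(statement, params)
```

Even in this toy form, connection handling, error handling and batching are all missing, which is exactly the plumbing the visual transformation takes care of.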

The technical aspect of setting up a transformation like this takes a few minutes at most so you can focus on things like data quality, the graph model, the requirements, data accuracy, performance and so on.

As you can imagine, doing this without coding or scripting can save a lot of time, both during the set-up and during the maintenance of your solution later on.

So how do you get started with Kettle itself and what about the Neo4j plugins?

Kettle download and installation

You can download Kettle from the Pentaho project on SourceForge. It is also known as “Pentaho Data Integration Community Edition”, which is waaay too long to say every time, so we just say “Kettle”. The latest version right now is 8.1 (check for updates). Look for PDI-CE: pdi-ce-8.1.0.0-365.zip. Warning: it’s about 1GB, bundled with all plugins!

Make sure a Java 8 or 9 runtime environment is properly installed for your computer system. Java from Oracle or OpenJDK is recommended. Kettle runs fine on Windows, OSX or Linux.
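If you are not sure which Java you have, a small script can tell you. The sketch below is one possible way to do it, not an official Kettle check: the function names are made up for illustration, and it assumes the usual `java -version` output format in which the version appears in double quotes.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` style output.

    Handles both the legacy "1.8.0_181" scheme (major is 8) and the
    newer "9.0.4" scheme (major is 9). Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.x" means Java x.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` and return its major version, or None if Java is missing."""
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    return parse_java_major(result.stderr)
```

Calling `installed_java_major()` should give 8 or 9 on a machine set up as described above, and None if no `java` executable is on the path.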

Now unzip the downloaded archive somewhere. It will give you an extra data-integration/ folder. This is all you need to do as far as Kettle is concerned.

Install the Neo4j plugins for Kettle

You can get the latest version of the plugins at neo4j.kettle.be. It points to the community project where the Neo4j Kettle plugins were first developed and where our improvements are made. From the releases, download the latest version archive: Neo4JOutput-<version>.zip

Unzip this archive into the data-integration/plugins/ folder of your Kettle distribution.
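Any unzip tool will do for this step. If you prefer to script it, for example to repeat the installation on several machines, a minimal sketch could look like this. The function name and the paths in the usage note are hypothetical; the only thing taken from the instructions above is that the archive goes into the plugins/ folder.

```python
import zipfile
from pathlib import Path


def install_plugin(plugin_zip: str, kettle_home: str) -> list:
    """Extract a plugin archive into <kettle_home>/plugins.

    Returns the list of entry names found in the archive.
    """
    plugins_dir = Path(kettle_home) / "plugins"
    # Fail early if kettle_home does not look like a Kettle installation.
    if not plugins_dir.is_dir():
        raise FileNotFoundError(f"No plugins/ folder in {kettle_home}")
    with zipfile.ZipFile(plugin_zip) as archive:
        archive.extractall(plugins_dir)
        return archive.namelist()
```

You would call it with the path of the downloaded plugin archive and the path of your data-integration/ folder, for instance `install_plugin("Neo4JOutput-<version>.zip", "/opt/data-integration")`, filling in the version you downloaded and the location you chose.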

Now the fun starts

Now you can start up the Kettle GUI called Spoon. The naming is a silly pun on Kettle and anything kitchen related.

  • on Windows start Spoon.bat
  • on OSX start spoon.sh or the app: “Data Integration.app”
  • on Linux start spoon.sh
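If you start Spoon from your own scripts, the choice of launcher in the list above can be made automatically. This is a small sketch with a made-up helper name; it relies on Python's `platform.system()`, which reports "Windows", "Darwin" (OSX) or "Linux".

```python
import platform


def spoon_launcher(system: str = "") -> str:
    """Return the Spoon start script name for a platform.system() value.

    With no argument, the current platform is detected.
    """
    system = system or platform.system()
    # Windows uses the batch file; OSX and Linux share the shell script.
    return "Spoon.bat" if system == "Windows" else "spoon.sh"
```

The returned script name is relative to the data-integration/ folder, so run it from there.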

You will notice a welcome page with useful links:

The Spoon Welcome page

You can find the Neo4j plugins when you create a new transformation:

The Neo4j steps category in Spoon

Next steps

Here are a few things you can do to read up on Kettle and Neo4j and some pointers on where to get help:

Stay tuned for the next story in which I’ll be going over a few concrete examples of data loading into Neo4j.

Enjoy Kettle!

Matt

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Written by Matt Casters, Neo4j Chief Solutions Architect, Kettle Project Founder
