Getting Set Up with Kettle and Neo4j
Neo4j is an awesome piece of technology. Running cutting-edge graph algorithms such as shortest path, community detection and centrality alongside transactional operations is only possible in a meaningful and well-performing way on a fully native graph database like Neo4j.
So how do you get started? Well, you need to load data into Neo4j first, and this is where a data integration tool like Kettle comes in.
What is Kettle?
Kettle is an open source data integration (ETL) platform that lets you design your work visually. It has been around for over 17 years and in that time it has become quite mature, stable, high-performing and feature-rich.
For example, here is a visual representation of a Kettle “transformation” which does the following:
- Receive messages from Kafka
- Do look-ups in MongoDB
- Write nodes and relationships to Neo4j
The technical side of setting up a transformation like this takes a few minutes at most, so you can focus on things like data quality, the graph model, the requirements, data accuracy, performance and so on.
As you can imagine, doing this without any coding or scripting can save a lot of time, both in the initial set-up and in the maintenance of your solution later on.
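To get a feel for what those three steps involve when written by hand, here is a rough Python sketch. It is not how Kettle works internally and it does not talk to real services: a plain list stands in for Kafka, a dict stands in for MongoDB, and the output is a list of Cypher statements with parameters that you would otherwise send to Neo4j. The graph model (Customer, Product, BOUGHT) and all field names are made up for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical graph model: (Customer)-[:BOUGHT]->(Product).
CYPHER = (
    "MERGE (c:Customer {id: $customer_id}) "
    "SET c.name = $customer_name "
    "MERGE (p:Product {id: $product_id}) "
    "MERGE (c)-[:BOUGHT]->(p)"
)


def build_statements(
    messages: Iterable[str],
    customers: Dict[str, dict],
) -> Iterator[Tuple[str, dict]]:
    """Turn raw JSON messages into (Cypher, parameters) pairs."""
    for raw in messages:
        # Step 1: a message as it might arrive from a queue (stand-in for Kafka).
        event = json.loads(raw)
        # Step 2: enrich it with a look-up (stand-in for MongoDB).
        customer = customers.get(event["customer_id"], {})
        params = {
            "customer_id": event["customer_id"],
            "customer_name": customer.get("name", "unknown"),
            "product_id": event["product_id"],
        }
        # Step 3: the node and relationship write that would go to Neo4j.
        yield CYPHER, params


# Made-up sample data, only here to show the shape of the result.
messages = ['{"customer_id": "c1", "product_id": "p9"}']
customers = {"c1": {"name": "Ada"}}
statements = list(build_statements(messages, customers))
```

Even this toy version leaves out connections, error handling, batching and retries, which is the kind of plumbing a visual tool takes off your hands.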
So how do you get started with Kettle itself and what about the Neo4j plugins?
Kettle download and installation
You can download Kettle (also known as “Pentaho Data Integration Community Edition”, waaay too long to say every time, so we just say “Kettle”) from the Pentaho project on SourceForge. The latest version is 8.1 right now (check for updates); look for PDI-CE: pdi-ce-8.1.0.0-365.zip. Warning: it’s about 1GB, bundled with all plugins!
Make sure you have a Java 8 or 9 runtime environment properly installed for your computer system. Java from Oracle or OpenJDK is recommended. Kettle runs fine on Windows, OSX or Linux.
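If you are not sure which Java you have, a small script can tell you. The sketch below assumes the `java` executable is on your PATH; the function names are my own. It relies on the fact that `java -version` prints its banner on stderr and that Java 8 and older report themselves as 1.x (for example "1.8.0_181").

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    first = int(match.group(1))
    # Java 8 and older use the 1.x scheme, so the major version is the second number.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> int:
    """Run `java -version` and return the major version it reports."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The banner normally goes to stderr; stdout is included just in case.
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` should give you 8 or 9 on a machine set up as described here.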
Now unzip the downloaded archive somewhere. It will give you an extra data-integration/ folder. This is all you need to do as far as Kettle is concerned.
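Any unzip tool will do, but if you want to script this step, here is a minimal sketch using Python's standard `zipfile` module. The function name and the paths you pass in are up to you; the only thing taken from above is that the archive unpacks into a data-integration/ folder.

```python
import zipfile
from pathlib import Path


def unpack_kettle(archive: Path, target_dir: Path) -> Path:
    """Extract the Kettle zip and return the data-integration/ folder."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    kettle_home = target_dir / "data-integration"
    # Fail early if the archive did not contain the expected folder.
    if not kettle_home.is_dir():
        raise FileNotFoundError(f"expected {kettle_home} after extraction")
    return kettle_home
```

One caveat: Python's `zipfile` does not restore Unix file permissions, so on Linux or OSX you may have to make spoon.sh executable yourself after extracting this way.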
Install the Neo4j plugins for Kettle
You can get the latest version of the plugins at neo4j.kettle.be. It points to the community project where the Neo4j Kettle plugins were first developed and where our improvements are made. From the releases, download the latest version archive.
Unzip this where you placed your Kettle distribution in the
Now the fun starts
Now you can start up the Kettle GUI, called Spoon. The naming is a silly pun on Kettle and anything kitchen-related.
- on Windows start Spoon.bat
- on OSX start spoon.sh or the app: “Data Integration.app”
- on Linux start spoon.sh
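The list above boils down to one rule: Spoon.bat on Windows, spoon.sh everywhere else. If you start Kettle from your own scripts, a small helper like this one (the name is my own) can pick the right launcher based on what `platform.system()` reports:

```python
import platform
from pathlib import Path


def spoon_launcher(kettle_home: Path, system: str = "") -> Path:
    """Return the Spoon start script for the given OS name.

    `system` defaults to platform.system(), which returns
    "Windows", "Darwin" (OSX) or "Linux".
    """
    system = system or platform.system()
    script = "Spoon.bat" if system == "Windows" else "spoon.sh"
    return kettle_home / script
```

For example, `spoon_launcher(Path("data-integration"))` gives you the script to run from inside your unzipped Kettle folder.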
You will notice a welcome page with useful links.
You can find the Neo4j plugins when you create a new transformation.
Here are a few things you can do to read up on Kettle and Neo4j and some pointers on where to get help:
- Visit the Neo4j plugins wiki
- Look at examples for the Neo4j Kettle plugins
- Kettle documentation
- Read a book: Pentaho Data Integration Quick Start Guide
- Join the Neo4j community
- Ask your Neo4j contact for professional services for help with your Neo4j projects
Stay tuned for the next story in which I’ll be going over a few concrete examples of data loading into Neo4j.