Tap into Hidden Connections — Translating Your Relational Data to Graph

Jennifer Reif
Neo4j Developer Blog
8 min readJun 13, 2018

--

*Last updated 4/11/2019

Graph databases are a relatively new space in the technology industry. However, that doesn’t stop many companies big and small from jumping in. There are many cases to be made for one path or another, but one thing can be demonstrated in using graph databases — the importance of and meaning hidden in relationships. Graph databases tap into how pieces of data connect to one another, showing even the most remote connections between seemingly random pieces of data.

Perhaps you have read or heard about graph databases. Maybe you have even learned a little bit about them. But where do you go from there? How do you start taking real-life data from one source and put it into a graph to realize the value hidden within?

Today, I’d like to walk you through one way to translate your data from a relational database into a Neo4j graph database. While there are plenty of other types of data sources (like NoSQL), this post will simply focus on going from relational to graph. Hopefully, in future posts, I can dive into moving other types of sources into graph and other ways to translate relational data to graph, as well.

Neo4j ETL Tool

The Neo4j ETL (Extract-Tranform-Load) tool was built to allow developers an easy way to load relational data structures into a graph. It includes a 3-step process that allows the user to specify a relational database via a JDBC setup, then edit the data model mapping that the tool creates for graph, and finally import all of the data itself to Neo4j.

The tool is easy enough to access and use. There are two ways to interact with the tool — via command line or via Neo4j Desktop application. For working with the command line, you can download the GitHub project and run each step at a command prompt.

Add the Neo4j ETL Application

For working with the user interface version in Neo4j Desktop, you need to add this key in Neo4j Desktop — https://r.neo4j.com/neo4j-etl-app. This is demonstrated in the screenshot below.

Once this is complete, you will have access to the ETL application. Now, we need to add it to your graph applications.

First, go to your Projects menu within Neo4j Desktop and choose/create a project to work within. Then, click on the Add Application under the first banner on the right menu pane. When the popup displays to show you the applications you can add, choose the ETL App by clicking the Add button. The screenshots below step through this process visually.

Now we are ready to launch the application and start transforming data from relational to graph! To load the application, just click on the newly-added application in your project.

Select a Project to Work With

This will open a new window with the main screen of the application, as shown below. You will need to choose the project to work within. This will tell the application where the Neo4j database is that you want to load your data into. To the right, you can see the list of possible databases in that project that you can use. Since no connections show up on the left, we don’t have any relational databases defined in our tool, so we will click ADD CONNECTION to set one up.

Setup RDBMS Connection

This screen is where we set up a relational database connection. The tool allows most types of relational databases with an JDBC Driver, including MySQL, PostgreSQL, Oracle, Cassandra, DB2, SQL Server, Derby, and others. While MySQL and PostgreSQL databases are built-in to the tool, all other databases can be easily set up by specifying a driver file. We will walk through each step below.

For this example, I will be using a PostgreSQL relational database with the Northwind dataset loaded. If you do not already have a sample database/dataset to work with, you can follow along with this post by downloading PostgreSQL and running the shell script pulled from a GitHub repository for the Northwind dataset for Postgres.

By choosing the database type from the Type dropdown field first, some of the other fields will automatically populate for you and save a bit of hassle. For this example, you will choose the postgresql option. Notice that the port and connection URL are already correct for that choice.

We can now fill in the rest of the details, as needed. You can put whatever you like in the Connection Name field, but the Database field must match the name you gave your relational database. If you are using another database outside of Postgres or MySQL, be sure to choose a driver file for the tool to use for connecting.

*Note: if you set up a username and password for your relational database, you must enter those in the appropriate fields. However, if you did not specify a username and/or password for your relational database, you can leave one or both of the fields blank.

Once all the details are filled in, click the Test and Save Connection button at the bottom of the pane. You should see a blue message bar appear at the top to say your connection was saved successfully. A red bar with an error message will appear if one of the fields is not correct or it cannot find the relational database.

Retrieve RDBMS Metadata

Ok, now we have our relational database connection set up. Next, we need to map from relational to graph by choosing our relational database connection that we are loading from in the left list, and then selecting our graph database we are writing to in the list on the right. Then you can click the Start Mapping button.

You will see a blue message bar at the top of your screen if the step was successful or a red one if the step failed. Since ours was successful, our Start Mapping button inactivates, and the Next button activates to proceed to the next step. If the step fails, you can click the SEE LOGS button at the bottom to debug the process.

The Data in Mapping.json

This is where the mapping.json file is created with an outline of the mapping between the databases. This is an important part of the process because you can make changes to this file if you decide you want to leave out particular tables or fields in the migration.

To look at the file, pull up your main Neo4j Desktop window, go to your project, then click Manage on the Neo4j database where you will migrate the data (in this example, it is named ETL_db). Click the Open Folder button in the header pane and go into the import folder.

There, you should see a postgresql_northwind_mapping.json file (or some similarly-named file). You can open this and edit if there are any changes you want to make before going to the next step. For this example, I will leave the mapping.json file without any changes.

Adapt the Mapping

Going back to the ETL application window, you can click the Next button at the bottom right, which will take you to a data model screen. You can move the nodes and relationships in the diagram on the right side of the screen so that you can better view each of them.

Here, you can change names of nodes, relationships, and even property (field) names and data types. I do think we can choose more appropriate relationship types, changing them from generic nouns to more specific verbs.

For instance, I changed the relationship from Order node to Product from “ORDER_DETAILS” to “INCLUDES” because an order includes products (i.e. line items).

Data Import

Once you are satisfied, you can click the SAVE MAPPING button and then click the NEXT button at the bottom right. This takes us to the final screen, which is where we choose our import method. There are two options, and both are explained below.

  1. Bulk Import — fast, bulk-import of large data sets, database must be offline to run
  2. Online Import — runs Cypher via BOLT connection, database can be online and running

The key thing to note on the import method is whether your Neo4j database is running/online or not.

Finally, you can click the Import Data button and see the output display at the bottom of the screen from the behind-the-scenes commands that get automatically executed. When it completes, a blue message bar will appear at the top of the window to say that the import was successful (screenshot below).

Check Imported Data

To verify everything loaded and to start working with your data in Neo4j, we can look at the data model and run queries from Neo4j Browser or an application connected to your database.

For this example, we will go back to our main Neo4j Desktop window, go to Manage on the ETL_db database, then click Open Browser. Executing the CALL db.schema() on the Browser command line will display the graph data model for you.

CALL db.schema()

Notice that the model does not contain the Customer Demographics node connecting to Customer node. This is because the Postgres data set used did not have any data in the customer demographics csv file.

From here, you can run queries, update data, and investigate the data to gather value and insight from structuring your connected data as a graph. The screenshot below shows an initial query (under Favorites -> Sample Scripts -> Basic Queries -> Get some data) to retrieve and display some customers, shippers, orders, and products in the data set.

If you have any questions, do not hesitate to reach out on our neo4j-users Slack community under the #neo4j-etl channel. As always, happy learning! :)

Resources

--

--

Jennifer Reif
Neo4j Developer Blog

Jennifer Reif is an avid developer and problem-solver. She enjoys learning new technologies, sometimes on a daily basis! Her Twitter handle is @JMHReif.