Neo4j with Scala: Migrate Data from Other Databases to Neo4j

Knoldus Inc.
Knoldus - Technical Insights
5 min read · Sep 12, 2016

Hello folks!

Let's continue with Neo4j and Scala. Earlier we discussed the use of Neo4j with Scala and Neo4j APOC with Scala. In this blog we are going to discuss how we can migrate data from other databases like MySQL, PostgreSQL, Oracle, and Cassandra to Neo4j.

But before starting the journey, for those who have caught the train late ;), this is what has happened till now:

  1. Getting Started Neo4j with Scala : An Introduction
  2. Neo4j with Scala: Defining User Defined Procedures and APOC

So by now we have a basic understanding of Neo4j and APOC with Scala. Now let's start the next blog in the series.

We use many databases for storing data. But when we have a large amount of data spread across many tables, it becomes hard to write queries and execute them against the database. We have to be extra cautious while performing such tasks, and we get bored of seeing the same screen without any fun ;).

We have a solution for this: we can use Neo4j, where the work is more fun and not difficult, as we discussed in the first blog. Before starting the discussion I want to make one thing clear: when we want to migrate data, we have to keep the Neo4j APOC kit (Download and Install) in $Neo4j_Home/plugins. Now let's discuss, for each of the databases mentioned above, how we can connect to it and transfer its data to Neo4j:

PostgreSQL:

When we use PostgreSQL, we have to download the PostgreSQL JDBC .jar file (Download), keep it in $Neo4j_Home/plugins, and restart Neo4j. After restarting the Neo4j server we are set for migrating data from PostgreSQL to Neo4j.

  • Now we load the driver with APOC.

[code language=”scala”]
CALL apoc.load.driver('org.postgresql.Driver');
[/code]

  • Now we make the call for fetching data from PostgreSQL, where we have a table named employee_details, into Neo4j.

[code language=”scala”]
WITH 'jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres' AS url
CALL apoc.load.jdbc(url,'employee_details') YIELD row
RETURN count(*);
[/code]

  • If we don't want to use these steps, we can instead provide the URL in $Neo4j_Home/conf/neo4j.conf and restart the server:

[code language=”scala”]
apoc.jdbc.postgresql_url.url=jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres
[/code]

We can now fetch data directly, and we don't need to load the driver either.

[code language=”scala”]
CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
RETURN count(*);
[/code]

  • Create nodes and relationships from the data.

[code language=”scala”]
/**
* Here we define the schema and keys. In the first map we list, per label, the
* properties to index; in the second map, the properties which must be unique.
*/

CALL apoc.schema.assert( {Detail:['name','age','address','salary']},
{Detail:['id']});

/**
* Here we load data into Neo4j and create nodes with the help of the schema
* which we defined earlier.
*/

CALL apoc.load.jdbc('jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres','employee_details') yield row
CREATE (t:Detail {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address, salary:toString(row.salary)})
return t;
[/code]

MySQL:

To migrate data from MySQL, as before, we have to download the MySQL JDBC .jar file (Download), keep it in $Neo4j_Home/plugins, and update $Neo4j_Home/conf/neo4j.conf as:

[code language=”scala”]
apoc.jdbc.mysql_url.url=jdbc:mysql://localhost:3306/test?user=user&password=pass
[/code]

Restart the Neo4j server and we are set for migrating data from MySQL to Neo4j.

  • We hit MySQL, start fetching data, and perform a count operation; a node-creation sketch follows the count query below.

[code language=”scala”]
CALL apoc.load.jdbc('mysql_url','employee_data') yield row
RETURN count(*);
[/code]
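
  • As with PostgreSQL, we can turn the MySQL rows into nodes. This is only a sketch: the employee_data columns used below (id, name, age, address) are assumptions for illustration, so adjust them to match your actual table.

[code language="scala"]
/**
* A minimal sketch of creating nodes from the MySQL rows. The column names
* (id, name, age, address) are assumed for illustration.
*/
CALL apoc.schema.assert({Employee:['name','age','address']}, {Employee:['id']});

CALL apoc.load.jdbc('mysql_url','employee_data') yield row
CREATE (e:Employee {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address})
return count(e);
[/code]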

Cassandra:

Now we migrate data from Cassandra to Neo4j. First we import some data into Cassandra, in case we don't have any there; we can also use this data for testing.

  • We have to run the following commands for setting up some initial data in Cassandra:

[code language=”scala”]
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql
[/code]

  • We have now set up our Cassandra database with the data. Next we have to download the Cassandra JDBC .jar file (Download) and keep it in $Neo4j_Home/plugins. We can provide the URL in $Neo4j_Home/conf/neo4j.conf as:

[code language=”scala”]
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist
[/code]

Restart the Neo4j server and we are set for migrating data from Cassandra to Neo4j.

  • We hit Cassandra, start fetching data, and perform a count operation.

[code language=”scala”]
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);
[/code]

  • Let's create the indexes and constraints for the data.

[code language=”scala”]
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
{Track:['title','length']},
{Artist:['name'],Track:['id'],Genre:['name']});
[/code]

  • Now we will load the data and perform MERGE and CREATE operations so that we can create the nodes and the relationships between them.

[code language=”scala”]
/**
* Here we load data into Neo4j and create nodes with the help of the schema
* which we defined earlier.
*/
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track,
length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);
[/code]

  • We can see the relationship graph, and it will look something like this:

[code language=”scala”]
/**
* For displaying the PERFORMED relationship
*/

MATCH p=()-[r:PERFORMED]->() RETURN p LIMIT 25;
[/code]

[code language=”scala”]
/**
* For displaying the GENRE relationship
*/

MATCH p=()-[r:GENRE]->() RETURN p LIMIT 100;
[/code]

Oracle:

We are at the last database from which to migrate data to Neo4j, but obviously not the least. We download the Oracle JDBC .jar file (Download), keep it in $Neo4j_Home/plugins, and restart Neo4j. We can provide the URL in $Neo4j_Home/conf/neo4j.conf as:

[code language=”scala”]
apoc.jdbc.oracle_url.url=jdbc:oracle:thin:user/password@127.0.0.1:1521/XE
[/code]

  • After restarting the Neo4j server we are set for migrating data from Oracle to Neo4j. We fetch data from Oracle, where we have a table named employee_details, and perform a count operation.

[code language=”scala”]
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
RETURN count(*);
[/code]

  • Let's create the indexes and constraints for the data.

[code language=”scala”]
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
{EMPINFO:['name','age','salary']},
{EMPINFO:['id'],ADDRESS:['address']});
[/code]

  • Now we will load the data and perform MERGE and CREATE operations so that we can create the nodes and the relationships between them.

[code language=”scala”]
/**
* Here we load data into Neo4j and create nodes with the help of the schema
* which we defined earlier.
*/
CALL apoc.load.jdbc('oracle_url','employee_details') yield row
MERGE (g:ADDRESS {name:row.ADDRESS})
CREATE (t:EMPINFO {id:toString(row.ID), name:row.NAME, age:toString(row.AGE), salary:toString(row.SALARY)})
CREATE (t)-[:LIVE]->(g);
[/code]

  • We can see the relationship graph, and it will look something like this:

[code language=”scala”]
/**
* For displaying the LIVE relationship
*/

MATCH p=()-[r:LIVE]->() RETURN p LIMIT 25;
[/code]

Now we can see that it is quite easy to migrate data from other databases to Neo4j.

After importing the data into Neo4j, we have to think about keeping the data in sync. We can use a scheduled process which runs on a time basis and automatically syncs data between the databases. We can also use event-based integration, where we define the events on which we want to update the database.
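
For example, a time-based sync can be scheduled from within Neo4j itself. The sketch below is a minimal example, assuming the apoc.periodic.repeat procedure is available in your APOC version; it re-runs the PostgreSQL load every hour and merges on id so that existing nodes are updated rather than duplicated.

[code language="scala"]
/**
* A minimal sketch of a time-based sync (assuming apoc.periodic.repeat is
* available in your APOC version). It re-runs the JDBC load every 3600
* seconds and merges on id so existing nodes are updated, not duplicated.
*/
CALL apoc.periodic.repeat('sync-employee-details',
"CALL apoc.load.jdbc('postgresql_url','employee_details') yield row
MERGE (t:Detail {id:toString(row.id)})
SET t.name = row.name, t.age = toString(row.age),
t.address = row.address, t.salary = toString(row.salary)", 3600);
[/code]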

Note: As discussed above, I want to point out again that if you do not add the JDBC URL key to $Neo4j_Home/conf/neo4j.conf, then you have to load the driver in Neo4j and pass the full JDBC URL in the query; otherwise you only have to provide the configured key name in the query.

I hope this helps you migrate data into Neo4j. After fetching data we can also write simple Scala code for persisting it in Neo4j, as we discussed in the first blog.
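
As a quick illustration, here is a minimal Scala sketch along the lines of the first blog, using the neo4j-java-driver to run the same APOC-backed migration query from application code. The Bolt URL and credentials are assumptions; replace them with your own.

[code language="scala"]
import org.neo4j.driver.v1.{AuthTokens, GraphDatabase}

object MigrateFromPostgres extends App {
  // Bolt URL and credentials are assumptions; point these at your own Neo4j server.
  val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "neo4j"))
  val session = driver.session()

  // Run the same APOC-backed migration query we used above, from Scala.
  val result = session.run(
    """CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
      |CREATE (t:Detail {id: toString(row.id), name: row.name,
      |                  age: toString(row.age), address: row.address,
      |                  salary: toString(row.salary)})
      |RETURN count(t) AS created""".stripMargin)

  println(s"Nodes created: ${result.single().get("created").asLong()}")

  session.close()
  driver.close()
}
[/code]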

Reference:

  1. Neo4j APOC Procedures
