Efficient Neo4j Data Import Using Cypher-Scripts

Fast Export/Import Operations with apoc.export.cypher and cypher-shell

Andrea Santurbano
Apr 26 · 8 min read
cat script.cypher | bin/cypher-shell -u username -p password

Before the improvements

CALL apoc.export.cypher.all({fileName},{config})
begin
CREATE (:Foo:`UNIQUE IMPORT LABEL` {name:"foo", `UNIQUE IMPORT ID`:0});
CREATE (:Bar {name:"bar", age:42});
CREATE (:Bar:`UNIQUE IMPORT LABEL` {age:12, `UNIQUE IMPORT ID`:2});
commit
begin
CREATE INDEX ON :Foo(name);
CREATE CONSTRAINT ON (node:Bar) ASSERT node.name IS UNIQUE;
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
commit
schema await
begin
MATCH (n1:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`:0}), (n2:Bar{name:"bar"})
CREATE (n1)-[:KNOWS]->(n2);
commit
begin
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
commit
begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
commit
CREATE (:Foo:`UNIQUE IMPORT LABEL` {name:"foo", `UNIQUE IMPORT ID`:0});
MATCH (n1:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`:0}),  (n2:Bar{name:"bar"})
CREATE (n1)-[:KNOWS]->(n2);

Better Approach

UNWIND to the Rescue

General Idea

{batch: [{row1},{row2},{row3},...]}UNWIND $batch as row// now perform updates with the data in each "row" variable

The New Behaviour

CALL apoc.export.cypher.all({fileName},{config})
:BEGIN
CREATE INDEX ON :Bar(first_name,last_name);
CREATE INDEX ON :Foo(name);
CREATE CONSTRAINT ON (node:Bar) ASSERT node.name IS UNIQUE;
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`)
ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
:COMMIT
:SCHEMA AWAIT:BEGIN
UNWIND [{_id:3, properties:{age:12}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Bar;
UNWIND [{_id:2, properties:{age:12}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Bar:Person;
UNWIND [{_id:0, properties:{born:date('2018-10-31'), name:"foo"}}, {_id:4, properties:{born:date('2017-09-29'), name:"foo2"}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Foo;
UNWIND [{name:"bar", properties:{age:42}}, {name:"bar2", properties:{age:44}}] as row
CREATE (n:Bar{name: row.name}) SET n += row.properties;
UNWIND [{_id:6, properties:{age:99}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties;
:COMMIT
:BEGIN
UNWIND [{start: {_id:0}, end: {name:"bar"}, properties:{since:2016}}, {start: {_id:4}, end: {name:"bar2"}, properties:{since:2015}}] as row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:Bar{name: row.end.name})
CREATE (start)-[r:KNOWS]->(end) SET r += row.properties;
:COMMIT
:BEGIN
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:COMMIT
:BEGIN
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:COMMIT
UNWIND [
{_id:0, properties:{born:date('2018-10-31'), name:"foo"}},
{_id:4, properties:{born:date('2017-09-29'), name:"foo2"}}] as row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id})
SET n += row.properties SET n:Foo;
UNWIND [
{start: {_id:0}, end: {name:"bar"}, properties:{since:2016}}, {start: {_id:4}, end: {name:"bar2"}, properties:{since:2015}}] as row
MATCH (start:`UNIQUE IMPORT LABEL`
{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:Bar {name: row.end.name})
CREATE (start)-[r:KNOWS]->(end) SET r += row.properties;

Cypher-Shell’s parameter support to the rescue!

CALL apoc.export.cypher.all({file},{format:'cypher-shell',
useOptimizations:{type:'unwind_batch_params',unwindBatchSize: 2}})
:begin
CREATE INDEX ON :Bar(first_name,last_name);
CREATE INDEX ON :Foo(name);
CREATE CONSTRAINT ON (node:Bar) ASSERT (node.name) IS UNIQUE;
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`)
ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit
CALL db.awaitIndex(':Foo(name)');
CALL db.awaitIndex(':Bar(first_name,last_name)');
CALL db.awaitIndex(':Bar(name)');
:param rows => [{_id:4, properties:{age:12}}, {_id:5, properties:{age:4}}]:begin
UNWIND $rows AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Bar;
:commit
:param rows => [{_id:0, properties:{born:date('2018-10-31'), name:"foo"}}, {_id:1, properties:{born:date('2017-09-29'), name:"foo2"}}]:begin
UNWIND $rows AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Foo;
:commit
:param rows => [{_id:2, properties:{born:date('2016-03-12'), name:"foo3"}}]:begin
UNWIND $rows AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Foo;
:commit
:param rows => [{name:"bar", properties:{age:42}}, {name:"bar2", properties:{age:44}}]:begin
UNWIND $rows AS row
CREATE (n:Bar{name: row.name}) SET n += row.properties;
:commit
:param rows => [{start: {_id:0}, end: {name:"bar"}, properties:{since:2016}}]:begin
UNWIND $rows AS row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:Bar{name: row.end.name})
CREATE (start)-[r:KNOWS]->(end) SET r += row.properties;
:commit
:begin
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 2 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:commit
:begin
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 2 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:commit
:begin
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 2 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:commit
:begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit
:param name => expression
:param rows => [{_id:4, properties:{age:12}}, {_id:5, properties:{age:4}}]
UNWIND $rows AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Bar;

The Benchmarks

The Benchmarks: Exporting the cypher-script

$ time ./bin/cypher-shell -u neo4j -p andrea "call apoc.export.cypher.all('import/exportDataCypherShellOld.cypher',{format:'cypher-shell', useOptimizations: {type: 'none'}, batchSize:100})"
real 1m46.790s
user 0m1.384s
sys 0m0.229s

$ time ./bin/cypher-shell -u neo4j -p andrea "call apoc.export.cypher.all('import/exportDataCypherShell.cypher',{format:'cypher-shell', useOptimizations: {type: 'unwind_batch', unwindBatchSize: 20}, batchSize:100})"
real 1m41.065s
user 0m1.349s
sys 0m0.214s
$ time ./bin/cypher-shell -u neo4j -p andrea "call apoc.export.cypher.all('import/exportDataCypherShellParams.cypher',{format:'cypher-shell', useOptimizations: {type: 'unwind_batch_params', unwindBatchSize:100}})"
real 1m33.585s
user 0m1.387s
sys 0m0.222s

The Benchmarks: importing the generated cypher-script

$ time ./bin/cypher-shell -u neo4j -p andrea < 'import/exportDataCypherShellOld.cypher' > 'import/output.exportDataCypherShellOld.log'
real 252m33.279s
user 13m53.566s
sys 6m3.110s
$ time ./cypher-shell -u neo4j -p andrea < 'import/exportDataCypherShell.cypher' > 'import/output.exportDataCypherShell.log'
real 110m27.697s
user 2m48.452s
sys 0m55.657s
$ time ./bin/cypher-shell -u neo4j -p andrea < 'import/exportDataCypherShellParams.cypher' > 'import/output.exportDataCypherShellParams.log'
real 33m14.835s
user 26m12.681s
sys 0m46.270s

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Andrea Santurbano

Written by

Data Lover @AgileLARUS

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.