New Neo4j 4.0 features — copy a database (and more)

Fine-grained control in database replication, compaction, reorganisation

Ljubica Lazarevic
Neo4j Developer Blog
May 15, 2020

Neo4j 4.0 introduced a number of big new features: multi-database, Fabric, fine-grained security… the list is long.

I thought I’d take the opportunity to write some short posts about some of the other features that have also snuck in. In this post we’re looking at copying a database.

Then

Back in the good old days, copying a Neo4j database would usually involve store-utils, a plugin developed by Michael Hunger.

This was a very powerful tool. Not only did it let you copy your database, it could also reduce its size (which grows through continuous writes and deletes) through ID reuse and reorganisation of relationships and properties. It defragmented and co-located nodes and relationships, and it could skip defective records if you somehow ended up with inconsistencies in your store. Store-utils also let you skip nodes with certain labels, or skip certain labels, properties and relationship types entirely during the process.

As with all powerful tools, store-utils required careful handling. Apart from checking which version of the database you were using, you also had to think about repopulating and rebuilding indexes.

Now with 4.0

As the replacement for store-utils, you can now run neo4j-admin copy. As well as copying a database to a new destination, it has a number of fine-grained options to configure how the copy is created, including:

  • Filtering out nodes with a specific label
  • Ignoring certain labels
  • Ignoring specified relationship types
  • Skipping over property keys
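To make those options a little more concrete, here is a rough sketch of how each one maps onto a command-line flag. The flag names are as I recall them from the 4.0 help output, so check `bin/neo4j-admin help copy` on your own install before relying on them. The `sample` database name and the `Actor`, `REVIEWED` and `tagline` values are purely illustrative. The script only prints the command so you can review it first.

```shell
#!/usr/bin/env bash
# Sketch: assemble a neo4j-admin copy command line, one flag per filtering option.
# Assumption: flag names match your Neo4j version (verify with `bin/neo4j-admin help copy`).

build_copy_cmd() {
  local from_db="$1" to_db="$2"
  printf '%s' "bin/neo4j-admin copy --from-database=${from_db} --to-database=${to_db}"
  # Each argument below is appended with a leading space.
  printf ' %s' \
    '--delete-nodes-with-labels=Movie' \
    '--skip-labels=Actor' \
    '--skip-relationships=REVIEWED' \
    '--skip-properties=tagline'
  printf '\n'
}

# Print the command for review rather than running it.
build_copy_cmd neo4j sample
```

Each flag takes a comma-separated list, so several labels, relationship types or property keys can be filtered in one run.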

This brings a whole suite of flexible options for replicating, duplicating and sampling existing databases. For example, you could generate subgraphs (for Neo4j Fabric sharding) or graph samples to distribute across different databases, which is an extremely powerful feature.

This version also implements several of the changes that were prototyped in store-utils but never released, such as parallel execution for reads and writes, using the same fast, parallel ingestion pipeline that neo4j-admin import uses for CSV data.

The latest version of Neo4j also brings automated ID reuse to keep the database as compact as possible. Now would be a good time to remind you not to depend on internal IDs for your external processes and users!

Example

Let’s have a quick look at an example. Imagine we have a database, with the default name neo4j, containing nodes labelled Movie and Person (much like the :play movie-graph dataset). We wish to make a copy of this database, but we only want the Person data, and we want to call this new database persons.

We’ll be using neo4j-admin for this, which you will find in the bin folder. To make the above copy of the data, we would do something like this:

bin/neo4j-admin copy --from-database=neo4j --to-database=persons --delete-nodes-with-labels="Movie"

This will produce the persons database with just the Person nodes and any relationships that may interconnect them.
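As far as I understand it, the copied store is not served until the new database has also been created in the system database. Below is a hedged dry-run sketch of the two steps together. The `run` helper and the `DRY_RUN` variable are hypothetical names of my own, credentials for cypher-shell are left out, and the exact steps may differ on your version, so treat it as a starting point rather than a recipe.

```shell
#!/usr/bin/env bash
# Dry-run sketch of copy followed by database creation.
# Prints the commands by default; set DRY_RUN=0 to actually execute them.

SOURCE_DB="neo4j"
TARGET_DB="persons"

run() {
  # Hypothetical helper: echo the command in dry-run mode, otherwise run it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

copy_and_create() {
  # Step 1: copy everything except nodes labelled Movie.
  run bin/neo4j-admin copy --from-database="$SOURCE_DB" --to-database="$TARGET_DB" --delete-nodes-with-labels=Movie
  # Step 2 (assumption): register the copied store so the server brings it online.
  # Add -u/-p as needed for your setup.
  run bin/cypher-shell -d system "CREATE DATABASE $TARGET_DB"
}

copy_and_create
```

Remember that indexes and constraints may need recreating on the new database, much as they did with store-utils.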

Check out the documentation!

Happy compacting.
