How to Restore Neo4j Backups on Kubernetes
--
This article is the companion to its sister, How to Backup Neo4j Running in Kubernetes. So if you’re looking for resources on creating backup sets in the first place, that’s a place to start.
To maintain a Neo4j cluster running in Kubernetes over time usually means to have a backup schedule in case things go wrong. So that thing has now gone wrong. How will you recover? This article will step you through how to do that.
If you haven’t tried Neo4j before you can launch it with the Neo4j Helm Chart. This should work with most Kubernetes distributions, but we’ll use GKE as an example.
Approach
To restore a backup in Neo4j, we use the standard neo4j-admin restore tool, just that in a Kubernetes environment we’re doing it inside of a container. Typically backup sets are stored as either a .tar.gz file, or as a raw directory. When taking backups, the neo4j-admin tool writes it as a directory of files, but often people will compress those sets and upload them to cloud storage.
We will use a specialized Docker container with the neo4j-admin tool as an initContainer to our pods in kubernetes. The init container’s tasks are simple:
- Mount the data drive where Neo4j expects to find its data (
/data
) - Download the backup set from google cloud storage, uncompressing it if needed
- Restore the graph database to
/data
using neo4j-admin
In this way, when the Neo4j docker container starts, it finds its graph database right where it expects it, and the Neo4j containers themselves remain unmodified.
The approach I implemented assumes the backup sets are stored on Google Storage, but with some simple modifications the script can run with any cloud storage provider.
The Restore Container
The code described in this section, and more documentation, can be found in the tools/restore directory of the Neo4j Helm Chart.
It expects a few environment variables as parameters:
GOOGLE_APPLICATION_CREDENTIALS
— path to a file on disk where the JSON service key can be found, that permits access to the backup set.BUCKET
— the google storage bucket location of the backup sets, which can include a path. For example,gs://my-bucket/graph-cluster/
DATABASE
— a comma-separated list of databases to restore, for exampleneo4j,system
. The restore container will be looking for date-stamped backups in the bucketTIMESTAMP
— optional. If you don’t provide this, the restore container will go looking for the latest backup, which will be namedgs://my-bucket/graph-cluster/DATABASENAME-latest.tar.gz
— which is the format created by the backup container described in the backup article.
That’s all that’s required. There are some optional parameters you can read more about in the restore container’s README file. They allow you to control behavior like where to find the backup in a compressed set, and whether or not to force-overwrite an existing database if one is found.
Configuring the Restore Container
To make that initContainer function, we need two pieces: the initContainer spec itself, and one extra shared volume. You can see a complete example in this deployment scenario that restores a single-instance Neo4j machine from a backup. By passing this deployment scenario as a set of parameters to the helm chart install process, it can just work with the right bucket & credentials.
Init Container
If you’re running in your environment, you’ll want to build the restore container yourself and change the image
reference here to the container registry of your choice. What this does is create the restore container, mount the Neo4j container’s /data into itself (this is what allows it to manipulate the persistent volume used by Neo4j), and then specify some environment variables to configure the restore process.
Service Account Key
A very important bit here is the /auth
mapping. This is how we’ll get our google application credentials into the restore container. The application credentials themselves are a generated Google Cloud Service Key, which we downloaded as JSON. When creating the service key, of course we have to give it permissions to read from the bucket where the backup set is stored. We’ll create a kubernetes secret containing the auth key like this:
MY_SERVICE_ACCOUNT_KEY=$HOME/.google/my-service-key.jsonkubectl create secret generic restore-service-key \
--from-file=credentials.json=$MY_SERVICE_ACCOUNT_KEY
This creates a kubernetes secret called “restore-service-key” which you’ll see mounted to /auth
inside of the restore container. When we mount it, that drive will appear to have one file called credentials.json
with the contents of our local key. The last part necessary is to add that secret as a volume to the core container, like this:
volumes:
- name: "restore-service-key"
secret:
secretName: "restore-service-key"
With this volume in place, when the initContainer mounts that secret to /auth, it can see the contents of the service key, and thus have protected access to backup sets.
Running the Restore Container
With the initContainer in place, your Neo4j cluster is now self-healing in several senses — if a pod crashes or dies, it will be of course restarted by Kubernetes, and the initContainer will restore it to the last backup that you’ve specified.
As part of a good maintenance routine, you can have a “latest backup” in a known spot on Google Storage (for example) and always point the initContainer to that latest backup set. In the event of a crash, the node will be back up and running in no time, with the data that you expect.
The “latest backup” functionality is already provided for by the backup container described in the article. Every time you take a backup, it simply uploads a timestamped backup, and a “latest” pointer. So there is always a stable URL for the latest backup set to be targeted by the restore container.
Restore Considerations
Neo4j Causal Clusters
A common way you might deploy Neo4j would be restore from last backup when a container initializes. This would be good for a cluster, because it would minimize how much catch-up is needed when a node is launched. Any difference between the last backup and the rest of the cluster would be provided via catch-up.
Single Node Installs of Neo4j
For single nodes, take care here. If a node crashes, and you automatically restore from backup, and force-overwrite what was previously on the disk, you will lose any data that the database captured between when the last backup was taken, and when the crash happened. As a result, for single node instances of Neo4j you should either perform restores manually when you need them, or you should keep a very regular backup schedule to minimize this data loss. If data loss is under no circumstances acceptable, simply do not automate restores for single node deploys.
Auth Files
As of Neo4j 3.4 and 3.5 series, data backups do not include authorization information for your cluster. That is, usernames/passwords associated with the graph are not included in the backup, and hence are not restored when you restore. This is something to be aware of; when launching a cluster typically you’re providing startup auth information and separate configuration anyway. If you create users, groups, and roles you may want to separately take copies of the auth files so that they can be restored when your cluster starts up.
In Neo4j 4.0 going forward — there is a system database that contains this same information. Take care on what you are backing up: if you want your fine grained permissions, users, and passwords to come with you, you must backup/restore the system database.
Alternatively, users may configure their systems to use LDAP providers in which case there is no need to backup any auth information.