Running Neo4j with Hosted Kubernetes in Google Cloud

David Allen
Jan 30, 2018 · 5 min read

Update: since this article was originally written, Google launched Kubernetes applications on GCP Marketplace. If you’re interested in a point-and-click approach specifically, please have a look at my other post on how to launch that. This post still contains accurate information, but is focused on a helm-based deploy.

Introduction

Since Neo4j began publishing Docker images, there are many options for deploying it. Because Neo4j can run as a clustered database, it helps to use a container orchestration tool, whether something lightweight like docker-compose or something more robust like Kubernetes.

Kubernetes is an open-source platform for automating the deployment, scaling, and operation of containers across clusters of hosts. In other words, it’s just the sort of tool that’s useful for maintaining a cluster of Neo4j instances run from Docker containers. Google’s GCP provides a hosted Kubernetes engine, so it is quite easy to set up a Kubernetes cluster and deploy applications to it, which is what we’ll do today.

Prerequisites

The gcloud utility is a command line tool for interacting with Google Cloud. It’s the equivalent of Amazon’s aws command line utility.

You’ll need it, so head over to that page and install it first!

You’ll also need kubectl, the utility that controls Kubernetes clusters. You can get it from the Kubernetes page for just about any OS.
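Before going further, it can help to confirm that both tools are actually on your PATH. The snippet below is a minimal sketch; the check_tool helper is a name made up for this example, not part of gcloud or kubectl.

```shell
# Report whether a command line tool can be found on the PATH.
# check_tool is a hypothetical helper written for this example.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool gcloud
check_tool kubectl
```

If either line reports "missing", install that tool before continuing.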

Create a Hosted Kubernetes Cluster on GCP

In the google cloud console, go to the Kubernetes clusters page, navigating like this:

Next, we’ll configure a Kubernetes cluster, in this case with just 1 node (the “Size” parameter), since we don’t need a lot of extra capacity for this test instance. Remember that the cost scales with the number of nodes you need and the resources given to them. This setting matters: later in this tutorial we’ll be deploying 3 Neo4j pods onto a single node. That is only for demonstration purposes; such a deployment topology would not make sense in production, where you would want redundancy in your nodes to keep the cluster running should any one node fail.

Configuring a new cluster has many options; for our purposes I took most of the defaults, and clicked “Create” at the bottom, but of course full documentation is available from Google.

Make sure that your cluster has a minimum of 3 nodes. When deploying a causal cluster, we’ll be deploying a minimum of 3 different containers to the Kubernetes cluster (or more, if you end up adding read replicas).

After the cluster starts, you should see a screen like below, which lets us know things are working well:

Connect to the Cluster

Now we can click the “Connect” button, and Google will give us commands to run on our host OS so that kubectl can control and deploy to this Kubernetes cluster.

The kubectl proxy command sets up a local HTTP proxy so that you can talk to your google cluster as if it were local, running on your machine.

The following steps assume that you have run both this gcloud command and the kubectl proxy command.
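For reference, the commands behind the “Connect” button generally take the shape sketched below. This is an illustration under assumptions: the cluster name, zone, and project ID in the example call are placeholders, and connect_to_cluster is a helper invented for this example. Copy the exact command from your own console rather than relying on these values.

```shell
# connect_to_cluster is a hypothetical helper written for this example.
# Arguments: cluster name, zone, project ID (all specific to your setup).
connect_to_cluster() {
  if command -v gcloud >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1; then
    # Write the cluster's credentials into the local kubeconfig
    # so that kubectl targets the hosted cluster.
    gcloud container clusters get-credentials "$1" --zone "$2" --project "$3"
    # Confirm that kubectl can reach the cluster.
    kubectl cluster-info
  else
    echo "gcloud and kubectl must both be installed first"
    return 1
  fi
}

# Example call with placeholder values (not taken from the article):
# connect_to_cluster neo4j-demo us-central1-a my-gcp-project
```

Once that succeeds, run kubectl proxy in a separate terminal, since it stays in the foreground while the proxy is open.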

Deploy Neo4j to that Cluster

Next, we’ll need to apply a Neo4j configuration to our Kubernetes cluster. Kubernetes configuration is written in YAML files.

Fortunately, the kubernetes-neo4j repo provides basic defaults that can be applied. In addition to that code, Neo4j’s website has an article describing how to deploy to Kubernetes running locally.

Cutting to the chase though, these commands are necessary:

$ git clone https://github.com/neo4j-contrib/kubernetes-neo4j.git
$ cd kubernetes-neo4j
$ kubectl apply -f cores
service "neo4j" created
statefulset "neo4j-core" created

This git repo provides a set of scripts with everything you need to deploy your Neo4j cluster. The kubectl command here simply applies the out-of-the-box configuration: a StatefulSet of 3 core nodes that can discover one another, and a DNS service.

After a minute or two of startup, on your localhost, you should be able to see your deployment.

Verifying Things Are Looking Good

You can look at the logs to ensure that the various pods are running correctly.

$ kubectl logs -f neo4j-core-0
(lots of output snipped)
2017-12-21 20:46:46.209+0000 INFO Discovering cluster with initial members: [neo4j.default.svc.cluster.local:5000]
2017-12-21 20:46:46.209+0000 INFO Attempting to connect to the other cluster members before continuing...
2017-12-21 20:47:48.650+0000 INFO Started.
2017-12-21 20:47:49.448+0000 INFO Mounted REST API at: /db/manage
2017-12-21 20:47:51.857+0000 INFO Remote interface available at http://neo4j-core-0.neo4j.default.svc.cluster.local:7474/

Looking good!
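To run the same check across all three core pods without reading each log by hand, something like the following sketch can work. It assumes the pods are named neo4j-core-0 through neo4j-core-2 (the usual StatefulSet naming for the statefulset shown above) and that a healthy instance logs the “Started.” line seen in the output; is_started is a helper made up for this example.

```shell
# Succeeds if the log text on stdin contains Neo4j's "Started." line.
# is_started is a hypothetical helper written for this example.
is_started() {
  grep -Eq 'INFO +Started\.'
}

# Check each core pod, but only if kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  for i in 0 1 2; do
    pod="neo4j-core-$i"
    if kubectl logs "$pod" 2>/dev/null | is_started; then
      echo "$pod: started"
    else
      echo "$pod: not started yet"
    fi
  done
fi
```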

Scale As Needed

From here, you can follow the directions on the github repository and scale your cluster by adding read replicas, or adding new nodes overall. Most of this can be done by applying other templates which the github repo provides.
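As one illustration of scaling, kubectl can change the replica count of a StatefulSet directly. The sketch below assumes the statefulset is named neo4j-core, as in the output earlier; scale_cores is a helper invented for this example, and the repo’s own directions remain the authority on what is safe for a causal cluster.

```shell
# scale_cores is a hypothetical helper written for this example.
# Argument: the desired number of core replicas.
scale_cores() {
  # Reject anything that is not a plain non-negative integer.
  case "$1" in
    ''|*[!0-9]*)
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  if command -v kubectl >/dev/null 2>&1; then
    kubectl scale statefulset neo4j-core --replicas="$1"
  else
    echo "kubectl not found" >&2
    return 1
  fi
}

# Example: grow the core set from 3 to 5 members.
# scale_cores 5
```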

How does this all work?

The kubernetes configuration is just a set of environment variables and a bit of shell script wrapped around the official neo4j docker images. You can see that configuration in the statefulset.yaml file. The kubernetes-specific bits deal with the details of how the nodes discover one another, and what the topology of the default cluster is (in this case, 3 core nodes). The neo4j docker image itself already provides a lot of configuration items, like the ability to pass a whole range of configuration options directly to the database via environment variables.
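To make the environment variable mechanism concrete: as I understand the Neo4j Docker image’s convention, a setting name is prefixed with NEO4J_, each underscore is doubled, and each dot becomes a single underscore. The sketch below applies that rule; neo4j_env_name is a helper written for this example, and the setting name is only an illustration, so check the image documentation for your Neo4j version.

```shell
# Convert a neo4j.conf setting name into the environment variable name
# expected by the Docker image (assumed convention: prefix NEO4J_,
# "_" becomes "__", "." becomes "_").
# neo4j_env_name is a hypothetical helper written for this example.
neo4j_env_name() {
  printf 'NEO4J_%s\n' "$(printf '%s' "$1" | sed -e 's/_/__/g' -e 's/\./_/g')"
}

neo4j_env_name dbms.memory.heap.max_size
# prints NEO4J_dbms_memory_heap_max__size
```

The underscore is doubled first so that it can still be told apart from the underscores that replace dots.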

Use It!

Using kubectl, you can already execute commands directly against the individual pods. Doing this doesn’t require setting up any extra networking or port permissions. For example:

$ kubectl exec neo4j-core-0 -- bin/cypher-shell --format verbose "MATCH (n) RETURN count(n);"
+----------+
| count(n) |
+----------+
| 0        |
+----------+
1 row available after 374 ms, consumed after another 4 ms

Where to go from here

You’ll want to configure and customize the Neo4j deployment template found in the cores directory of the GitHub repo. In particular, the default deployment as of this writing specifies a clusterIP of None, meaning that the service is headless and can’t be accessed from outside the cluster. This is a wise initial setting, since the setup also disables authentication for ease of setup. Make sure to carefully review the Neo4j and Kubernetes configuration to be sure you know what you’re getting!

These configuration options are non-trivial, and have heavy consequences for production performance and security. Consult your local kubernetes expert in order to make choices that are right for your deployment.

Google Cloud - Community

A collection of technical articles published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.
