Deploy Neo4j on Kubernetes using Helm Chart and Azure DevOps
This article documents how to deploy a Neo4j Causal Cluster on Kubernetes (K8s) using a Helm chart. If you are interested in deploying Neo4j on AKS, you can follow this article and learn the theory along the way.
In this guide, you’ll see:
- Introduction to Neo4j
- Neo4j Running on Kubernetes
- Helm Chart Package Build
- Helm Chart Package Deployment (CD — Continuous Deployment)
- Conclusions
1. Introduction to Neo4j
a. Neo4j Graph Platform
The Neo4j Graph Platform includes components that enable developers to create graph-enabled applications. It is used by developers, administrators, data analysts, and data scientists to access application data. Developers create the data in the graph by either importing it into the graph or using the Cypher language to implement the data model. In addition, developers are responsible for integrating the graph with other systems and DBMS installations. Admins manage the processes and files related to the Neo4j installation. Data scientists and data analysts typically use a combination of Cypher queries as well as tools to analyze the data. End-users typically use applications written by developers to access the graph data.
To see more details, please go to https://neo4j.com/docs/getting-started/current/
b. Raft protocol
Whether it runs as a standalone installation or as a Docker/Kubernetes installation, Neo4j uses one or more Core Servers, depending on the configuration. Neo4j also relies on the Raft protocol. One of Raft’s primary design goals is to be easily understandable so that there are fewer places for tricky bugs to hide in implementations. The Raft protocol describes three roles that an instance can play: leader, follower, and candidate. These are transient roles, and any Core Server can expect to play them throughout the lifetime of a cluster. While it is interesting from a computer-science point of view to understand those states, operators should not be overly concerned: they are an implementation detail. As each database operates within a logically separate Raft group, a Core Server can have multiple roles: one for each database. For example, it could be the Leader for the system database while also being a Follower for the neo4j database.
Once bootstrapped, each Core Server spends its time processing database transactions. Updates are reliably replicated around Core Servers via the Raft protocol. Updates appear in the form of a (committed) Raft log entry containing transaction commands which are subsequently applied to update the database.
For safety, within any Raft protocol instance there is only one Leader able to make forward progress in any given term. The Leader bears the responsibility for imposing order on Raft log entries and for driving the log forward with respect to the Followers. Followers maintain their logs with respect to the current Leader’s log. Should any participant in the cluster suspect that the Leader has failed (because it stops receiving new entries or heartbeats), it can instigate a leadership election by entering the Candidate state. In Neo4j Core Servers, this failure-detection window is set by default to just above 20 seconds to enable more stable leaders.
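If leader elections prove too frequent in a particular environment, this window can be tuned. The snippet below is only a hedged, illustrative sketch, not part of the original setup: it assumes the Neo4j 4.x setting causal_clustering.leader_failure_detection_window and the standard Neo4j Docker convention of passing settings as environment variables (prefix NEO4J_, dots become underscores, underscores are doubled); verify both against the documentation for your exact version.

```yaml
# Hedged, illustrative Pod spec: overriding the leader failure-detection window
# on a single Neo4j container. The setting name, image tag, and env-var
# translation are assumptions to verify; in a real cluster this env entry would
# live in the chart's core StatefulSet rather than a bare Pod.
apiVersion: v1
kind: Pod
metadata:
  name: neo4j-raft-tuning-example
spec:
  containers:
    - name: neo4j
      image: neo4j:4.4.3-enterprise   # assumed tag
      env:
        - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
          value: "yes"
        # causal_clustering.leader_failure_detection_window ->
        # NEO4J_causal__clustering_leader__failure__detection__window
        - name: NEO4J_causal__clustering_leader__failure__detection__window
          value: "20s-23s"            # raise this range for more stable leaders
```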
This is the essence of a non-blocking consensus protocol, which allows Neo4j Causal Clustering to provide continuous availability to applications.
To see more details, please go to Causal Clustering lifecycle — Operations Manual (neo4j.com)
2. Neo4j Running on Kubernetes
a. StatefulSets and Persistent Volumes
Neo4j on Docker in Kubernetes uses a StatefulSet: neo4j-neo4j-core. Core mode is required for the PROD environment, with a minimum of three pods: one leader and two followers. In the development environment, you can use Single mode (a single instance, only one pod). In addition, Neo4j works with two databases: the system database for administration purposes and the neo4j database for business data.
See the previous point, “b. Raft protocol”, for the theory behind this.
At the architecture level, every pod has its respective volume (disk) for persistent data.
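The cluster size and the per-pod volumes are driven by the chart’s values. As an illustration only (not the exact values used in this project), a hedged sketch of the relevant part of values.yaml could look like the following; the key names follow the neo4j-contrib/neo4j-helm 4.4.x chart and should be verified against the chart version you download, and the password, size, and storage class are placeholders.

```yaml
# Hedged sketch of the values for a three-member Core cluster with a persistent
# volume per pod. Password, size, and storageClass are placeholders; verify the
# key names against the chart version you use.
acceptLicenseAgreement: "yes"
neo4jPassword: "changeme"          # placeholder
core:
  numberOfServers: 3               # PROD: one leader + two followers
  standalone: false                # set to true for a single-pod DEV setup
  persistentVolume:
    enabled: true
    mountPath: /data
    size: 100Gi                    # placeholder size
    storageClass: azurefile        # placeholder storage class
readReplica:
  numberOfServers: 0
```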
b. Communication
Neo4j communicates with one or more applications. In the image, you can see the communication with two apps: a data module and a reporting module. This is possible through the Bolt protocol (TCP port 7687).
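For reference, a minimal sketch of a Kubernetes Service exposing the Bolt port to in-cluster applications is shown below; the Service name and selector labels are hypothetical, since the Helm chart already creates its own services with its own naming, so adapt the selector to the labels generated in your cluster.

```yaml
# Hedged sketch: a ClusterIP Service exposing Bolt (TCP 7687) to in-cluster apps
# such as the data and reporting modules. Name and selector labels are
# hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: neo4j-bolt
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: neo4j
    app.kubernetes.io/component: core
  ports:
    - name: bolt
      protocol: TCP
      port: 7687
      targetPort: 7687
```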
3. Helm Chart Package Build
a. Main modifications to the Helm Chart
For version 4.4.x, you can download the source code for the Helm chart from this GitHub link:
https://github.com/neo4j-contrib/neo4j-helm/tree/4.4.3.
It is necessary to modify the following files in the Helm chart:
- File /templates/core-statefulset.yaml
- File pv-pvc-volumes.yaml, to create volumes pointing to a file share (storage account); see the sketch after this list
- File /develop-rcm/values.yaml
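The screenshots with the exact changes are not reproduced here. As a rough sketch of what pv-pvc-volumes.yaml can look like when it points to an Azure Files share, the example below uses the in-tree azureFile volume type with hypothetical share, secret, and size values; on newer clusters the file.csi.azure.com CSI driver is the preferred equivalent.

```yaml
# Hedged sketch of pv-pvc-volumes.yaml: a PersistentVolume backed by an Azure
# Files share (storage account) plus a matching claim. Share name, secret name,
# and sizes are hypothetical placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: neo4j-data-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  azureFile:
    secretName: azure-fileshare-secret   # secret holding azurestorageaccountname/key
    shareName: neo4j-data
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: neo4j-data-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                   # bind to the pre-created PV, not a dynamic class
  volumeName: neo4j-data-pv
  resources:
    requests:
      storage: 100Gi
```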
b. Create the image and prepare the files to deploy the package (CI — Continuous Integration).
The Neo4j Docker image includes some basic configuration defaults that should not need adjustment for most cases.
By default, the Docker image exposes three ports for remote access:
- 7474 for HTTP
- 7473 for HTTPS
- 7687 for Bolt
We will use these ports to connect to Neo4j inside the container, accessing it from Neo4j Browser, an application, or other methods.
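To try this locally before moving to Kubernetes, a minimal Docker Compose sketch that publishes the three default ports might look like the following; the image tag, password, and license acceptance are assumptions to adapt to your case.

```yaml
# Hedged sketch: run one Neo4j container locally and publish the three default
# ports. The enterprise image requires accepting the license; the password is a
# placeholder.
version: "3.8"
services:
  neo4j:
    image: neo4j:4.4.3-enterprise   # assumed tag
    ports:
      - "7474:7474"   # HTTP
      - "7473:7473"   # HTTPS
      - "7687:7687"   # Bolt
    environment:
      NEO4J_ACCEPT_LICENSE_AGREEMENT: "yes"
      NEO4J_AUTH: "neo4j/changeme"  # placeholder password
    volumes:
      - neo4j-data:/data
volumes:
  neo4j-data:
```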
It is recommended to build the image before deploying it to the K8s cluster, so that it can be scanned with the WhiteSource tool and reviewed for security issues with the Aquasec Scanner tool.
Before deploying the image, we add a Dockerfile and build the image with a few simple Docker commands.
For more details, go to How-To: Run Neo4j in Docker — Developer Guides and Neo4j with Docker — Developer Guides
To prepare the package in Azure DevOps, you can create a build pipeline that gets the code, builds the image from the Dockerfile, scans the code with WhiteSource, reviews the security of the image with the Aquasec Scanner tool, pushes the image to the registry, and copies and publishes the Helm chart files. The WhiteSource and Aquasec Scanner steps are defined as YAML tasks in this pipeline.
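Their exact YAML comes from the corresponding Azure DevOps marketplace extensions and is not reproduced here. The skeleton below is only a hedged sketch of the overall pipeline flow using built-in Azure DevOps tasks (Docker@2, CopyFiles@2, PublishBuildArtifacts@1), with placeholder registry, repository, and path names and with the scanner steps left as commented placeholders.

```yaml
# Hedged sketch of the build (CI) pipeline: build the image, run the scans,
# push to the registry, and publish the Helm chart files as an artifact.
trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  - task: Docker@2
    displayName: Build Neo4j image
    inputs:
      command: build
      containerRegistry: my-acr-service-connection   # placeholder service connection
      repository: neo4j-custom                       # placeholder repository
      dockerfile: Dockerfile
      tags: $(Build.BuildId)

  # - WhiteSource scan task from its marketplace extension goes here
  # - Aquasec Scanner task from its marketplace extension goes here

  - task: Docker@2
    displayName: Push Neo4j image
    inputs:
      command: push
      containerRegistry: my-acr-service-connection
      repository: neo4j-custom
      tags: $(Build.BuildId)

  - task: CopyFiles@2
    displayName: Copy Helm chart files
    inputs:
      SourceFolder: neo4j-helm          # placeholder path to the modified chart
      Contents: '**'
      TargetFolder: $(Build.ArtifactStagingDirectory)/chart

  - task: PublishBuildArtifacts@1
    displayName: Publish Helm chart artifact
    inputs:
      PathtoPublish: $(Build.ArtifactStagingDirectory)/chart
      ArtifactName: helm-chart
```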
4. Helm Chart Package Deployment (CD — Continuous Deployment)
Create an Azure DevOps release pipeline with three simple tasks to deploy the Neo4j Helm chart using the artifact built in the previous step:
- “Helm tool installer” with Helm Version Spec “3.1.0”
- “Replace Tokens” to replace the variables, and
- “Package and deploy Helm charts”, specifying the “Chart Path” and “Value File” (see the sketch below)
Paths: the “Chart Path” and “Value File” point to the Helm chart files published as the artifact in the previous step.
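The classic release pipeline stores these settings in the UI, so the exact values are not reproduced here. As a rough, hedged YAML equivalent of the same three tasks, with placeholder paths, names, and service connections (and assuming the qetza replacetokens extension for the “Replace Tokens” task), it could look like this:

```yaml
# Hedged sketch of the three CD tasks expressed as YAML. Paths, release name,
# and the Azure/AKS connection details are placeholders.
steps:
  - task: HelmInstaller@1
    displayName: Install Helm 3.1.0
    inputs:
      helmVersionToInstall: 3.1.0

  - task: replacetokens@3
    displayName: Replace tokens in the values file
    inputs:
      targetFiles: $(System.DefaultWorkingDirectory)/helm-chart/develop-rcm/values.yaml

  - task: HelmDeploy@0
    displayName: Deploy the Neo4j Helm chart
    inputs:
      connectionType: Azure Resource Manager
      azureSubscription: my-azure-service-connection   # placeholder
      azureResourceGroup: my-resource-group            # placeholder
      kubernetesCluster: my-aks-cluster                # placeholder
      command: upgrade
      chartType: FilePath
      chartPath: $(System.DefaultWorkingDirectory)/helm-chart
      valueFile: $(System.DefaultWorkingDirectory)/helm-chart/develop-rcm/values.yaml
      releaseName: neo4j
```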
5. Conclusions
Neo4j on K8s using Docker works like a standalone installation, with the advantage that you have all the features of K8s at your disposal. Basically, you only need to get the image from the Docker Hub repository and configure some sections of the Helm chart, and it’s done. You can create the package and deploy the image to AKS using Azure Pipelines. Additionally, essential data such as logs, Prometheus metrics, and backups are configured in the cluster as persistent volumes pointing to file shares (storage account), keeping this information separate to prevent data loss.