Monitoring Neo4j with Halin
For the last few weeks, I’ve been working a bit here and there on an application called Halin, for monitoring Neo4j databases and clusters. Now that I’ve gotten it to a useful state, I wanted to share it with others, and show a few things that you can do with it, and how it makes Neo4j administration easier.
If you want to try Halin out yourself, you can run it live here. All you need is connection details for a Neo4j instance. All of the open source code is available on GitHub.
To use Halin as a Graph App with auto-updates in Neo4j Desktop, just paste the Halin URL into the Graph App Install sidebar. For details, see the GitHub README.
UPDATE: Neo4j recently did a webinar about Halin, which is available on YouTube. If you’d like to see a live presentation and demo, check it out there!
A database is a complicated piece of software that takes care and feeding. Because it’s a core component, you need to know whether it’s running well, and if not, why not. Monitoring is about gathering data to determine this, and figuring out what you need to do to make your system better.
In previous articles on Neo4j monitoring, I wrote about how to use built-in JMX metrics with other tools like Hawtio. This is nice because you can visualize a lot about what’s going on with just configuration, no new software. If JMX isn’t your thing, Neo4j also supports Prometheus and Graphite.
The downside to these approaches is that they’re not very Neo4j-specific. These external monitoring tools don’t know anything about a Neo4j database’s lifecycle. Additionally, you tend to get metrics per node in your cluster, but you don’t get metrics that show you the cluster overall.
With some help from support folks at Neo4j, I created a dashboard showing overall cluster health. Here we can see three nodes of a cluster (the one with the green star is the leader). In each graph, you can enable or disable the data line for each node, so you can see stats individually.
To get started, first log in:
Once you connect, Halin fetches information about your cluster or single instance, and brings you to the cluster overview pane.
Memory is given prominent billing because memory management is so important to a performant Neo4j instance. You can also see garbage collection spikes, transaction load, and various metrics of page cache utilization, which can help diagnose performance issues.
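To make the "cluster overall" idea concrete, here is a minimal sketch of rolling per-node readings up into one cluster-wide summary. This is not Halin's actual code: the function name `cluster_summary`, the sample numbers, and the shape of the input are all assumptions made for illustration. The optional `enabled` argument loosely mimics switching a node's data line on or off.

```python
from statistics import mean


def cluster_summary(samples, enabled=None):
    """Combine the latest per-node readings into one cluster-wide view.

    samples: dict mapping node name -> metric value (e.g. heap used, in MB).
    enabled: optional set of node names to include (hypothetical per-node toggle).
    """
    if enabled is not None:
        samples = {node: value for node, value in samples.items() if node in enabled}
    if not samples:
        raise ValueError("no nodes selected")
    values = list(samples.values())
    return {
        "nodes": sorted(samples),
        "total": sum(values),
        "mean": mean(values),
        # The node with the highest reading is usually the one to look at first.
        "max_node": max(samples, key=samples.get),
    }


# Made-up sample data: heap used per node, in MB.
heap_used_mb = {"node1": 812, "node2": 790, "node3": 1504}
summary = cluster_summary(heap_used_mb)
without_node3 = cluster_summary(heap_used_mb, enabled={"node1", "node2"})
```

With this sample data, the summary singles out node3 as the heaviest memory user, which is the kind of thing a per-node dashboard makes you work out by eye.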
Halin is a simple React application that uses a Bolt connection to your database. Since Neo4j already exposes Bolt and Cypher natively, no configuration is needed to make it work with a new Neo4j instance. In a future release, the plan is to provide a Docker container as well, so that an instance of Halin can be deployed and run alongside a particular Neo4j install, for example with Neo4j running in Kubernetes.
Halin contains a diagnostic advisor. This gathers a lot of metadata about your Neo4j instance on the fly, and then runs it through a series of rules which make suggestions about what’s good, what could use improvement, and where there are errors. This rule set will grow with time and user suggestions.
The intent of the advisor is to let you download a diagnostic package of information which will give you a broad overview of your entire cluster’s configuration, and to also help automate locating issues with your configuration.
In the screenshot above, green checkmarks indicate things Halin checked and found to be good (we have admin users, backups are enabled but not on an external port, and network port settings look good). There are some warnings too: this cluster’s memory utilization is very high. There are also two errors. Because Neo4j manages users and roles per node, Halin has detected that node3 is missing a user and a role that are present on other nodes in the cluster. This can lead to authorization errors should that user try to contact another node, so it gets flagged as a misconfiguration.
You can always hit the “Download Diagnostics” button and download a JSON file containing everything Halin gathers. The rules are driven by this diagnostic bundle.
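The idea of rules driven by a diagnostic bundle can be sketched roughly as follows. This is an illustration only, not Halin's rule engine: the bundle layout (a `nodes` map with a `users` entry per node), the rule names, and the `(level, node, message)` finding format are all assumptions.

```python
def check_admin_user(bundle):
    """Each node should have at least one user holding the admin role."""
    findings = []
    for node, info in bundle["nodes"].items():
        admins = [user for user, roles in info["users"].items() if "admin" in roles]
        level = "pass" if admins else "error"
        findings.append((level, node, f"{len(admins)} admin user(s) found"))
    return findings


def check_user_consistency(bundle):
    """Every user known anywhere in the cluster should exist on every node."""
    all_users = set()
    for info in bundle["nodes"].values():
        all_users |= set(info["users"])
    findings = []
    for node, info in bundle["nodes"].items():
        missing = sorted(all_users - set(info["users"]))
        if missing:
            findings.append(("error", node, "missing users: " + ", ".join(missing)))
        else:
            findings.append(("pass", node, "all users present"))
    return findings


# A rule is just a function from bundle -> list of findings.
RULES = [check_admin_user, check_user_consistency]


def run_advisor(bundle, rules=RULES):
    """Run every rule over the bundle and collect the findings in order."""
    findings = []
    for rule in rules:
        findings.extend(rule(bundle))
    return findings


# Hypothetical bundle: node3 lacks a user that the other nodes have.
bundle = {
    "nodes": {
        "node1": {"users": {"neo4j": ["admin"], "alice": ["reader"]}},
        "node2": {"users": {"neo4j": ["admin"], "alice": ["reader"]}},
        "node3": {"users": {"neo4j": ["admin"]}},
    }
}
findings = run_advisor(bundle)
errors = [f for f in findings if f[0] == "error"]
```

Keeping each rule as a plain function over the bundle means a new check can be added without touching the others, which suits a rule set that is expected to grow.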
You’ll also see a “Configuration Diff” tool next to the Advisor. This will point out every last piece of configuration that’s different from machine to machine. This can be useful to ferret out misconfigurations as well.
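A configuration diff of this kind boils down to comparing each setting across machines and keeping only the ones that disagree. Here is a rough sketch under assumed inputs: one dictionary of settings per node, with string values. The function name `config_diff` and the sample values are illustrative and not taken from Halin.

```python
def config_diff(configs):
    """Return the settings whose values differ between nodes.

    configs: dict mapping node name -> dict of setting name -> string value.
    A setting that is absent on a node is reported as None for that node.
    """
    all_keys = set()
    for settings in configs.values():
        all_keys |= set(settings)
    diff = {}
    for key in sorted(all_keys):
        per_node = {node: settings.get(key) for node, settings in configs.items()}
        # More than one distinct value means the nodes disagree on this setting.
        if len(set(per_node.values())) > 1:
            diff[key] = per_node
    return diff


# Made-up sample data: node3 has a smaller heap than the others.
configs = {
    "node1": {"dbms.memory.heap.max_size": "4G", "dbms.backup.enabled": "true"},
    "node2": {"dbms.memory.heap.max_size": "4G", "dbms.backup.enabled": "true"},
    "node3": {"dbms.memory.heap.max_size": "2G", "dbms.backup.enabled": "true"},
}
diff = config_diff(configs)
```

Settings that match everywhere drop out of the result, so only the differences are left to review.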
Aside from cluster overviews, the individual machines in the cluster do matter, and there are options to monitor a number of different metrics about them, including CPU/memory utilization, disk space taken up by Neo4j, real-time page cache information, and machine-specific configuration.
Halin provides a user interface for creating users and roles as well, and lets you associate any user with any role, provided you logged into Halin as an admin user. Because each machine in a cluster manages auth individually, typically if you want to create a user on all nodes, you’d have to run the same code to create that user in each place. Halin automates this for you and tries to ensure a user is defined on a cluster-wide basis, rather than per instance.
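The cluster-wide part of this can be thought of as a planning step: work out which users are missing where, then create them on those nodes. Below is a minimal sketch of the planning step only, assuming you already have the set of usernames for each node. The function name `plan_user_sync` and the input shape are hypothetical, and the call that actually creates a user on a node is deliberately left out.

```python
def plan_user_sync(users_by_node):
    """List the (node, user) pairs where a user still has to be created.

    users_by_node: dict mapping node name -> set of usernames on that node.
    """
    # The target state is the union of all users seen on any node.
    everyone = set().union(*users_by_node.values())
    return sorted(
        (node, user)
        for node, users in users_by_node.items()
        for user in everyone - set(users)
    )


# Made-up sample data: alice was only created on two of three nodes.
users_by_node = {
    "node1": {"neo4j", "alice"},
    "node2": {"neo4j", "alice"},
    "node3": {"neo4j"},
}
plan = plan_user_sync(users_by_node)
```

Each pair in the plan corresponds to one create-user operation against that node. An empty plan means the users are already consistent across the cluster.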
Support for Standalone Databases and Neo4j Community
Halin works with non-clustered installs of Neo4j. The only real difference is that you get just one tab for your database. Indeed, in standalone mode, Halin tends to treat your Neo4j instance as a cluster with only one member. For Neo4j Community users, some monitoring components are disabled because they require features that are only available in Enterprise, but most functionality is still available.
Neo4j Desktop Graph App
Neo4j Desktop allows users to install Graph Apps to interact with their local or remote databases, and Halin works in this mode as well.
To configure this, open Neo4j Desktop and select the “Graph Applications” tab on the left side. In the “Install Graph Application” box at the bottom, paste in the published Halin URL, and click Install.
Once Halin is installed, within each of your Neo4j Desktop Projects, you can click the “Add Application” button, and Halin will now be an option to install into that project. With a running database and Halin installed in the project, you can click Halin to auto-connect to the database running in that project.
In the coming months, I’m planning on adding more features, which will mostly be driven by common problems people run into with Neo4j in production. Halin’s job will be to detect whether each of these problems is present (or about to happen) and provide advice on how to avoid or fix it!