Over the years working with Neo4j, I created small tools to help me understand what kind of data a Neo4j database contains. I wanted to know the label and relationship counts in the database and which properties exist, so that I could give a good estimate of how a database will grow over time.
With the availability of Neo4j Desktop, I created a Neo4j Desktop App based on the small tools I used before, called Analyze Database. While doing this I added two more tools: Live Count, which counts all the Nodes per Label and Relationships per Relationship Type every n seconds and plots them in a timeline chart, and Model, which gives you the ability to explore the database schema. Model is especially useful when the normal “call db.schema()” gives you a hairball structure in the Neo4j Browser. I created this Neo4j Desktop App with valuable help from Michael Hunger.
Installing the Neo4j Database Analyzer in Neo4j Desktop
(1.1.10 or later) is easy:
Open the “Graph Applications” sidebar and paste the URL:
into the “Install Graph Application” field and press “Install”.
The following sections explain each tool in more depth.
When you press “Analyze Database”, the database structures are counted. While the tool is analyzing the database, the “Summary” tab lists all the steps it performs. When it finishes, this listing moves to the “Log” tab and the results of the counts are displayed in the “Summary” tab.
With the default settings this tool executes its counts using the count store (database statistics), which means that the queries are inexpensive for the database. The following counts are executed using the count store:
- Relationship Types
- Outgoing relationship types per label
- Incoming relationship types per label
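To make the count-store idea concrete, the sketch below generates Cypher count queries in shapes that Neo4j can typically answer from its count store: nodes with one label, relationships of one type, and relationships of one type with a label on at most one end. This is an assumption about suitable query shapes, not the app's actual queries, and `quote` and `count_store_queries` are hypothetical helper names.

```python
def quote(name: str) -> str:
    """Backtick-quote a label or relationship type so names with spaces or symbols are valid Cypher."""
    return "`" + name.replace("`", "``") + "`"


def count_store_queries(labels, rel_types):
    """Build count queries keyed by what they count.

    Each query uses at most one label and one relationship type, the shape
    the count store can serve without scanning nodes or relationships.
    """
    queries = {}
    for label in labels:
        queries[("label", label)] = f"MATCH (n:{quote(label)}) RETURN count(n)"
    for rel in rel_types:
        # Total relationships per relationship type.
        queries[("type", rel)] = f"MATCH ()-[r:{quote(rel)}]->() RETURN count(r)"
        for label in labels:
            # Outgoing: label on the start node only.
            queries[("out", label, rel)] = f"MATCH (:{quote(label)})-[r:{quote(rel)}]->() RETURN count(r)"
            # Incoming: label on the end node only.
            queries[("in", label, rel)] = f"MATCH ()-[r:{quote(rel)}]->(:{quote(label)}) RETURN count(r)"
    return queries


queries = count_store_queries(["Person"], ["KNOWS"])
print(queries[("out", "Person", "KNOWS")])
# MATCH (:`Person`)-[r:`KNOWS`]->() RETURN count(r)
```

Every label/relationship type pair yields two queries, so the number of queries grows with labels × relationship types, but each one reads statistics rather than graph data.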
It is best to start with the default settings.
Analyzing Properties and Label Combinations
When you want to analyze Node or Relationship properties or Label Combinations, the count store cannot be used, which means that the query load on the database will be considerably higher. Therefore you have to specify in the Label Filter and Relationship Type Filter which Labels and Relationship Types you want to analyze.
Be careful when analyzing properties or label combinations on very big databases: don’t do it on a database powering production workloads; use a backup or a read-replica/follower instead.
The following information is gathered when you analyze Properties and Label Combinations:
- Label Combinations
Label combinations are found and counted.
- Label Properties
Label property combinations and counts, and a list of all the different properties found and their data types. It also shows whether a property has an index.
- Relationship Type Properties
Relationship Type property combinations and counts, and a list of all the relationship properties found and their data types.
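The kind of result listed above can be illustrated with a small in-memory sketch: given the property maps of some nodes (or relationships), count each distinct combination of property keys and collect the value types seen per key. This is a hypothetical illustration of the idea, not how the app queries Neo4j; the helper names and the rough type names are assumptions.

```python
from collections import Counter, defaultdict


def type_name(value) -> str:
    """Map a Python value to a rough, Cypher-style type name (illustrative only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Long"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "List"
    return type(value).__name__


def analyze_properties(records):
    """Count property-key combinations and collect the types seen per key.

    `records` is an iterable of property maps, one per node or relationship.
    """
    combos = Counter()
    types = defaultdict(set)
    for props in records:
        # Sort the keys so the same set of keys always forms the same combination.
        combos[tuple(sorted(props))] += 1
        for key, value in props.items():
            types[key].add(type_name(value))
    return combos, dict(types)


people = [{"name": "Ann", "age": 42}, {"name": "Bob"}, {"age": 7, "name": "Cy"}]
combos, types = analyze_properties(people)
print(combos[("age", "name")], combos[("name",)])  # 2 1
print(types["age"])  # {'Long'}
```

The same counting over sorted tuples works for label combinations, with each node's label list in place of its property keys.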
When the number of nodes or the number of relationships is above a configurable threshold, sampling is used to limit the load on the server. Press the “Sampling” button to edit the threshold values.
Note that when sampling is used, the properties and label combinations found are an estimate.
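The article does not describe how the sample is drawn, so the following is only a sketch under the assumption of a uniform random sample of at most `threshold` items; `choose_sample` and `scale_estimate` are hypothetical names. It also shows why results become estimates: a property that occurs on only a few nodes may be missing from the sample, and counts seen in the sample have to be extrapolated.

```python
import random


def choose_sample(ids, threshold, seed=None):
    """Return (items, sampled): all ids when at or below `threshold`,
    otherwise a uniform random sample of `threshold` ids."""
    ids = list(ids)
    if len(ids) <= threshold:
        return ids, False
    rng = random.Random(seed)
    return rng.sample(ids, threshold), True


def scale_estimate(sample_count, sample_size, total):
    """Extrapolate a count observed in the sample to the full population."""
    return sample_count * total / sample_size


items, sampled = choose_sample(range(1000), 100, seed=42)
print(len(items), sampled)  # 100 True
print(scale_estimate(30, 100, 1000))  # 300.0
```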
To get an impression of how the tool performs, I analyzed a database with 46M Nodes, 61M Relationships, 101 Labels, 124 Relationship Types and 18 Label Combinations (created with my faker-based dataset generator).
The default count, without property analysis and without checking Label combinations, took 2 seconds. The analysis with all Label/Relationship Type properties and Label Combinations took ~15 minutes.
In this tab you can see all the details of a Label by clicking on its ‘Label’ row. That row also contains the count of the Nodes with this specific label.
In this tab a tile is shown for each label combination, together with its count.
In this tab a bar is shown for every Relationship Type with its Relationship count. The detail section appears when you click on a bar only if Relationship properties have been analyzed. The details show the property list and the possible property combinations.
Indexes, Constraints and Log tabs
For convenience, the Indexes and Constraints of the database are listed here. The Log tab contains the log of the analysis, which is shown in the Summary tab while the analysis runs.
The Live Count tool counts the nodes per selected label and the relationships per selected relationship type every 10 seconds (by default). Note that these queries use the database statistics, so they are very ‘light’ for the database. By default the first Label in the label list and the first Relationship Type in the Relationship Type list are selected. While counting, you can add labels or relationship types to the ‘count’ or remove them; you will see these changes in the next count. This tool counts structures in the database ‘live’; if you want to monitor the database itself, you can use the Neo4j Desktop App Halin.
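The polling behaviour described above can be sketched as a loop that re-reads the current selection at the start of every round, so additions and removals show up in the next count. This is a hypothetical sketch, not the app's code: `live_count` is an invented name, and `count_fn` stands in for whatever runs the actual count-store query for a label or relationship type.

```python
import time


def live_count(count_fn, get_selection, interval=10.0, rounds=3,
               sleep=time.sleep, clock=time.time):
    """Count the current selection every `interval` seconds for `rounds` rounds.

    The selection is re-read at the start of each round, so labels or
    relationship types added or removed in between appear in the next count.
    Returns a timeline of (timestamp, {name: count}) entries for charting.
    """
    timeline = []
    for i in range(rounds):
        selection = list(get_selection())
        timeline.append((clock(), {name: count_fn(name) for name in selection}))
        if i < rounds - 1:
            sleep(interval)
    return timeline


counts = {"Person": 5, "KNOWS": 7}
selections = iter([["Person"], ["Person", "KNOWS"], ["KNOWS"]])
timeline = live_count(counts.get, lambda: next(selections), interval=0, rounds=3)
print([c for _, c in timeline])
# [{'Person': 5}, {'Person': 5, 'KNOWS': 7}, {'KNOWS': 7}]
```

Injecting `sleep` and `clock` keeps the loop easy to test without waiting; a real timeline chart would plot each timestamp against the counts.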
The Model tab makes it possible to ‘walk’ over your database model even when there are a lot of Labels and Relationship Types. The visualization only contains data when the database has been analyzed.
The database model starts with an empty canvas, and you can start exploring the Model by selecting a Label via the “Labels Filter” or by pressing “Show All”. When the model complexity is too high, clicking “Show All” gives you a warning that showing the complete model will probably fail. In that case it is better to use the “Labels Filter” to start your model exploration. The complexity of the model is calculated as follows:
ModelComplexity = (Label Count + RelationshipType Count) * (Relationship Count / Node Count)
When the ModelComplexity is above 400, “Show All” will give a warning.
For smaller schemas this option is the fastest way to get a quick overview of the database model.
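As a worked example, the formula can be applied to the rounded figures of the example database analyzed earlier: (101 + 124) × (61M / 46M) ≈ 225 × 1.33 ≈ 298, which is below 400, so “Show All” would presumably not warn for that database. The sketch below encodes the formula; the function names are hypothetical, and returning 0 for an empty database is an assumption the article does not cover.

```python
WARN_THRESHOLD = 400  # "Show All" warns above this value


def model_complexity(label_count, rel_type_count, rel_count, node_count):
    """(Label Count + RelationshipType Count) * (Relationship Count / Node Count)."""
    if node_count == 0:
        return 0.0  # assumption: an empty database has no model to draw
    return (label_count + rel_type_count) * (rel_count / node_count)


def show_all_warns(label_count, rel_type_count, rel_count, node_count):
    """True when the complexity exceeds the warning threshold."""
    return model_complexity(label_count, rel_type_count, rel_count, node_count) > WARN_THRESHOLD


# Rounded figures from the example database: 101 labels, 124 relationship types,
# 61M relationships, 46M nodes.
c = model_complexity(101, 124, 61_000_000, 46_000_000)
print(round(c, 1))  # 298.4
print(show_all_warns(101, 124, 61_000_000, 46_000_000))  # False
```

The relationship/node ratio acts as a density factor: the same number of labels and types produces a busier picture when each node has more relationships on average.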
When a Node is selected it turns blue, and the properties of the Node are shown on the right. These include the Node Count of the Label and the Incoming and Outgoing Relationship Types with their Relationship counts. When the properties of this Label have been analyzed, you will also see a property list with property types here.
With the context menu on a selected “Label” Node you can add the incoming and outgoing relationship types to the visualization, including the connected “Label” Nodes. It is also possible to remove a Relationship Type or a “Label” Node from the canvas.
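The expand and remove actions can be pictured as operations on a small graph of label nodes and (start label, relationship type, end label) edges. The sketch below is a hypothetical model of that behaviour, not the app's implementation; in particular, it assumes the analysis provides the schema as such triples, and `Canvas`, `expand`, `remove_label` and `remove_rel_type` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """What is currently drawn: label nodes and (start, type, end) edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)


def expand(canvas, schema_edges, label, direction):
    """Add `label`'s outgoing ('out') or incoming ('in') relationship types
    to the canvas, together with the label nodes on the other end."""
    for start, rel_type, end in schema_edges:
        if (direction == "out" and start == label) or (direction == "in" and end == label):
            canvas.edges.add((start, rel_type, end))
            canvas.nodes.update((start, end))


def remove_label(canvas, label):
    """Remove a label node and every edge attached to it."""
    canvas.nodes.discard(label)
    canvas.edges = {e for e in canvas.edges if label not in (e[0], e[2])}


def remove_rel_type(canvas, rel_type):
    """Remove all edges of one relationship type, keeping the label nodes."""
    canvas.edges = {e for e in canvas.edges if e[1] != rel_type}


schema = [("Person", "KNOWS", "Person"),
          ("Person", "WORKS_AT", "Company"),
          ("Company", "LOCATED_IN", "City")]
canvas = Canvas({"Person"})
expand(canvas, schema, "Person", "out")
print(sorted(canvas.nodes))  # ['Company', 'Person']
remove_label(canvas, "Company")
print(canvas.edges)  # {('Person', 'KNOWS', 'Person')}
```

Starting from a single label and expanding step by step keeps the canvas small, which is why this approach stays usable for models where “Show All” would be too complex.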
The source code for the Neo4j Database Analyzer is on GitHub at kvegter/dbreportapp, where you can read the documentation and report issues.