Where’s My Neo4j Cypher Query Results? 😠 ⚡️ ⁉️

Slow Cypher Queries in Neo4j Browser? Why a Cypher query run in the Neo4j Browser may not return in a reasonable amount of time, what is happening, and what you can do about it.

Dan Flavin
Neo4j Developer Blog
13 min read · Jan 14, 2021


Note: This post was originally written for Neo4j Browser version 4.2.0 as of January 2021, so your “mileage may vary”, as the Neo4j Browser is continually being improved. Neo4j Browser version 4.4.3 has improved on this: the Cypher statement that returned results quickly from the database, but used to overwhelm the browser display, now displays its results.

Photo by JESHOOTS.COM on Unsplash

TL;DR

The Neo4j Browser is a JavaScript-based application that runs in your web browser. That runtime environment can be the limiting factor when processing and rendering the force-directed layout of a Cypher query's output. Users often assume that the query is slow when the real cost may well be the browser's processing and rendering of the results. This post covers how to address this operational issue.

The Neo4j Browser is a developer tool and is the number one interface used to write Cypher queries to interact with a Neo4j database.

The browser delivers an easy-to-use, metadata-supported environment for developing Cypher queries with the interactive, force-directed graph visualization being the most popular output.

The utility of the Neo4j Browser for developers cannot be overstated, and it will continue to be the main development interface for Neo4j users.

It offers query history, saved favorites, multi-statement execution, a variety of output formats, and browser guides available through the :play command, and it is continuously being improved (hint: check out the :edit command if you haven’t already). You can check out the user interface guide here.

The majority of Cypher queries run in the Neo4j Browser have no issues displaying an interactive graph visualization. Unfortunately, there are times when queries run through the Neo4j Browser start “taking a long time to run” and frustration builds while looking at the spinning dots as shown in figure 2.

Figure 2. The “Spinning dots”. Where’s me lines, arrows, and bouncing bubbles?

This situation is where performance and utility can unknowingly be at cross purposes.

Neo4j users often equate the time it takes for the graph visualization or rendering of results of a Cypher query in the Neo4j Browser to database query performance.

At first glance, this is a perfectly reasonable assumption. The reality is that it can be the rendering of the query results that is taking time, not the query execution.

Assuming that the query response time is measured by when the visualization is produced can result in potentially invalid and costly conclusions, such as “I must re-write my query and / or restructure my database”.

The goal of this post is to give Neo4j Browser users ways to determine whether a query’s performance or the visualization is the culprit when query execution seems slow. It also covers what other options exist.

The workflow is very simple:

  1. Determine if it’s the Neo4j Browser or the query execution that is keeping a visualization from being displayed.
  2. If Neo4j Browser visualization is the bottleneck, then try different query techniques and/or a different tool.
  3. Investigate why the query is slow to return if it’s not the visualization.

What the post does not take into consideration is the larger context of concurrency, resource and database utilization, or query design. That’s a much bigger topic!

The Neo4j Browser and What It’s Doing

50 Nodes -> 70 Relationships: “Arrgh! Query!!! 🙀” vs. “Yeah! Query!!! 😺”

What Is the Neo4j Browser Graph Visualization Really Displaying?

Let’s use a simple acyclic graph structure to show how even a small set of data can require a significant amount of processing to visualize. The example graph consists of nodes with a :Node label, each with a unique node_id property. The nodes are related to each other by :PARENT_OF relationships.

Figure 3. Example database schema

The graph is intentionally small, containing only 50 :Node labeled nodes and 70 :PARENT_OF relationships. It was generated using an APOC procedure that creates a random graph with the Erdős–Rényi model. The schema visualization is shown in figure 3, and the source Cypher statements are on GitHub.
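You can experiment with graphs of this shape without APOC. The plain-Python sketch below uses the Erdős–Rényi G(n, m) model: it samples m distinct node pairs uniformly at random. It is not the APOC implementation. Orienting every pair from the lower to the higher node_id is an assumption of this sketch, made so that the result is guaranteed to be acyclic; the post does not say APOC does this. The function name random_dag is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def random_dag(n_nodes: int, n_rels: int, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Erdős–Rényi G(n, m) sketch: pick n_rels distinct node pairs uniformly at random.

    Each pair (i, j) with i < j stands for (:Node {node_id: i})-[:PARENT_OF]->(:Node {node_id: j}).
    Orienting low -> high is this sketch's assumption; it guarantees there are no cycles.
    """
    all_pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    if not 0 <= n_rels <= len(all_pairs):
        raise ValueError(f"n_rels must be between 0 and {len(all_pairs)}")
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return rng.sample(all_pairs, n_rels)


# Same size as the post's example graph: 50 nodes, 70 PARENT_OF relationships.
edges = random_dag(50, 70, seed=7)
print(len(edges), "PARENT_OF relationships, e.g.", edges[:3])
```

Each sampled pair can be turned into a CREATE or MERGE statement if you want to load it into a scratch database. The random graph will differ from the one used in this post.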

Why so small? To illustrate how even a small number of nodes and relationships can have a large number of unique paths through the graph.

The number of unique paths is not a problem in itself, but it can affect the Neo4j Browser graph visualization in ways that are not obvious.

The Neo4j Browser can take a long time to render results because what it draws can differ from what the query actually returns.

Figure 4 shows the visual output of the one- and two-hop :PARENT_OF relationship traversals between :Node nodes, using the query MATCH paths=(:Node)-[:PARENT_OF*1..2]->() RETURN paths. Here’s where the flattened, interactive visual output of the Neo4j Browser can be misinterpreted.

Figure 4. One and two traversal Neo4j Browser interactive graph visualization

Figure 4 shows 47 nodes and 70 relationships, but the path count query MATCH paths=(:Node)-[:PARENT_OF*1..2]->() RETURN count(paths) returns 170 paths.

Why the Discrepancy Between What’s Visualized and the Underlying Dataset It Represents?

The Neo4j Browser transforms the multi-dimensional overlapping path data into a 2D visual. The visualization is very useful for exploring query results with the unfortunate potential side effect of obscuring the actual amount of data returned. (see “The Graph Database Chronicles Episode 1” for a deeper discussion if this seems strange to you).

The 50 node / 70 relationship example graph has a total of 5,673 unique paths. A visualization of all the unique paths does not look much different from figure 4. However, the underlying data is more than a 3,000% increase (roughly 33 times as many paths) over the 170 one- and two-hop traversal query paths.
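A small Python sketch reproduces the gap between what the Browser draws and how many rows it receives. count_paths counts the rows that a pattern such as MATCH p=()-[*min..max]->() would return for a given edge list. Like Cypher, it allows a relationship to appear at most once per path. diamond_ladder is a hypothetical test graph made of stacked diamonds, not the post’s dataset. The Browser would draw each of its nodes and relationships once, while the number of returned paths roughly doubles with every diamond added.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def count_paths(edges: List[Edge], min_hops: int = 1, max_hops: Optional[int] = None) -> int:
    """Count the rows MATCH p=()-[*min_hops..max_hops]->() would return for these edges.

    As in Cypher, a relationship may appear at most once within a single path,
    which also guarantees termination on cyclic graphs.
    """
    out = defaultdict(list)  # node -> [(relationship id, next node)]
    nodes = set()
    for rel_id, (src, dst) in enumerate(edges):
        out[src].append((rel_id, dst))
        nodes.update((src, dst))

    total = 0

    def walk(node, used):
        nonlocal total
        if len(used) >= min_hops:
            total += 1  # the path ending at `node` is one returned row
        if max_hops is not None and len(used) == max_hops:
            return
        for rel_id, nxt in out[node]:
            if rel_id not in used:
                used.add(rel_id)
                walk(nxt, used)
                used.discard(rel_id)

    for start in nodes:
        walk(start, set())
    return total


def diamond_ladder(k: int) -> List[Edge]:
    """Hypothetical test graph: n0 -> (l0, r0) -> n1 -> (l1, r1) -> n2 ..."""
    edges = []
    for i in range(k):
        top, left, right, bottom = f"n{i}", f"l{i}", f"r{i}", f"n{i + 1}"
        edges += [(top, left), (top, right), (left, bottom), (right, bottom)]
    return edges


for k in (1, 2, 3, 10):
    g = diamond_ladder(k)
    drawn_nodes = len({n for edge in g for n in edge})
    print(f"{k} diamonds: Browser draws {drawn_nodes} nodes / {len(g)} rels; "
          f"*1..2 returns {count_paths(g, max_hops=2)} paths, * returns {count_paths(g)}")
```

With 10 diamonds (31 nodes and 40 relationships), the one- and two-hop pattern returns 96 paths, while the unbounded pattern returns 16,268. The picture stays small while the data behind it does not.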

The query MATCH path=(:Node)-[:PARENT_OF*]->() RETURN path illustrates returning all the paths for any number of traversals. Running this in the Neo4j Browser would result in the figure 2 spinning dots. Not to worry!

Turns out that the query is not the issue.

Think about your query carefully. Do you want a picture of that data? Are you sure that rendering that visually is a useful thing to do?
The Neo4j Browser’s greatest strength (easy visualization) is also its greatest weakness. Visualizing results simply isn’t always the first thing you should do.

This could be your inner monologue:

⁉️ This is a good time for a reality check. Ask what the query is trying to accomplish, and whether the visualization will add any value or is just eye candy. 👀
For this example, it would be hard to come up with a valid reason for needing a flattened visualization of 5,673 unique paths. “But wait! I am looking for unrecognized patterns in my graph!” is a common quick response. A flattened force-directed layout is not going to give you this. What’s really being asked for is the ability to find patterns based on the shape of the data in the graph. That is a good indication you might get value from Neo4j Graph Data Science algorithms.

Three Simple Ways to Determine If the Neo4j Browser Graph Visualization Rendering is Masquerading As a Performance Issue

1. The count() test.

The Cypher count() aggregation is an easy way to see how much data a query returns and to get an idea of the query's performance characteristics. The Cypher query

MATCH path=(:Node)-[:PARENT_OF*]->() RETURN count(path)

counts the number of path rows, as shown in figure 5:

Figure 5. Path count query

Notice two things:

  1. The number of paths can be significant. In my case, there are 5,673 unique paths through the database even though there are only 50 nodes and 70 relationships in the graph.
  2. The query starts streaming results in 1ms and completes after 10ms.

The aggregation avoids having to return and render the interactive graph visualization. There’s no fun force-directed layout visual (which is of dubious value in this use case), but at least it is known that it’s not the query execution itself that’s the performance issue here.
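The count() test means rewriting the RETURN clause by hand. A more mechanical variant wraps the unchanged query so the server returns only the number of rows it would have sent to the Browser. The minimal Python sketch below does this and assumes Neo4j 4.0 or later, where CALL { ... } subqueries are available. The helper name as_row_count is hypothetical. The sketch also assumes a single statement with no PROFILE or EXPLAIN prefix, whose RETURN items are plain variables or aliased expressions.

```python
def as_row_count(query: str, alias: str = "rows") -> str:
    """Wrap a read query so the server returns only how many rows it would produce.

    Sketch assumptions: Neo4j 4.0+ (CALL { ... } subqueries), a single statement
    with no PROFILE/EXPLAIN prefix, and RETURN items that are variables or aliased.
    """
    body = query.strip().rstrip(";").strip()  # drop a trailing semicolon, if any
    return f"CALL {{ {body} }} RETURN count(*) AS {alias}"


print(as_row_count("MATCH path=(:Node)-[:PARENT_OF*]->() RETURN path;"))
```

Pasting the printed statement into the Neo4j Browser gives a single number instead of a graph to render, while the original query text stays untouched inside the braces.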

The 5,673 unique paths in my example graph represent every possible path, of any length, through the graph. This is often not what a user wants, and it is an example of how the graph visualization can obscure the underlying data.

This post focuses on cases where, intentionally or not, a query returns more data than the Neo4j Browser can reasonably process. Why 5,673 paths are returned, and why this may not be obvious, is a topic for another discussion.

2. Use the PROFILE query directive to avoid the default force-directed visual layout.

The PROFILE directive shows how the query was executed as its initial output, including the processing time and the query execution steps. It skips the default graph visualization step until you click on the visualization tab.

To see this in action, run the all paths query with the PROFILE command:

PROFILE MATCH paths=(:Node)-[:PARENT_OF*]->() RETURN paths

which shows the query executing in 11ms as shown in figure 6 below.

If you then click on the graph visualization icon as shown in figure 7, you’ll likely end up waiting for the visualization to render as in figure 2. The Neo4j Browser might have to be closed and reopened to continue on because your web browser’s JavaScript engine ran out of memory.

Figure 6 (left). Abbreviated query PROFILE / Figure 7 (right). Switch to graph visualization

Consider using the approaches for working with the data in the following “Options” section if the PROFILE command returns in a reasonable amount of time but takes too long to visualize as a graph. If PROFILE does not return in a reasonable amount of time, then it’s likely the query execution that is the culprit.

Try running the query with the EXPLAIN directive to see the expected query execution plan. Either optimize the query structure from there, or consider steps similar to those discussed in the “It is the query” section below.
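The rule of thumb above can be written down as a tiny decision function. This is only an illustrative Python sketch. next_step is a hypothetical name, and budget_ms stands in for whatever “a reasonable amount of time” means in your environment; the post does not define a threshold.

```python
from typing import Optional


def next_step(profile_ms: Optional[float], budget_ms: float) -> str:
    """Map a PROFILE run onto the post's two branches.

    profile_ms: elapsed time PROFILE reported, or None if it never came back.
    budget_ms: your own definition of "a reasonable amount of time" (assumption).
    """
    if profile_ms is None or profile_ms > budget_ms:
        # The database itself is slow: look at the plan before anything else.
        return "query: read the plan with EXPLAIN, then tune the query or the model"
    # The database answered quickly, so rendering is the likely bottleneck.
    return "visualization: use count(), LIMIT, cypher-shell, or Bloom instead"


print(next_step(11, budget_ms=1_000))
```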

3. Use a LIMIT clause in your query, or reduce the number of hops traversed.

This approach is useful for minimizing the results returned, allowing for the graph visualization to be displayed in the Neo4j Browser. This is not always appropriate as it changes the query.

A LIMIT is simply added to the query:

MATCH path=(:Node)-[:PARENT_OF*]->() RETURN path LIMIT 200

Restricting the path length to between 1 and 3 hops is accomplished by:

MATCH path=(:Node)-[:PARENT_OF*1..3]->() RETURN path

This obviously is a trial-and-error approach that sometimes elicits interesting observations that can change the premise of the original query.

Of course, you can combine both to minimize even more.
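Combining a hop bound with a LIMIT can also be scripted. The hypothetical Python helper below builds the queries in this section from its arguments. One design note: Cypher does not accept query parameters inside variable-length relationship bounds such as *1..3 (at least in the 4.x releases this post covers). For that reason, the bounds are converted to integers and formatted into the string instead of being passed as $parameters. LIMIT could be parameterized, but it is formatted the same way here for symmetry.

```python
from typing import Optional


def bounded_paths_query(min_hops: int = 1, max_hops: Optional[int] = None,
                        limit: Optional[int] = None) -> str:
    """Build the post's all-paths query with optional hop bounds and LIMIT."""
    # int() rejects anything that is not a number before it reaches the query text.
    min_hops = int(min_hops)
    max_hops = None if max_hops is None else int(max_hops)
    limit = None if limit is None else int(limit)
    if min_hops < 1:
        raise ValueError("min_hops must be at least 1")
    if max_hops is not None and max_hops < min_hops:
        raise ValueError("max_hops must be >= min_hops")
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")

    if max_hops is None:
        hops = "*" if min_hops == 1 else f"*{min_hops}.."
    else:
        hops = f"*{min_hops}..{max_hops}"
    query = f"MATCH path=(:Node)-[:PARENT_OF{hops}]->() RETURN path"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query


print(bounded_paths_query(max_hops=3, limit=200))
```

Trying a few bounds in turn (for example, max_hops=2, then 3, then 4) is a quick way to run the trial-and-error approach described above.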

Options: Neo4j Browser Visualization Is the “Performance” Culprit

What Are My Options If I Want to Work with the Big Old Dataset Returned by the Cypher Query?

1. Use cypher-shell

cypher-shell is a simple-to-use command-line utility for running Cypher queries, and it works with all Neo4j editions.

cypher-shell query results have rudimentary formatting that needs very little processing to create the final output. I often use it from within the Sublime Text Editor when developing Cypher queries¹. This is useful when I have a series of disjoint statements (e.g. create data, indexes, match, merge, etc.) to run in sequence, or I want to use git as a repository for my queries.

You can find cypher-shell in the Neo4j install location bin subdirectory, or it can be installed standalone (see Cypher Shell section of the Neo4j Downloads page).
Figure 8 shows cypher-shell running in a terminal window launched from the Neo4j Desktop on a MacBook. The process is the same on Windows.

Figure 8. Launching a terminal window and running cypher-shell on a Mac

👉 Using cypher-shell with the --format plain option is one of the fastest ways to return query data and execution metrics without writing your own code. Output can be saved to a file or piped through a pager for a better user experience.
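For repeatable runs, cypher-shell can be driven from a script. The Python sketch below assumes cypher-shell is on the PATH and uses only its -a, -u, -p, -d and --format options. The helper names are hypothetical. The statement is piped on standard input, and the plain-format output is written to a file that you can then open in a pager. Passing the password with -p makes it visible to other local users in the process list, so treat this as a local-development sketch.

```python
import shutil
import subprocess
from typing import List, Optional


def cypher_shell_command(address: str, user: str, password: str,
                         database: Optional[str] = None) -> List[str]:
    """Build the argv for a plain-format cypher-shell run."""
    cmd = ["cypher-shell", "-a", address, "-u", user, "-p", password, "--format", "plain"]
    if database:
        cmd += ["-d", database]
    return cmd


def run_to_file(query: str, out_path: str, address: str, user: str, password: str,
                database: Optional[str] = None) -> int:
    """Pipe one statement into cypher-shell and write its output to out_path."""
    if shutil.which("cypher-shell") is None:
        raise FileNotFoundError("cypher-shell was not found on PATH")
    statement = query.strip()
    if not statement.endswith(";"):
        statement += ";"  # terminate the statement so cypher-shell sees it as complete
    with open(out_path, "w", encoding="utf-8") as out:
        proc = subprocess.run(
            cypher_shell_command(address, user, password, database),
            input=statement, stdout=out, stderr=subprocess.PIPE, text=True,
        )
    if proc.returncode != 0:
        print(proc.stderr)  # connection or syntax errors end up here
    return proc.returncode


# Example (needs a running database; the credentials are placeholders):
# run_to_file("MATCH path=(:Node)-[:PARENT_OF*]->() RETURN path", "paths.txt",
#             "neo4j://localhost:7687", "neo4j", "<password>")
```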

2. Use Neo4j Bloom

Neo4j Bloom is a graph visualization tool designed for end users and analysts. Bloom users navigate and query the graph without having to write Cypher, and they can visualize a much larger set of data than is possible in the Neo4j Browser.

Update! See the “Run a Cypher Statement in Bloom” section of the “How to Create Conditional and Dynamic Queries in Neo4j Bloom” blog post for an easier and reusable way to run raw Cypher in Neo4j Bloom.

Figure 9. Bloom create search phrase dialog box

How does that help in this scenario, where we want to see the output of a Cypher statement? Bloom lets you create and parameterize Cypher queries for use within the interface by creating a custom search phrase (figure 9). We can then execute the query that was problematic for the Neo4j Browser and see the visualization in Bloom.

A caveat: Neo4j Bloom requires the Enterprise Edition of the Neo4j database and does not work with the Neo4j Community Edition. Neo4j Bloom also works best with a dedicated GPU. Users can use Neo4j Bloom in several ways:

  1. Using the Neo4j Desktop. A single-user Bloom installation is included with the Neo4j Desktop. Databases created from within the Neo4j Desktop run a free Developer License of the Neo4j Enterprise Edition.
  2. Through Neo4j Aura. All Neo4j Aura instances come with Bloom enabled out of the box, and there’s even a free version of Aura that you can try.
  3. Using a Neo4j Sandbox, which offers free, guided graph database use case walkthroughs that can be accessed for up to 10 days.
  4. Install with a Neo4j Enterprise Edition database. This requires Neo4j Bloom to be installed on the server and a Bloom activation key.

It Is the Query After All! What to Do If the Cypher Query Is Not Performing as Wanted?

✏️ That Is a Big Subject! ☯

A single blog post could never even begin to address the subject of the yin and yang of query and database performance. There’s just too much to cover and too many variables. Given that, here are some hints and tips to keep in mind and resources to help out:

  • Check your schema
  • Monitor your database
  • Get help from the community
  • The YAGNI principle
  • Do you have a graph problem?

Check Your Schema

Like any database, the design and how well queries are written can affect performance. The good news is that because Neo4j is “schema-less”, it is very easy to provide multiple graph models in a single Neo4j database to meet different query requirements.

The twist is understanding when and how to create an efficient graph model and queries. Fortunately, there’s an incredible number of resources to help Neo4j developers and users. There’s a quick introduction to modeling in the Neo4j Developer documentation, and there are no-charge online courses from Neo4j Graph Academy. These courses range from Cypher basics to advanced query writing and optimization, database design, and administration.

Monitoring

Remember that the Neo4j graph database is a database. Even though Neo4j is a very efficient graph database, the universal database resource trifecta of RAM, CPU, and I/O still applies and is constrained by concurrent usage.

You can’t fix what you can’t see, and there are many ways to monitor the resource usage of the Neo4j database and queries.

👉 Note: Halin is being deprecated and will not support certain features for Neo4j Graph Database versions later than 4.3.0.

The Neo4j Operations Manual covers production monitoring of database and system metrics. This includes sending metrics to Graphite, and publishing metrics for polling as a Prometheus endpoint.

Get Help from the Community

The Neo4j Community website is a wonderful resource for asking specific questions and taking advantage of the collective knowledge of the vast Neo4j user community. I often go to community.neo4j.com for ideas when I’m trying to solve a problem or looking for new approaches to writing a complex Cypher pattern. There’s a good chance that whatever you’re asking has already been addressed.

On a side note, the Neo4j Community’s “This Week in Neo4j” and “Featured Community Member” often present interesting user-provided graph use case examples and real-world projects using new approaches and technologies. I would have completely missed the “Using Neo4j with PySpark on Databricks” post if I wasn’t a member of the Neo4j community.

The YAGNI Principle

Follow the YAGNI (You Ain’t Gonna Need It) principle and good graph data modeling techniques.

A generalized query returns all the properties of each node and relationship to the Neo4j Browser for rendering. If every node in our 50-node example graph had 512KB of property data, that would be roughly 25MB of data being returned to the Neo4j Browser for display in the property value box.

That’s a lot of memory and CPU being used just in case a user clicks or hovers on an individual node or relationship visualization to see the property data.

Having node properties available in visualization is what you’d expect, but there’s only so much you can really load into our web-based Neo4j Browser.

An unreasonable number of properties is not only a stressor for the Neo4j Browser; it can also indicate an underdeveloped graph model. See this series of blog posts for a quick introduction to modeling concepts. They also show how a good model, one that is easier to understand and query, avoids this scenario.

Do You Have a Graph Problem?

This should be obvious, but be sure you have a graph use case! Working with a Neo4j graph database is so much fun and so easy that it is tempting to apply it to scenarios where a graph does not add any value. Watch this short video for a good introduction to identifying graph-shaped problems.

Parting Thoughts. Yours and Mine.

Thank you for your time if you made it this far. Please post any questions or comments as I am very interested in what readers think and am hoping to gain insight from any responses.

¹ More on the Sublime Text editor and cypher-shell coming in another blog post.


Dan Flavin

A Neo4j Field Engineer working with graph technology and concepts after becoming a true graph advocate by surviving an untold number of database technologies.