Meet the Query Log Analyzer

Analyzing Neo4j Query Log files on your Neo4j Desktop

The Query Log Analyzer is a Neo4j Desktop App to help you to understand the query log file of a Neo4j Enterprise server.

When you experience unexpected slowness of Neo4j, your queries may be inefficient or the query load on the server is too high. A good step is then to enable the query log via the neo4j.conf file.

dbms.logs.query.enabled=true
# If the execution of query takes more time than this threshold,
# the query is logged. If set to zero then all queries
dbms.logs.query.threshold=100ms
dbms.logs.query.parameter_logging_enabled=true
dbms.logs.query.time_logging_enabled=true
dbms.logs.query.allocation_logging_enabled=true
dbms.logs.query.page_logging_enabled=true
dbms.track_query_cpu_time=true
dbms.track_query_allocation=true

Normally you will set a threshold to log only those queries which take more than an x amount of time (start with 100ms for example, or zero for all queries). This means that the queries shown in the query log tool are not the complete query load on the server! This tool however can give you a direction to find the possible causes for your query bottlenecks quickly.

I’ll explain that in more detail in my follow-up article.

It is good practice to switch the query logging on for development and test servers and analyse your queries frequently when you develop your solution.

Install the Query Log Analyzer as follows in Neo4j Desktop 1.1.10+:

Open the “Graph Applications”-sidebar, and paste the url: 
https://neo.jfrog.io/neo/api/npm/npm/query-log-analyzer 
into the “Install Graph Application” field and press “Install”

Select a Project and press ‘+ Add Application’ in the applications list
Choose here the “Query Log Analyzer” to add it to your Project .

In the following I explain the Query Log Analyzer App.


Query Log Analyzer

The Query Log Analyzer needs a query.log file. You can upload this file to the tool and then the tool will analyse this file. After analysing the file the following message will be shown:

In this example the query log file has 17341 rows (each query one row) and 249 distinct queries are found. These 249 distinct queries are shown in the “Query Analysis” tab where you can find per query the statistics.

Query Analysis

In the Query Analysis tab you will see the distinct queries ordered by Query Count * Avg Time descending. Which means that the most expensive query from the log file is placed on top.

The following fields and functions are available in the Query Analysis tab:

  • The Query (the cell below AvgTime — Avg Mem values)
    This is the actual ‘distinct’ query string.
  • Query Count
    The count of this distinct query in the log file and access to the following functions for this query:
    Filter
    Show only the query log records for this query in the Query Log tab.
    Highlight
    Highlight this query in the query log records in the Query Log tab. It can be useful to see which queries are send to the server around the same time.
    Timeline
    Experimental! Show the occurrences of this query in the Query Timeline tab.
  • Avg Time, Min Tim, Max Time
    The Time is here the total time the query uses to execute (query cpu + planning + waiting).
  • Avg CPU
    This is actual query execution time on the CPU. When detailed time logging is disabled a 0 will be shown here. 
    requires: dbms.logs.query.time_logging_enabled=true and dbms.track_query_cpu_time=true
  • Max Planning
    This is the maximum time spend in the query plannings phase. When you hover over the value you will see also the Min and Avg planning times. Normally the first time a query is fired the query is planned and the query execution plan is placed in the query cache. So the next time a query is executed the Planning time will be almost 0. Only when the plan is evicted from the cache, the next query has to be compiled again.
    When Time logging is disabled a 0 will be shown here.
    requires: dbms.logs.query.time_logging_enabled=true
  • Avg Waiting
    The average waiting time before executing the query. The wait can be caused by heavy loads, so the query has to wait to get some execution time or the wait can be caused by waiting for database locks to be released. When Time logging is disabled a 0 will be shown here.
    requires: dbms.logs.query.time_logging_enabled=true
  • Cache Hits %
    This gives the percentage of the data for this query which could be read from cache. 100% means that all the data is read from cache.
    requires: dbms.logs.query.page_logging_enabled=true
  • Avg Mem
    This is the average allocated bytes in memory of this query. Note that this is a cumulative value and tells something about how memory intensive the query was.
    requires: dbms.logs.query.allocation_logging_enabled=true and dbms.track_query_allocation=true
  • Protocol + Clients
    With Protocol you can see in which context the query is fired from. Values can be:
    “bolt”
    This is any bolt client connecting to the database.
    “http”
    This is any http client using the Neo4j rest interface (applicable to older Neo4j versions)
    ”embedded”
    This is a cypher call from database logic like procedures and functions.
    Also a client list is shown, this may be useful to identify how many different ip numbers are sending requests to the Neo4j server. Note that the bolt driver keeps a pool of connections open to the database, so you can have many clients from one ip number.

Query Log

The Query Log tab shows every query log row with proper headings. There is a lot of information in there so you need to scroll horizontally to see all the columns. From the first Query Analysis tab you can click on Highlight, then the selected query is highlighted. When you press Filter from the first tab only those query log records are shown in this tab. When you want to profile a query than you can copy the query and the used query parameters from this tab.

Query Timeline

The Query Timeline is an experimental feature and it plots the amount of queries per time unit (default is 5 minutes) and the average query time in seconds. This is based on the log time which is not the same as the query start time. It will give you a quick overview when it was very busy on the server.


Links

The source code for the Query Log Analyzer is on Github at kvegter/query-analyzer-app. You can read documentation there and report issues.

If you have questions regarding the query performance, you can always head to the #help-cypher channel on the Neo4j Users Slack or on the neo4j community.

In the next post I am looking at optimizing your Cypher query performance using this tool.