Typing the Neo4j Query API

Published in Neo4j Developer Blog

This post tells the story behind adding type information to the Neo4j Query API. “What’s the Query API?” I hear you ask. It’s a simple way to query Neo4j through your favorite HTTP client. The Query API is now available on Aura (in beta). Help is available on Discord in the Driver’s channel.

Neo4j’s First Interface

HTTP was one of the first interfaces exposed for accessing Neo4j data (after the Embedded API). However, the HTTP interface has lagged somewhat behind Bolt, the preferred transport protocol, which powers the official and community drivers.

Over the past few years, HTTP support has been slowly rejuvenated, with new functionality being added to the existing Cypher Transactional HTTP API such as limited support for clustering and HTTP/2 support.

The latest improvement is a bigger change. The API, as well as its type format, has been completely redesigned. It is now available as Neo4j Query API in parallel to the existing API. The Query API supports two formats:

application/json: The API's default, which maps Neo4j results directly onto JSON, choosing the most suitable JSON type. This format allows you to quickly query your data with minimal tooling.
application/vnd.neo4j.query: Defines a new mapping of Neo4j types onto JSON, aimed at more complex interactions with results.

This post focuses on the new API’s typed result format and assumes a basic understanding of it. If you’d like to get familiar with the basics before reading this post, check out the post by Jonathan Giffard.

The Problem

All good solutions must start by correctly identifying the problem. In JSON, the de facto standard for data exchange via HTTP, we have the following native types:

null
Boolean (true or false)
String ("A mighty string")
Number (1, 12.34)
Array (["a", 123, "bunch", true, "of", ["huh"], "stuff"])
Object ({"im": "an object"})

In Neo4j, we have a much richer type system. Neo4j has types that JSON also has, including Booleans, Strings, and Maps, but it also has types that JSON does not, such as Nodes, Dates, and Points.

A Venn diagram showing JSON types (Numbers, Strings, Arrays etc) which are enclosed by Neo4j’s Types (Relationships, Durations, Paths etc).

Image note: This is a simplified view of how the types intersect. We omit some differences here for brevity.
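
A quick way to see this gap from the client side is a small sketch using Python's standard json module (Python is used here only for illustration; nothing in the Query API requires it). The six native JSON types round-trip cleanly, while a date, which has no JSON counterpart, is rejected by the encoder:

```python
import json
from datetime import date

# One sample value for each of JSON's native types.
native_values = {
    "null": None,
    "boolean": True,
    "string": "A mighty string",
    "number": 12.34,
    "array": ["a", 123, "bunch", True, "of", ["huh"], "stuff"],
    "object": {"im": "an object"},
}

# Every native type survives an encode/decode round trip unchanged.
round_tripped = json.loads(json.dumps(native_values))


def try_encode(value):
    """Return the JSON text for value, or the encoder's error message."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A date is not one of JSON's types, so the standard encoder refuses it.
date_result = try_encode(date(2015, 11, 21))
print(date_result)
```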

The problem is that to interact with Neo4j via JSON, HTTP’s preferred data format, we have to introduce a mapping between the two type systems. Since the two type systems diverge, and JSON is the one missing types, we have to extend JSON so that Neo4j values can be communicated without losing the benefits of their richer types.

Neo4j’s types mapped onto our extended JSON

Why Deal With Types at All?

Types also come with operations on these types. For example, if you know you are dealing with a Node type, then you also know that it has labels that can be retrieved from it (i.e., an operation of labels() -> [String]). Each of Neo4j's types comes with many useful operations, which makes further processing of the results possible.
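
As a rough illustration of what "a type brings its operations" means on the client side, a result deserialized into a dedicated class can expose those operations directly. The Node class below is a hypothetical stand-in written for this example, not a class from any Neo4j library, and the sample labels and properties are made up:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical client-side node type; not part of any Neo4j library."""

    element_labels: tuple[str, ...]
    properties: dict = field(default_factory=dict)

    def labels(self) -> list[str]:
        # The operation named in the text: labels() -> [String]
        return list(self.element_labels)


# Made-up sample data, only to exercise the operation.
person = Node(element_labels=("Person", "Actor"), properties={"name": "Alice"})
print(person.labels())
```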

But Didn’t You Say the Query API Supports Plain JSON?

Yes, but to do this we had to map the richer Neo4j types directly onto the JSON types, losing their richness and making it much more difficult to write a parser. In the application/json format, we serialize types such as points into the most suitable string representation (for points, the well-known text (WKT) representation of geometry).
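
To sketch what is lost, consider a client that receives a point and an ordinary string as plain JSON. The payload below is hand-written for illustration, not captured from a server, and the exact text Neo4j emits for a point may differ:

```python
import json

# Hand-written stand-in for two values returned as plain JSON:
# the first was a point on the server (rendered as WKT), the second a string.
payload = '["POINT (12.99 55.61)", "a note about a place"]'

values = json.loads(payload)

# After parsing, both are plain Python strings; nothing marks the first as a point.
kinds = [type(v).__name__ for v in values]
print(kinds)
```

Telling the two apart would mean guessing from the content of the string, which is the kind of parser work the typed format is meant to avoid.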

This format is best suited for simple use cases such as ad-hoc querying, rendering directly to a UI, or situations where it’s not possible to use additional tools or dependencies to further process the response. On the request side, we would also like to send values of these types to Neo4j. While this is possible with Cypher functions, it would be much better to use the types explicitly in our requests.

So What Should the New Format Look Like?

Now that we have fully identified the problem, we can address what a good solution to it looks like. We set ourselves the following design goals for the new format:

Human readability: One of the great strengths of JSON is that it can be read without any extra tooling (save a pretty formatter). This is great for ad-hoc querying since no further processing is needed. It also makes development much easier since you can fire off a request and examine the response before solidifying it in code.
Machine readability: As discussed, we want to preserve the richness of Neo4j types, and this means processing the results and deserializing them into the native (or custom) types of the calling language. We will expand on this in a later post, but for now, we understand this goal to be summarized as “Can I easily write a parser with standard tooling in mainstream programming languages?”
Streamability: There’s no reason to restrict the size of the data that the Query API can return. Since returning a large result might take a while, it would be great if the results were returned in a way that lets the client process them as they become available (i.e., without waiting for the whole result to come back).
Flexibility: Even after over a decade and a half of development, Neo4j is still changing rapidly, with new features such as Change Data Capture (CDC) and Retrieval-Augmented Generation (RAG) support. The most important requirement for this format is that it can adapt to new types.

Performance

Performance is a consideration but, as you can tell from these criteria, the main goal is ease of use. High-performance access is covered by our in-house binary Bolt protocol, which powers the official drivers.

Adding Types to JSON

A great starting point for adding type information to JSON can be found in Peter Hilton’s blog. Most of the options there essentially add a type property at various points in the JSON. The question is, where should we put it? We decided on Option 3: pairing type with value in a JSON object that represents each result value:

{
  "type":"Integer",
  "value":"1"
}

Let’s look at how this solution stacks up against our requirements:

Human readability: As Peter points out, there is some reduction in readability from the additional nesting of values (this may not look too bad in this example, but remember it multiplies for each nested value). It is still readable, though, and it is intuitively clear which is the type and which is the value. Mostly check.
Machine readability: A parser only needs to check the value of the type field, which leads to a simple `if` block. But there's no need to write a parser completely from scratch. Many parsing libraries in many languages support this style of adding type information, making writing a parser quite easy. Check.
Streamability: Since types and values are nested nicely together, there’s nothing that prevents us from streaming values independently of each other. Check.
Flexibility: Adding a new type is as simple as introducing a new string for that type. Check.
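
The machine-readability and flexibility points can be sketched in a few lines. The example below uses the type/value pairing shown above; the CONVERTERS table and the decode function are hypothetical names, and the "Float" tag is an illustrative assumption, so check the documentation for the real tag names and value formats:

```python
import json

# Hypothetical registry: one converter per type tag.
CONVERTERS = {
    "Integer": int,  # the value arrives as a JSON string, e.g. "1"
    "String": str,
}


def decode(pair):
    """Turn a {"type": ..., "value": ...} object into a native Python value."""
    tag = pair["type"]
    if tag not in CONVERTERS:
        raise ValueError(f"unknown type tag: {tag}")
    return CONVERTERS[tag](pair["value"])


decoded = decode(json.loads('{"type": "Integer", "value": "1"}'))
print(decoded, type(decoded).__name__)

# Flexibility: supporting a new tag is one more registry entry.
# "Float" is an assumed tag name used only for this illustration.
CONVERTERS["Float"] = float
decoded_float = decode({"type": "Float", "value": "12.34"})
```

A lookup table is used here instead of a chain of `if` statements so that adding a type does not touch the decoding logic; either approach works for a handful of tags.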

The New `application/vnd.neo4j.query` Media Type

We introduce a type/value pairing object, which is returned when you set the Accept header to application/vnd.neo4j.query:

curl --location 'http://<your_server>/db/neo4j/query/v2' \
--header 'Content-Type: application/vnd.neo4j.query' \
--header 'Accept: application/vnd.neo4j.query' \
--header 'Authorization: Basic <your_auth_token>' \
--data '{"statement": "RETURN 1"}'

{
  "data": {
    "fields": ["1"],
    "values": [
      {
        "$type": "Integer",
        "_value": "1"
      }
    ]
  },
  "bookmarks": ["..."]
}

Wait!? What’s with the $ and _? In general, it's good to make it clear when a JSON property is part of the result and when a property is considered metadata, and these two prefixes make it clear that these properties are to be treated differently. We could drop the prefixes, but then we would be in the unfortunate position of a name clash, since Neo4j's node type has a property called type. Best to avoid the confusion and potential mistakes.

And that’s all there is to it. Of course, we still have to define the string tag for each type and the format of its value, but we leave that to our documentation.
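
As a sketch of how far standard tooling goes, Python's built-in json.loads accepts an object_hook that is called for every decoded JSON object, which is enough to unwrap the $type/_value pairs in the response above. The unwrap function is a hypothetical helper that handles only the Integer tag; any other tag is left untouched:

```python
import json

# The response from the example above; the bookmark stays elided as in the post.
RESPONSE = """
{
  "data": {
    "fields": ["1"],
    "values": [{"$type": "Integer", "_value": "1"}]
  },
  "bookmarks": ["..."]
}
"""


def unwrap(obj):
    """Replace a {"$type", "_value"} pair with a native value where we know how."""
    if set(obj) == {"$type", "_value"}:
        if obj["$type"] == "Integer":
            return int(obj["_value"])
        # Unknown tags are returned unchanged so no information is lost.
    return obj


result = json.loads(RESPONSE, object_hook=unwrap)

# Pair each field name with its decoded value.
row = dict(zip(result["data"]["fields"], result["data"]["values"]))
print(row)
```

Because the hook runs on the innermost objects first, the typed pairs are already converted by the time the enclosing data object is assembled.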

Can I Push Data Into Neo4j Using This Format?

The problem we described earlier also applies when we want to push data into Neo4j. The lack of types available to us in JSON makes it impossible for Neo4j to understand what type you are trying to store. For example, take the following request:

curl --location 'http://<your_server>/db/neo4j/query/v2' \
--header 'Content-Type: application/json' \
--header 'Accept: application/vnd.neo4j.query' \
--header 'Authorization: Basic <your_token_here>' \
--data '{
    "statement": "RETURN $zonedDateTime",
    "parameters": {"zonedDateTime": "2015-11-21T21:40:32.142Z[Antarctica/Troll]"}
}'

{
  "data": {
    "fields": ["$zonedDateTime"],
    "values": [
      {
        "$type": "String",
        "_value": "2015-11-21T21:40:32.142Z[Antarctica/Troll]"
      }
    ]
  },
  "bookmarks": ["..."]
}

It’s impossible for Neo4j to know that you wanted a ZonedDateTime and not a String. It is possible to write the value into the Cypher statement with the datetime() function so that it is read as the correct type, but then you miss out on the security and performance benefits of parameterizing your queries. Fortunately, you can still parameterize your query by using application/vnd.neo4j.query as the input format, which you do by setting the Content-Type header to that value:

curl --location 'http://localhost:7475/db/neo4j/query/v2' \
--header 'Content-Type: application/vnd.neo4j.query' \
--header 'Accept: application/vnd.neo4j.query' \
--header 'Authorization: Basic <your_token_here>' \
--data '{
    "statement": "RETURN $zonedDateTime",
    "parameters": {"zonedDateTime": {"$type": "ZonedDateTime", "_value":"2015-11-21T21:40:32.142Z[Antarctica/Troll]"}}
}'

{
  "data": {
    "fields": ["$zonedDateTime"],
    "values": [
      {
        "$type": "ZonedDateTime",
        "_value": "2015-11-21T21:40:32.142Z[Antarctica/Troll]"
      }
    ]
  },
  "bookmarks": ["..."]
}
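
The typed request can also be assembled programmatically. The sketch below builds, but does not send, the request from the curl example using Python's standard library. The names typed and build_request are hypothetical helpers, and the server address and token are placeholders to replace with your own:

```python
import json
import urllib.request

MEDIA_TYPE = "application/vnd.neo4j.query"


def typed(tag, value):
    """Wrap a value in the $type/_value envelope."""
    return {"$type": tag, "_value": value}


def build_request(base_url, token, statement, parameters):
    """Prepare a POST to the Query API endpoint without sending it."""
    body = json.dumps({"statement": statement, "parameters": parameters})
    return urllib.request.Request(
        url=f"{base_url}/db/neo4j/query/v2",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": MEDIA_TYPE,
            "Accept": MEDIA_TYPE,
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_request(
    "http://localhost:7475",  # placeholder server address, as in the curl example
    "<your_token_here>",      # placeholder credentials
    "RETURN $zonedDateTime",
    {"zonedDateTime": typed("ZonedDateTime", "2015-11-21T21:40:32.142Z[Antarctica/Troll]")},
)
sent_body = json.loads(request.data)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running server.
```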

Give application/vnd.neo4j.query a try today on Aura, and be sure to reach out with questions or feedback on Discord.

Acknowledgments

Many folks helped make the Neo4j Query API a reality. My colleagues Michael Simons and Gerrit Meier, for the initial PoC that kicked off this whole thing and for their continued support throughout the project. Dmitriy Tverdiakov, who helped out with a key piece of architecture. Grant Lodge, who has reviewed many of my nonsensical PRs. Antonio Barcelos, who helped to validate this type format by writing the first non-Java parser in JavaScript. A big thanks to Waiariki Koia and the folks in Aura who are helping to bring this show onto the Aura stage. Stefano Ottolenghi, for the great work putting the docs together. Andy Heap and Ivan Fulöp, for their patience and support during some sticky moments. Jonathan Giffard, mentioned earlier, for driving this whole thing forward. And last but not least, Richard Macaskill, who was previously on this project before he sadly passed away last year.