Image for post
Image for post
Matryoshkas by Iza Gawrych on Unsplash

The Power of Subqueries in Neo4j 4.x

Michael Hunger
Jul 16, 2020 · 8 min read

Nest your Cypher statements with subqueries!

Introduction

Neo4j 4.1 came out in June with the final piece required to make subqueries work well in Neo4j — So I wanted to use the opportunity for a deep dive into the topic.

For those of you who prefer to watch than read, I did a Twitch session on this topic last week. We have regular schedule of live streaming, do check out the calendar. Follow us there or YouTube to be notified when we go live!

I have to start with a disclaimer: In all the years writing SQL with a variety of databases, I have always disliked subqueries. I felt that they made queries harder to read, understand, and to reason about.

Update

My colleague Andrew Bowman created a series of very helpful knowledge base articles for this topic.

History

That’s why I always vetoed subqueries in Neo4j’s Cypher, and rather, pushed for a query pipeline that passes data from one query part to the next. Much like a Unix shell pipe that connects small tools, each one that does one job well with each other, like Lego blocks to form a more complex process.

In Cypher you achieve this with the WITH statement, allowing you to chain query parts together. WITH has the same semantics as the final RETURN, you can select columns, rename, aggregate, filter, paginate and sort your data. And all that WITHIN each query multiple times.

That is also the reason we never needed a HAVING keyword in Cypher, as it is just glorified post-filter after your result selection. You can have as many of those intermediate result-processing-passing steps in your Cypher query as you want.

Here is a quick example. Get me the cast of the movies with the highest ratings in our dataset.

That’s also the approach you can use for handling many of the “traditional” subqueries. Built-in list processing, and an exists( (n)-[:REL]->(m) ) expression for patterns manages other use-cases.

Subqueries

However, subqueries support two use-cases that were not so easy to handle before.

One was a long-time requested feature called post-union-processing. A UNION (ALL) construct combines multiple independent queries into one result.

But UNION itself is not a full query, i.e. you can’t do anything else except combine those queries, and have their (identically named) columns returned to the caller in any order. So in the past, you had to do any post-processing client-side. (or with apoc.cypher.run()).

Neo4j 4.0 introduced subqueries that addressed this issue. With the new CALL {} syntax, you can wrap a statement part as a subquery, and its result and cardinality will be available in the subsequent parts.

This calls the subquery which returns 42 as an answer, which is then available in the outer statement, for each question found.

To apply this to the UNION in question, you would wrap that statement as a subquery, and then can filter, sort, paginate (or do whatever) with the results.

NOTE
Please note that the subquery functionality in Neo4j 4.0 was not able to access variables defined outside of their scope (non-correlated) . That has only changed in 4.1, see below.

My colleague Mark wrote a great developer guide on subqueries and Luanne Misquitta from our partner GraphAware dove into this in a full blog post on post-union-processing.

Existential Subqueries

Existential subqueries are a special construct, a partial subquery to test the existence of certain graph patterns. Before, you could express patterns with the existsfunction/predicate.

Now with existential subqueries the syntax is one of these three:

  • WHERE exists { (node)-[:REL]->…​(:Label) }, which is a shorthand for
  • WHERE exists { MATCH (node)-[:REL]->…​(:Label) }, which can be extended with WHERE
  • WHERE exists { MATCH (node)-[:REL]->…​(:Label) WHERE expression …​ }

It cannot contain RETURN, WITH, aggregations or other operations.

NOTE
Currently the existential subquery cannot be used as an expression, I think that’s a bug and will be fixed.

Navigating along sub-trees

When fetching data that contained many disparate sub-trees, the WITH and aggregation approach became a bit cumbersome, as your query was not just a pipe for processing, but a tree of nested sub-parts.

For example, if you want to get the co-actors of an actor in their shared movies, but also at the same time, the directors and their movies in a single query, you’re navigating two very distinct subtrees of your data. This was possible with multiple (OPTIONAL) MATCH expressions and incremental aggregation of map data structures, but not trivial.

In Neo4j 3.1 Andres Taylor, the father of Cypher, sneaked in two really cool features that were inspired by GraphQL and made this kind of querying so much easier.

First there are pattern comprehensions. Those are like list comprehensions, but allow an expression to use a graph pattern (with new identifiers) to be filtered, and then have an expression applied to each element [ (pattern…​) WHERE filter | expression].

These two return equivalent results:

and

Which is very cool as it doesn’t change the cardinality of your query like aMATCH would, counteracting with an aggregation like collect or count.

And then map projections. These can take a map-like element (map, node, relationship), and extract attributes in a concise syntax into a map again:
elem {.foo, .bar, .* , answer: 42}.

So you can sub-select the parts you are interested in, and then with the regular key:expression syntax you can start nesting the two together.

So our second “tree” can be expressed as:

These features are especially useful if you want to regularly query nested structures with defined sub-tree selections, like in any object-graph-mapping tool, such as our GraphQL integration or Spring Data Neo4j.

So I wrote up this concept as a dedicated blog post a while ago:

The drawback is that those features are not optimized well by the query planner, and also don’t offer support for sorting and pagination. You can paginate by just applying a slice [0..10] after the fact, but that is only applied after the full comprehension has been computed. Also, sorting has to be simulated with a user defined function in apoc apoc.coll.sortMaps().

Full Subqueries

This is where full (or correlated) subqueries come in, which were added in Neo4j 4.1. These subqueries can now also access identifiers/variables from the outer scope, but those need to be declared explicitly at the beginning of the subquery (i.e. using WITH a,b,c).

I personally think that syntax is not optimal — it would have been nicer to align subquery and procedure call syntax, and treat them like parameters. Such a subquery can contain a full Cypher statement with all clauses and operations.

Here is our full example:

WARNING

The cardinality of a subquery can affect the outer query. If it doesn’t return any rows, the outer query will not return any rows. The opposite is also true, if your subquery returns multiple rows, then that will multiply the cardinality of your outer query. That’s why you should either use OPTIONAL MATCH in your subquery and/or a pure (single) aggregation with collect, so you always get one row from a subquery (except if you want more).

You cannot shadow existing identifiers. If you want to return something that was passed in, you have to rename it. Identifiers returned from a subquery are named as is. If you have an expression, the name of the identifier will be the same, e.g. n.value + 5, so make sure to always alias them properly, even if it’s not enforced (which is, in my humble opinion, an oversight, in WITH we enforced it back then).

Subqueries can be nested, and are planned better (like regular query parts) by the query analyzer. Within subqueries, you can now also use sorting and pagination, so our workarounds for GraphQL and Spring Data Neo4j will not be necessary any longer, starting with this version.

can become:

Personally, I still like the pattern comprehension syntax much more. Perhaps at some point it can be extended with ORDER BY and LIMIT semantics as syntactic sugar for subqueries.

Updates

You can also do updates in a subquery. Remember that the subquery is executed each time for an outer row (with the same cardinality), so watch out for that (e.g. use MERGE instead of CREATE).

If you want to use subqueries for conditional updates, you should always return an aggregation, to make sure to not stop the outer query.

Subqueries in Neo4j Fabric

Finally, subqueries are also used in Neo4j’s sharding and federation approach — “Neo4j Fabric” — for delineating query parts that are meant to execute on different databases.

Here you can use subqueries together with the USE keyword to direct a query part to a certain database.

For example (from the Fabric documentation):

This uses a fabric database named movie with two mounted shards/databases called moviesUSA and moviesEU.

References

Happy querying!

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher…

Michael Hunger

Written by

A software developer passionate about teaching and learning. Currently working with Neo4j, GraphQL, Kotlin, ML/AI, Micronaut, Spring, Kafka, and more.

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Michael Hunger

Written by

A software developer passionate about teaching and learning. Currently working with Neo4j, GraphQL, Kotlin, ML/AI, Micronaut, Spring, Kafka, and more.

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store