Using graphs and neo4j to build a search tool for Dutch law

Felix van Litsenburg
Dec 30, 2021

This article explains how Wetzoek, a neural search engine for Dutch law, uses the power of graphs to enhance search results. Dutch open legal data provides the connections between laws and cases. These connections can be captured in a graph and analyzed using community-detection algorithms. These communities serve as the labels for a multi-label classification model, directing users to the legal question they are most likely facing.


Wetzoek is a free-to-use neural search engine based on Open Data from the Dutch legal system. It has two main features, aimed at different user groups:

  1. A neural search engine which uses Transformer-based models and Haystack technology to provide more accurate search results than traditional methods. This feature supports research by legal professionals.
  2. A multi-label classification model which assigns each query to a specific legal topic. This feature helps laypeople understand what kind of legal challenge they might be facing, for example constructive dismissal.

This article will focus on number 2, the multi-label classification model. To build a multi-label classification model, we needed the text of laws and cases, and of course labels for each law and each case.

The Open Data provided by the Dutch legal system comes with labels for most cases and laws, assigning them to broad categories of law from “Criminal law” to “Employment law”. These categories were too broad for our purpose: a layperson does not get much help from being told they face a “Criminal law” case. Our first challenge, therefore, was to find a way to create more granular labels that directly corresponded to real-world legal topics.

Enter graph theory

Because the Open Data provided by the Dutch government’s LIDO database maps the connections between cases and laws, it can be captured in a (directed) graph. If you do not know what a graph is yet, the screenshot below is worth a thousand words:

A screenshot of a directed graph of connections between cases and laws. Cases in green, laws in yellow

This screenshot was taken in the Neo4j Browser, one of the tools used to construct this graph. I also used the Python libraries NetworkX and GraphTool. It shows a number of cases (in green, with the opening text of their summaries) and their references to other cases and laws.

A graph is a mathematical structure with two core components: nodes (the bubbles in the image) and edges (the arrows in the image). Graphs have a myriad of applications because one can assign properties to the nodes and edges. For example, most mapping software is built on graph theory.
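As a minimal sketch of this structure, the snippet below stores a tiny directed graph in plain Python: nodes carry properties, and each edge is a (source, target) reference. All identifiers and properties are invented for illustration and are not taken from the LIDO data.

```python
# Nodes: identifier -> properties. All identifiers here are hypothetical.
nodes = {
    "case_A": {"kind": "case", "year": 2019},
    "case_B": {"kind": "case", "year": 2020},
    "law_X": {"kind": "law"},
}

# Directed edges: (source, target) means "source refers to target".
edges = [
    ("case_A", "law_X"),
    ("case_B", "law_X"),
    ("case_B", "case_A"),
]


def references_from(node_id):
    """Return the nodes that node_id points to (outgoing edges)."""
    return [target for source, target in edges if source == node_id]


def references_to(node_id):
    """Return the nodes that point to node_id (incoming edges)."""
    return [source for source, target in edges if target == node_id]


print(references_from("case_B"))  # ['law_X', 'case_A']
print(references_to("law_X"))     # ['case_A', 'case_B']
```

Libraries such as NetworkX wrap the same idea in a graph object that holds node and edge attributes.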

The screenshot above is an example of a community: a number of tightly interconnected nodes. Several algorithms exist to break down one large graph into a number of smaller, tightly connected communities. Without this, it would be impossible to see the forest for the trees in a large dataset. And the dataset we work with is large, with over a million nodes! Below is an image of what a larger part of the dataset looks like.

A screenshot of a larger cut of the database, showing how chaotic it can be
This screenshot shows a fraction of the dataset, without applying any community detection algorithms

As the two screenshots above show, community detection is crucial to untangle the web of references between laws and cases. By breaking this web down into small clusters of ten to twelve cases and a few laws, we get communities that are a) meaningful, because they point to a specific legal question; and b) small enough that their meaning can be understood. That b) holds is clear: most communities in the database consisted of fewer than about 20 nodes*. How, however, can we be sure that the communities in fact represent something meaningful?

The meaning of community

Two ways of checking whether the identified communities were meaningful presented themselves: looking at past research, and investigating the communities themselves. For this project, I did both and got good, but not perfect, results.

Firstly, identifying communities within this specific database has been done before! In the article “Purposes and challenges of legal network analysis on case law”, by Dafne van Kuppevelt, Gijs van Dijck and Marcel Schaper, the authors explore Legal Network Analysis on Dutch case law.

The authors take recognised sub-topics within several legal areas (employer liability, and EU-wide taxation decisions) and compare these to the communities identified by different algorithms. The Louvain algorithm does the best job, identifying the relevant topics with an accuracy of around 60–75%.

When I applied the Louvain algorithm, I found the communities quite naturally centered around legal sub-topics. For instance, one set of cases all revolved around benefits being stopped because someone who claimed to be single was in fact living together with someone else. Many of the communities focused on such sub-topics.
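Louvain works by greedily maximising modularity, a score that is high when connections inside communities are denser than chance would predict. The sketch below computes that score for a toy graph, treating edges as undirected for simplicity (an assumption; the case-law graph is directed). The node names are made up, and this is only the scoring function, not the full Louvain procedure.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside a community
    degree_sum = defaultdict(int)  # total degree of the nodes in a community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(round(modularity(edges, two_groups), 3))  # 0.357
print(round(modularity(edges, one_group), 3))   # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of partition a modularity-based algorithm prefers.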

However, some of the communities were a bit confused. In particular, many cases will refer to items of procedural law: when can a local government decision be challenged? When can it be appealed? In general, this is not as interesting as substantive law: what do we define as negligent under certain circumstances? What constitutes anti-competitive behaviour?

By excluding nodes relating to procedural law, we can focus the communities on substantive law. There is a trade-off here, as precedent related to procedural questions can be relevant; however, in many cases the procedural law nodes will siphon away cases from a myriad of topics, but not focus on a particular question within procedural law themselves.
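A minimal sketch of this filtering step, assuming each node carries a hypothetical "procedural" flag (how procedural laws are actually identified in the data is not shown here):

```python
def drop_nodes(nodes, edges, should_drop):
    """Remove nodes for which should_drop(props) is true, plus their edges."""
    kept = {n: props for n, props in nodes.items() if not should_drop(props)}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges


# Hypothetical data: the "procedural" flag is an assumed property.
nodes = {
    "case_1": {"kind": "case", "procedural": False},
    "case_2": {"kind": "case", "procedural": False},
    "law_substantive": {"kind": "law", "procedural": False},
    "law_procedural": {"kind": "law", "procedural": True},
}
edges = [
    ("case_1", "law_substantive"),
    ("case_1", "law_procedural"),
    ("case_2", "law_procedural"),
]

kept_nodes, kept_edges = drop_nodes(nodes, edges, lambda p: p["procedural"])
print(sorted(kept_nodes))  # ['case_1', 'case_2', 'law_substantive']
print(kept_edges)          # [('case_1', 'law_substantive')]
```

In this toy example case_2 referred only to the procedural law, so it is left without any connections. That is the trade-off in miniature: the remaining links are about substance, but some cases lose their place in the graph.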

Where has this left us, in concrete terms? The entire database of Dutch law we built came down to ~1.1 million nodes, covering both laws and cases. With the Louvain algorithm, we then get down to ~12,000 communities with an average of 3 nodes each. However, more than half of these communities were single-node communities, and therefore not meaningful for our analysis. Ruling these out, we had just over 4,000 communities with an average of 7 nodes per community. For training a model, we selected only the ~80 communities that had over 100 nodes, which still left enough communities whose topics could be investigated manually.
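The bookkeeping behind these numbers can be sketched as follows, given a mapping from each node to its community. The toy data and the cut-off of 3 are made up for illustration; the article's own cut-off was communities of over 100 nodes.

```python
from collections import Counter


def summarise(community_of, min_size):
    """Count community sizes and select those with at least min_size nodes."""
    sizes = Counter(community_of.values())
    # Single-node communities are dropped before the second average.
    multi = {c: n for c, n in sizes.items() if n > 1}
    selected = sorted(c for c, n in sizes.items() if n >= min_size)
    return {
        "communities": len(sizes),
        "mean_size": sum(sizes.values()) / len(sizes),
        "multi_node_communities": len(multi),
        "mean_size_multi": sum(multi.values()) / len(multi),
        "selected": selected,
    }


# Toy assignment of node -> community id (made-up values, not Wetzoek data).
community_of = {
    "n1": 0, "n2": 0, "n3": 0, "n4": 0,
    "n5": 1, "n6": 1,
    "n7": 2,
    "n8": 3,
}
stats = summarise(community_of, min_size=3)
print(stats["communities"], stats["mean_size"])                   # 4 2.0
print(stats["multi_node_communities"], stats["mean_size_multi"])  # 2 3.0
print(stats["selected"])                                          # [0]
```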

In conclusion: graphs practically applied to machine learning

Where does this leave us? First, I should stress: this is not Machine Learning with Graphs or Graph Neural Networks. Rather, we have exploited a graph representation of our data to generate informative labels for a multi-label classification model.
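One way to picture this labelling step, as an assumption rather than the actual Wetzoek pipeline: each document takes the community it was assigned to as its label, and the label is encoded as an indicator vector over the selected communities. The function name, texts and community ids below are hypothetical.

```python
def build_training_set(texts, community_of, selected):
    """Pair each document's text with an indicator vector over the selected communities."""
    label_index = {c: i for i, c in enumerate(sorted(selected))}
    examples = []
    for doc_id, text in texts.items():
        community = community_of.get(doc_id)
        if community not in label_index:
            continue  # document falls outside the selected communities
        target = [0] * len(label_index)
        target[label_index[community]] = 1
        examples.append((text, target))
    return label_index, examples


# Hypothetical inputs: document texts and a community id per document.
texts = {
    "case_1": "text of case 1",
    "case_2": "text of case 2",
    "case_3": "text of case 3",
}
community_of = {"case_1": 17, "case_2": 42, "case_3": 99}

label_index, examples = build_training_set(texts, community_of, selected={17, 42})
print(label_index)  # {17: 0, 42: 1}
print(examples)     # [('text of case 1', [1, 0]), ('text of case 2', [0, 1])]
```

The resulting (text, target) pairs are the kind of input a text classification model can be trained on; case_3 is skipped because its community was not selected.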

The usefulness of this approach will depend ultimately on the outcome of the multi-label classification model. In a sense, this means that the graph analysis will enter as a hyperparameter. More strictly speaking, however, it significantly affects the data we pour into the model.

Its usefulness will ultimately be decided by the tool users: do their queries direct them to legal topics that are directly relevant? If not, even if the community detection worked flawlessly, the tool itself will have failed. Nonetheless, developing a reliable method for identifying legal sub-topics will be an interesting outcome.

Applied to other contexts

Where else could this strategy be relevant? We’d be thinking of any text-heavy area where documents point to each other, and where classifying them by organically emerging topics is useful. Academic citation networks are often used in both NLP and graph theory contexts, and could be an interesting use case.

It might be applicable to crowdsourcing platforms such as Stack Overflow, where related questions can be clustered together into sub-topics, allowing for easier user navigation and immediate classification of new questions.

Avenues to explore

So far, we have used the Louvain algorithm to determine the labels for our text data, provided there were at least 100 nodes in a community. Some approaches that could affect results:

  • Changing the cut-off for the number of nodes
  • Running a “layered” or “hierarchical” Louvain and also running a layered multi-label classification model
  • Running parallel community detection: splitting the database up into areas of law and running the Louvain algorithm within each of these
  • Keeping certain laws out of the algorithm: procedural laws, which describe, for example, when an appeal can be launched, can be ignored to identify more content-driven communities
  • Looking at the different passages of cases and their specific references to laws and other cases (hard)
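As an illustration of the parallel option in the list above, the sketch below splits a graph's edges by area of law, so that community detection could be run on each part separately. It assumes every node has a known area, and it simply drops references that cross areas; both are simplifying assumptions, and the names are hypothetical.

```python
from collections import defaultdict


def split_by_area(area_of, edges):
    """Group edges into one edge list per area of law.

    Only edges whose two endpoints share an area are kept; references that
    cross areas are dropped in this simplified version.
    """
    per_area = defaultdict(list)
    for source, target in edges:
        if area_of[source] == area_of[target]:
            per_area[area_of[source]].append((source, target))
    return dict(per_area)


# Hypothetical nodes with an area of law each.
area_of = {
    "case_1": "Employment law",
    "law_1": "Employment law",
    "case_2": "Criminal law",
    "law_2": "Criminal law",
}
edges = [
    ("case_1", "law_1"),
    ("case_2", "law_2"),
    ("case_1", "law_2"),  # crosses areas, so it is dropped
]

subgraphs = split_by_area(area_of, edges)
print(subgraphs["Employment law"])  # [('case_1', 'law_1')]
print(subgraphs["Criminal law"])    # [('case_2', 'law_2')]
```

Each edge list could then be passed to a community detection routine on its own.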

Parting thoughts

Some thoughts on Neo4j versus Python libraries for graph analysis:

Pros for Neo4j:

  • You have a database to work with, which can be updated quite easily (in this specific use case, you could set a cron job to fetch the data)
  • Performing the Louvain algorithm takes a few minutes at most
  • One of the most widely used graph databases

Cons for Neo4j:

  • You’ll have to learn Cypher, on top of the language you are already using
  • It’s continuously in development, so documentation can be lacking and occasionally lags behind the latest implementations
  • They want to make money, and running a Neo4j server might prove costly in the long run
  • Loading data into the database takes a long time (or at least it took me a long time)

Pros for Python libraries, in particular NetworkX:

  • Plug-and-play with Python
  • Decent integration for visualisation
  • Fast to load data into the graph, with options for storing graphs too

Cons for Python libraries:

  • Slow to run algorithms
  • Takes some time to understand how the libraries work, but not as much time as Cypher


[*] This posed its own challenge later on, because that meant there were only 20 observations of a given label — not enough to train a model on. I still haven’t quite figured out a solution, and at any rate it’s beyond the scope of this article!