Recursion in SQL Explained Visually

Published in

The Startup

5 min readNov 22, 2020

--

Recursion in SQL? But why? Oh, there are many uses for that. It’s common to store hierarchical data in SQL and recursive queries are a convenient way to extract information from such graphs. Organizational structure, application menu structure, a set of tasks with sub-tasks in the project, links between web pages, breakdown of an equipment module into parts and sub-parts are examples of the hierarchical data. The post will not go into great details of those many use cases rather look at two toy examples to understand the concept - the simplest possible case of recursion on numbers and querying data from the family tree.

Let's think about queries as a function. In a sense that a function takes an input and produces an output. Queries operate on relations or one could say tables. We will denote those as Rn. Here is a picture of a query. It takes three relations R1, R2, R3 and produces an output R. Simple enough.

Caption: A picture representation of how a query works.

SQL example: SELECT <something> FROM R1, R2, R3 WHERE <condition>

Query can take something and produce nothing:

A visual representation of a query taking something and producing nothing.

SQL example: SELECT <something> FROM R1 WHERE 1 = 2

Take nothing and produce something:

A visual representation of a query taking nothing and producing something.

SQL example: SELECT 1

Or take nothing and produce nothing

A visual representation of a query taking nothing and producing nothing.

SELECT 1 WHERE 1 = 2

Recursion is achieved by WITH statement, in SQL jargon called Common Table Expression (CTE). It allows to name the result and reference it within other queries sometime later.

Naming the result and referencing it within other queries.

Here is a sample.

WITH R AS (SELECT 1 AS n)
SELECT n + 1 FROM R

Query (SELECT 1 AS n) now have a name — R. We refer to that name in SELECT n + 1 FROM R. Here R is a single row, single column table containing number 1. The result of the whole expression is number 2.

The recursive version of WITH statement references to itself while computing output.

Using the recursive with statement.

For the recursion to work we need to start with something and decide when the recursion should stop. To achieve this, usually recursive with statement has following form.

Important to note that base query doesn’t involve R, but recursive query references R. From the first look it seems like infinite loop, to compute R we need compute R. But here is a catch. R actually don’t reference itself, it just references previous result and when previous result is empty table, recursion stops. Actually it could help to think of it as an iteration rather then recursion!

Here’s what is happening: base query executed first, taking whatever it needs to compute the result R0. Second recursive query is executed taking R0 as input, that is R references R0 in the recursive query when first executed. Recursive query produces the result R1 and that is what R will reference to at the next invocation. And so on until recursive query returns empty result. At that point all intermediate results are combined together.

Recursive query execution algorithm flow chart

Quite abstract now. Lets take a concrete example, count until 3.

Running a recursive statement with count until three.

Base query returns number 1 , recursive query takes it under the countUp name and produces number 2, which is the input for the next recursive call. When recursive query returns empty table (n >= 3), the results from the calls are stacked together.

Illustration of the results from the call stacked together.

Watch out, counting up like that can only go that far. There is a limit for recursion. It defaults to 100, but could be extended with MAXRECURSION option (MS SQL Server specific). Practically, it could be a bad idea to crank recursion limit up. Graphs might have cycles and limited recursion depth can be a good defense mechanism to stop poorly behaving query.

OPTION (MAXRECURSION 200)

Here’s another example, find ancestors of a person:

Using recursion to find the ancestors of a person.

Code statement to use recursion to find the ancestors of a person.

Base query finds Frank’s parent — Mary, recursive query takes this result under the Ancestor name and finds parents of Mary, which are Dave and Eve and this continues until we can’t find any parents anymore.

Illustration of the results from the recursion to find the ancestors of a person.

A table that includes the birth year to find the parents of a person.

Now this tree traversal query could be the basis to augment the query with some other information of interest. For example, having a birth year in the table we can calculate how old the parent was when the child was born. Next query do exactly that, together with showing lineages. To do that it traverses the tree from top to bottom. parentAge is zero in the first row because we don’t know when Alice was born from the data we have.

Running a recursion to find the birth year of a person and their ancestors.

Table representing the results from the recursion to find the birth year and ancestors of a person.

Take away — recursive query references the result of base query or previous invocation of recursive query. Chain stops when recursive query returns empty table.

In the follow-up post we’ll take an algebraic view on SQL recursion and will look into recursive stored procedures.

[UPDATE] Post updated with comments from kagato87 and GuybrushFourpwood reddit users.

[NOTE] Code samples are for MS-SQL. Other DBMS could have slightly different syntax.

Recursion in SQL Explained Visually

Written by Denis Lukichev