Recursion in SQL Explained Visually

Denis Lukichev
Nov 22, 2020 · 5 min read

Recursion in SQL? But why? Oh, there are many uses for that. It’s common to store hierarchical data in SQL and recursive queries are a convenient way to extract information from such graphs. Organizational structure, application menu structure, a set of tasks with sub-tasks in the project, links between web pages, breakdown of an equipment module into parts and sub-parts are examples of the hierarchical data. The post will not go into great details of those many use cases rather look at two toy examples to understand the concept - the simplest possible case of recursion on numbers and querying data from the family tree.

Let's think about queries as a function. In a sense that a function takes an input and produces an output. Queries operate on relations or one could say tables. We will denote those as Rn. Here is a picture of a query. It takes three relations R1, R2, R3 and produces an output R. Simple enough.

Image for post
Image for post

SQL example: SELECT <something> FROM R1, R2, R3 WHERE <condition>

Query can take something and produce nothing:

Image for post
Image for post

SQL example: SELECT <something> FROM R1 WHERE 1 = 2

Take nothing and produce something:

Image for post
Image for post

SQL example: SELECT 1

Or take nothing and produce nothing

Image for post
Image for post

SELECT 1 WHERE 1 = 2

Recursion is achieved by WITH statement, in SQL jargon called Common Table Expression (CTE). It allows to name the result and reference it within other queries sometime later.

Image for post
Image for post

Here is a sample.

WITH R AS (SELECT 1 AS n)
SELECT n + 1 FROM R

Query (SELECT 1 AS n) now have a name — R. We refer to that name in SELECT n + 1 FROM R. Here R is a single row, single column table containing number 1. The result of the whole expression is number 2.

The recursive version of WITH statement references to itself while computing output.

Image for post
Image for post

For the recursion to work we need to start with something and decide when the recursion should stop. To achieve this, usually recursive with statement has following form.

Image for post
Image for post

Important to note that base query doesn’t involve R, but recursive query references R. From the first look it seems like infinite loop, to compute R we need compute R. But here is a catch. R actually don’t reference itself, it just references previous result and when previous result is empty table, recursion stops. Actually it could help to thing of it as an iteration rather then recursion!

Here’s what is happening: base query executed first, taking whatever it needs to compute the result R0. Second recursive query is executed taking R0 as input, that is R references R0 in the recursive query when first executed. Recursive query produces the result R1 and that is what R will reference to at the next invocation. And so on until recursive query returns empty result. At that point all intermediate results are combined together.

Recursive query execution algorithm flow chart
Image for post
Image for post
Recursive query execution sequence

Quite abstract now. Lets take a concrete example, count until 3.

Image for post
Image for post

Base query returns number 1 , recursive query takes it under the countUp name and produces number 2, which is the input for the next recursive call. When recursive query returns empty table (n >= 3), the results from the calls are stacked together.

Image for post
Image for post

Watch out, counting up like that can only go that far. There is a limit for recursion. It defaults to 100, but could be extended with MAXRECURSION option (MS SQL Server specific). Practically, it could be a bad idea to crank recursion limit up. Graphs might have cycles and limited recursion depth can be a good defense mechanism to stop poorly behaving query.

OPTION (MAXRECURSION 200)

Here’s another example, find ancestors of a person:

Image for post
Image for post
Image for post
Image for post

Base query finds Frank’s parent — Mary, recursive query takes this result under the Ancestor name and finds parents of Mary, which are Dave and Eve and this continues until we can’t find any parents anymore.

Image for post
Image for post
Image for post
Image for post

Now this tree traversal query could be the basis to augment the query with some other information of interest. For example, having a birth year in the table we can calculate how old the parent was when the child was born. Next query do exactly that, together with showing lineages. To do that it traverses the tree from top to bottom. parentAge is zero in the first row because we don’t know when Alice was born from the data we have.

Image for post
Image for post
Image for post
Image for post

Take away — recursive query references the result of base query or previous invocation of recursive query. Chain stops when recursive query returns empty table.

In the follow-up post we’ll take an algebraic view on SQL recursion and will look into recursive stored procedures.

[UPDATE]. Post updated with comments from kagato87 and GuybrushFourpwood reddit users.

The Startup

Medium's largest active publication, followed by +752K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store