SQL Recursive Queries
Suppose a scenario, where one of your friends is asking about the names of your parents and that you don’t know the answer. In that case, you will have to forward the question to your parents, get their names, and pass the answer back to your friend. On a different day, if your friend wants to know the names of your grandparents and your parents, you will have to do the same, except in this scenario, your parents will have to pass the question to their immediate parents and pass the answer to you so you pass it to your friend. There your friend will receive a bunch of answers collected from your grandparents and parents. This is the same way, MySQL Recursive queries work. Welcome to the world of Recursion!
Assume you are given one SQL database table that represents a family tree. If you are familiar with SQL, selecting all the immediate children of any given parent costs a simple SELECT statement. But when writing SQL queries, have you ever experienced the need for retrieving all the children under a particular parent? In other words, given a parent id, obtaining a list of hierarchical records related to the id?
If you are familiar with the concept of the relational databases, storing hierarchical or tree-structured data in MySQL databases should not be a miracle. But retrieving records related to a subset of that tree would have brought you pain. This is similar to your friend inquiring about the names of all your ancestors for a couple of generations. MySQL 8 wipes out this pain by introducing a new type of query named as Recursive Queries.
To understand the concept of recursive queries, one must realize what is meant by Common Table Expressions(CTE). A common table expression is a named temporary result-set that exists within the execution scope of a single SQL statement e.g.,SELECT
, INSERT
, UPDATE
or DELETE
.CTEs are somewhat similar to derived tables in the sense, both of them are not stored as objects and also get destroyed after the query execution. However, CTEs provide more readability and performance advantages as they can be referenced multiple times in the same query including self-referencing. Below given is the basic format of a CTE.
Having a basic understanding of what a CTE is, let’s move on to Recursive CTEs. A recursive common table expression (CTE) is a CTE that contains a subquery which refers to the cte_name itself. The following illustrates the syntax of a recursive CTE.
Recursive query execution has 3 major steps. First, the initial_query (anchor member) will be executed. This is responsible for laying the foundation for recursion. Once the initial query has fetched its results, the recursive steps start their execution. The recursive_query refers to the result fetched by the initial_query and gets the required properties it needs to execute the first iteration of recursion. Upon receiving the results of the first recursive step, it moves to the next iteration, using the results of the previous iteration. Let’s clarify the idea a bit with a practical example.
In the first image, the Royal Family tree is illustrated. For simplicity, only a part of the family tree(included names are shown inside squares) is entered into the ‘MEMBER’ table in the ‘ROYAL_FAMILY’ database.
Let’s create a database table(MEMBER) that contains a few hierarchical records. Below represents records with the parent-child relationship related to the Royal Family. The relationship between two members will be represented using the PARENT_ID attribute, which is a self-reference foreign key to the ID attribute.
First, we shall retrieve all the descendants of Queen Elizabeth(including herself) as per the database table. The following recursive query would do the job! Let the name of the recursive query be top_down_cte.
Now it’s time to explore the structure of our recursive query.
The query part tagged as initial_query fetches the initial results including the ID, NAME, PARENT_ID of the member whose name is ‘ELIZABETH’. How can we retrieve the children, without knowing whose children we are about to fetch? At this stage, our result set from initial_query would be similar to the following.
Now, having the foundation for our recursion, we are ready to fetch the immediate children of ‘ELIZABETH’. The query part tagged as recursive_query fetches the next immediate children based on the previous result set. In our scenario, the recursive query uses an Inner Join to refer to the previous result set which is the top_down_cte itself. Here comes the bridge to the previous results. What properties or columns from the previous result set, are used to fetch the next result set? It depends on the requirement. In this specific scenario, I need to fetch the next result set based on the previous result’s ID column. Simply I request to fetch any row from the MEMBER table (having the alias, ‘m’) whose PARENT_ID matches with the ID of the top_down_cte(previous result set). What happens if I cannot find any row, according to the mentioned condition? Simple! The recursion would stop:) Our recursive query iteration would stop when it figures out that there is no record whose PARENT_ID equals either one of 5,6,7 or 8. The results fetched from the recursive_query part would be similar to the following.
In the middle of the initial_query and the recursive_query, exists the keyword UNION. This is responsible for combining the results fetched from the initial_query and the recursive_query. Now, the combined result set would be similar to the following.
Having the full top_down_cte(name of the recursive query), in hand, we can select either all the records or any particular record as per the requirement. In this scenario, we have selected all the records from the top_down_cte.
In the previous exercise, we have fetched records related to a top-down hierarchy. What if we need to fetch records from bottom-to-top? In other words, what if we need to fetch details on all the ancestors of Prince GEORGE(including himself)? We can fetch the records by doing a simple manipulation to the above recursive query:). Let the name of the recursive query be bottom_up_cte.
- Change the base query to first get the information on Prince George.
- Change the condition of the recursion. We now have to fetch the next record, based on the bottom_up_cte’s (previous record’s) PARENT_ID column. Now we are querying, whose ID in the MEMBER table is similar to the PARENT_ID of a record from the bottom_up_cte.
The following recursive query would fetch the required records from the MEMBER table.
Being able to build this recursion pattern from top-to-bottom as well as from bottom-to-up inside a hierarchy, one might question the possibility of traversing both the ways within one query. YES! That’s also possible. It’s worthful to give a try on this last recursive query on your own. This is the miracle of MySQL 8:)
References