Explore your SQL Query Execution

Ishara Madhavi
Analytics Vidhya
Published in
6 min readNov 29, 2019

--

Photo by Simon Migaj on Unsplash

With a basic understanding retrieved from the previous article written on Recursive CTEs, it’s time to explore the world of execution behind these queries. Although the idea of Recursive queries is relatively new, the execution plan has played a major role in optimizing query performances by identifying possible pitfalls in them.

A basic understanding of how the optimization takes place would be valuable. MySQL uses a Client-server protocol. The steps of the protocol are listed below briefly.

  1. Once the client hits execute of a query, first the server analyses syntax of the query. The syntax analysis verifies whether the language components are formed correctly to construct a valid statement.
  2. Next, the meaning of the syntactically legal query is evaluated, which is known as semantic analysis. At this stage, all references database objects and host variables will be checked for accurate usage.
  3. Once the written SQL query has passed both checks, it will be granted a SQL ID and (MD5) hash value by the server. The hash value is based on the few hundred characters of your statement, so hash-collisions can occur, especially for long statements.
  4. The next stage will be parsing the query into a parse tree. Having this goal in mind, the query is exposed to a shared memory area(shared pool) which contains parse trees and execution plans for the SQL queries that we have executed. Each unique SQL query would possess a unique shared SQL area. The parsing can happen in 2 major ways afterward.

I. Soft Parse

If a matching hash value is found from the shared pools for the hash value generated for the SQL query, the same parse tree and the execution plan would be used.

II. Hard Parse

If no matching hash value is found, the query is handed over to the Query Optimizer to select the best execution plan and the parse tree. MySQL uses a cost-based optimizer, which means it tries to predict the cost of various execution plans and selects the least expensive.

The execution plan

MySQL does not generate byte-code to execute a query, as many other database products do. Instead, the query execution plan acts as a tree of instructions that the query execution engine has to follow in order to produce the query results. One of the most important optimizations in the execution plan is the join optimization. The following conceptual figure illustrates how a multi-table query is formed in our minds in the form of joins of 4 tables.

However, the MySQL query execution plan always does this job in the form of a left-deep tree, like follows.

5. The next stage is the Query Execution stage. In contrast to the optimization stage, the execution stage is usually not all that complex: MySQL simply follows the instructions given in the query execution plan.

6. The final stage of query execution is, sending the result set to the client, event at the presence of a null result set.

The following diagram illustrates the execution stages of a query in a nutshell.

Having this brief concept in mind, let’s clarify the idea of the query execution plan with an example from the previous post on Recursive queries.

We have created a sample table of the Royal Family members with the parent-child relationship and sharpened our skills in writing a simple recursive query. Now we can examine how the server has created its execution plan by executing our query from the SQL editor and then selecting the Execution Plan tab in the query results tab in MySQL Workbench. The default plan is displayed in the Visual format and we can change that to Tabular too. Or if we just need the tabular explanation, use the EXPLAIN keyword at the very beginning of any query. The recursive query below will output the descendants of Queen Elizabeth(including herself) as per the database table.

The execution plan for the query is as bellows.

Query Execution Plan from MySQL Workbench

Things to have a quick glance at:

  • This execution plan as well takes the form of a left-deep tree.
  • Each query_block# represents a SELECT statement out of the query.

query_block#1 : outermost SELECT statement

query_block#2 : one input to UNION operation (anchor member)

query_block#3 : next input to UNION operation. (Recursive member)

  • The bottom-up execution tree gradually constructs the final result set.
  • Number in the top right of a box: number of rows expected to be used from the table after filtering
  • Number in the top left of a box: relative cost of accessing that table
  • The query_block#2 which represents the anchor part of the query is estimated to be the result of a full table scan. Even if we are to execute the query, we will have to go through the full table, in order to find out the row which includes the NAME as ‘ELIZABETH’. Thus, we can observe that the server has made the right choice. The block is represented as a red box, which indicates that the cost of operation is relatively high. The text immediately below the block is the table/alias used.
  • The left side diamond operator represents a nested loop, in other terms, a join on tables. The name nested loop suggests a nested for loop. The reason is, in performing a join operation, we take a record from one table and retrieve all matching records from the next table in the same way we perform a nested for loop. One input to the nested loop here is the top_down_cte(recursive query itself) records represented in a red box. Matching records from the MEMBER table is the other input. Reading all the records of the top_down_cte is required in order to fetch the matching results from the MEMBER table. Thus, the top_down_cte is also marked as a Full Table Scan in the query execution plan, inside a red box. The other input to the nested loop, in the green box, uses a Non-Unique Key Lookup and indicates that the cost is Low-Medium. If the number of rows is high, the cost of operation can be high. The highlighted text line gives the name of the foreign key constraint used for the join operation or any index used. The PARENT_ID column is not made UNIQUE, but it references the ID column which is the Primary Key of the very same table via the foreign key constraint. Due to this reason, given an ID from the top_down_cte, it is not a severe operation to find whose PARENT_ID is similar to the given ID from the MEMBERS table. Thus the block is presented in Green.

This information set collectively gives possible ways of improving our queries. Having seen the query_block#2 is red and relatively contains a costly operation, maybe we can try to index the NAME column in the table so that it would not fully scan the table. Instead, it may use the index on the NAME column to filter out row/s whose name is ‘ELIZABETH’. And the following execution plan proves this fact by indicating the query_block#2 in green as we have already expected. Plus, the query cost of the block#2 has been reduced by one third.

Query Execution Plan from MySQL Workbench, after indexing NAME column in MEMBER table

I hope you enjoyed learning the real world behind the SQL execution. The take away from this reading would be, never to implement a query without optimizing it!

References

--

--

Ishara Madhavi
Analytics Vidhya

Software Engineer at Sysco LABS | Graduate -Computer Science and Engineering -University of Moratuwa