How Do SQL Database Engines Work?

A Journey to understand the inner workings of SQL Database Engines

v-dav

Published in

Dev Genius

5 min readJul 27, 2023

Introduction

Are you familiar with writing SQL queries and statements, yet often wonder about the magic that happens behind the scenes? Are you curious about how your query gets transformed into the desired results?

If your answer is yes, then this article is meant for you. Through this short reading, you’ll discover the basics of how SQL Database Engines work, enabling you to write more efficient queries and troubleshoot issues with better understanding.

SQL is defined as a Structured Query Language that is a programming language used to manipulate data in databases. It allows us to interact with the database to create, retrieve, update, delete, and maintain the information stored there. The statements that are used to perform these operations are what we call queries. As Dr. Richard Hipp, creator of SQLite says each SQL statement is a separate program.

While simple data operations (Create, Read, Update, Delete — often abbreviated as CRUD) can be executed in most programming languages, SQL stands out when it comes to handling complex queries. A task that might require hundreds of lines of code in a different language can be accomplished with a few lines of SQL. Let’s get a closer look at the SQL Database Engine which makes this possible.

The SQL Database Engine

At its core, an SQL Database Engine is a piece of software that understands SQL commands, interprets them, and allows relevant operations to be performed on the relational database. The engine comprises two main components:

A Compiler: Translates SQL queries into executable procedures.
A Virtual Machine: Runs the procedures and returns the results.

The compiler transforms our SQL statement, which is like a mini-program, into procedures or a form of byte code. The Virtual Machine then evaluates and runs this byte code, ultimately returning the required results to the client. The process can be likened to how GCC works in low-level programming, and indeed, some SQL servers are even built using the C programming language.

The entire journey of a query from input to output can be divided into five main steps:

Compilation Phase

Parsing: Every SQL query starts with a parsing phase. The parser reads the query from left to right and breaks it down into “tokens”. These tokens are individual words or elements of the SQL language, like SELECT, FROM, WHERE, and so on. At this point, the parser checks whether the tokens are in an order that makes sense according to the rules of SQL grammar. If the tokens don’t make sense together, an error is thrown and the query execution stops. If the tokens are in an order that is valid according to SQL grammar, the parsing step is successful and we move on to the next step.
Semantic Checking: The query is validated to ensure the syntax is correct. The semantics of the statement is also verified by comparing it against the system’s catalog to confirm the existence of the databases, tables, and columns referenced in the query. Additionally, the user’s permissions to execute the statement are checked.

-- This statement fails because FORM isn't a valid SQL keyword.
mysql> SELECT * FORM a_table;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FORM a_table' at line 1

-- This statement is syntactically correct but fails the semantic check.
mysql> SELECT * FROM a_table;
ERROR 1146 (42S02): Table 'a_table' doesn't exist

3. Binding: After the SQL engine has ensured the query is valid, it needs to convert the high-level SQL query into a format that it can work with more easily. At this stage, a binary representation of the steps required to process the statement is generated, resulting in the creation of the byte code. This process is known as binding or query rewriting. During this phase, the database engine translates table and column names into their corresponding IDs in the system catalog. It also replaces any aliases used in the query with the actual table or column names they represent.

Moreover, this is the stage where the SQL engine will apply any applicable rules or rewrite the query to optimize it further. For example, if the query contains a view (a saved query), the engine will replace the view name in the query with the actual query that defines the view.

After this, the query moves on to the optimization and execution phases as described previously.

4. Optimization: Once the SQL query has been verified to be both syntactically and semantically correct, it goes through the optimization phase. The SQL engine will try to find the most efficient way to execute the query. It does this by examining the available data statistics and indexes, and generating various potential “query plans”. These plans are different ways the database could go about finding the requested data. The optimizer estimates the cost of each of these plans, in terms of expected memory, CPU usage, and I/O operations, and selects the one with the least cost.

Execution Phase

Once the optimal query plan has been selected, the SQL engine moves on to the execution phase. The Virtual Machine comes into play in this phase, where it executes the byte code and retrieves the data as per the instructions from the compiled query. This is where the actual work of finding the data you asked for happens. The engine follows the steps outlined in the query plan, reading, sorting, or modifying data as necessary. When the engine has finished executing the plan, it returns the results of the query to you, and the process is complete.

These stages demonstrate the steps that SQL engines take to transform the SQL queries you write into the actions required to read or modify the data stored in the database. Through these processes, the SQL database engine makes handling and manipulating data efficient and user-friendly.

https://learn.microsoft.com/en-us/sql/odbc/reference/processing-a-sql-statement?view=sql-server-2017

Summary

The SQL Database Engine stands as an indispensable asset in virtually every database ecosystem. Crafted using low-level coding languages, it has distinctive phases for both compiling and executing processes.

The engine identifies the optimal algorithms by juxtaposing them against other options and constructs tables leveraging binary trees. Users can employ a variety of platforms, each with its own unique code, all while maintaining connectivity with the SQL database engine.

SQL facilitates data sharing and management with a clearly defined schema, automatic algorithm selection, and maintenance of indexes. It also mitigates performance issues without necessitating a redo of coding and testing.

Through the magic of parsing, semantic checking, binding, and optimization, our SQL queries are transformed into byte code and executed, enabling us to manage vast amounts of data with just a few lines of SQL. Here’s to the silent engine that makes it all possible!