How a SQL Database Engine Works

Dennis Pham
6 min readDec 7, 2018

SQL (Structured Query Language) is a language used to communicate with relational databases. The objective of SQL is to manage large amounts of data, especially if there’s lots of data being written simultaneously and too many data transactions. SQL itself is a “spec” — a generalized language that supports the basics. The data management comes into play when the SQL client communicates with a database — and thus comes the ability to create/drop/collect/store data, update or delete that data, extract that data, and manage user permissions to that data. This system is called the relational database management system.

Different SQL servers will offer everything in that basic spec as well as new functions and features depending on the version and use cases needed such as MySQL, SQLite, Postgres SQL, Transact-SQL, etc. This includes Salesforce and Oracle both using their own SQL servers, IBM using IBM Db2, and Microsoft using Transact-SQL. There are many other businesses that will use some form of a SQL server for their relational database environment because of the efficiency, extensibility, scalability, and standards compliance. We’re going to do an overview of how SQL works.

Where does the data come from?

Let’s first take a broader look at how data gets to a SQL client. There are different methodologies for collecting and organizing data, but the fundamentals are the same. Data comes to a server and a data storage system such as from an Apache or Nginx platform can then process that data into tables and store it in a data warehouse server for SQL to use. This is usually done by converting the data to a format that the database can use (such as a JSON file) because we don’t want to write directly to the database itself. The database warehouse is the core database engine that allows a SQL client to connect and communicate with it. To retrieve data, the database warehouse forwards the SQL request to an application server. The application server then processes it and sends it back to a web server which turns it into presentable content to the user — SQL data tables.

The SQL Database Engine — Stages of Compiling

Let’s now take a closer look at how that database engine works. We now know that the data warehouse is the RDBMS that allows a SQL client to communicate with it. How does it process that data though?

SQL has its own stages for how a query compiles and executes. The taxonomy for these stages of processing in SQL varies depending on the client but it’s generally called a SQL driver or SQL statement. SQL Authority has a great diagram for this with the stages being compiling (parsing and binding), optimizing, and executing.

All SQL database engines have a compiler to translate the SQL statement into byte code and a virtual machine to evaluate the byte code. You can think of it similar to GCC in low level programming. Some SQL servers, such as SQLite, is actually written in the C Programming language with its own tokenizer, parser, interpreter, and B-tree headers and structures.

According to Microsoft, the RDBMS processes the SQL statement by:

1. Compiling (Parsing): Tokenizes the statement into individual words with valid verbiage and clauses.

2. Compiling (Checks semantics): Validates the statement by checking the SQL statement against the system’s catalog and seeing it these databases, tables, and columns that the user wants exist and if the user has privileges to execute the SQL query.

3. Compiling (Binding): Generates a query plan for the statement which is the binary representation of the steps required to carry out the statement. In almost all SQL server engines, it will be byte code. What has now been compiled is a command line shell — a program that reads SQL statements and now sends them to the database server for optimization and execution.

4. Optimizing: Optimizes the query plan and chooses the best algorithms such as for searching and sorting. This feature is called the Query Optimizer or Relational Engine. Once this is done, we now have a prepared SQL statement.

5. Executing: The RDBMS executes the SQL statement by running the query plan.

One thing to note is that there are more steps for compiling than optimizing and executing. This is because parsing/tokenizing a SQL statement does not require access to the database and can be done fairly quickly. But optimizing a query plan can be very CPU-intensive and for a complex query, the optimizer may explore many different ways of how to execute that query. This is why it’s much better to have the time spent in optimization as it’s a much lower opportunity cost rather than executing the query inefficiently. And it gets even more optimal if the same optimized query plan can still be used over and over again for a given similar schema.

https://docs.microsoft.com/en-us/sql/odbc/reference/processing-a-sql-statement?view=sql-server-2017

The SQL Database Engine — Storage Engine, Relational Engine, and Execution Engine

The Storage Engine and the Query Optimizer (Relational Engine) are essentially the two core components of a SQL Database Engine.

The SQL Storage Engine is a software module used to create, read, and update data between the disk and memory while still maintaining data integrity (rollback journals and write-ahead logs).

You can use this command to find out what storage engines are available:

mysql> SHOW ENGINES\G

A convenient feature of SQL is that all SQL clients have a SQL Query Optimizer. As discussed earlier, this optimizer chooses the algorithm to use depending on the query that was written. It’s similar to choosing an algorithm in programming languages based on time complexity, but instead the Query Optimizer chooses an algorithm by making an access plan / query plan as a cost-based optimizer. What this does is look at many different potential query access plans for a given query, estimates the cost of each of these plans and selects the one with the lowest cost out of all the choices. However, it doesn’t look at every single possible plan but instead does a cost-based balancing act that considers both the cost of finding potential plans and the cost of the plan themselves.

Once it chooses the best plan, it sends that query plan over to the Execution Engine to run the SQL statement.

Data -> Tables

SQL is usually written from a low level programming language such as C. Because of this, the SQL database engine can turn incoming data into tables of columns and records. It does this with B-trees — a type of binary search tree with several branches that keeps pointing to new data elements. These B-tree structures are what allows the data to be turned into a table of columns with keys being used as “pointers” to other keys.

https://www.red-gate.com/simple-talk/sql/sql-training/the-sql-server-query-optimizer/
https://www.youtube.com/watch?v=Z_cX3bzkExE

Conclusion

The SQL database engine is a crucial tool to have in almost any database environment. It is built with low-level programming languages and has its own stages for compiling and executing. It selects the best algorithms by comparing it to alternatives and creates tables by using binary trees. People can use different platforms all with different code and have them all connected to a SQL database engine. SQL allows for sharing and taking care of the data with a well-defined schema, automatic algorithm selection, automatic index maintenance, and fixing performance problems without have to recode and retest.

Sources

https://www.red-gate.com/simple-talk/sql/sql-training/the-sql-server-query-optimizer/

https://blog.sqlauthority.com/2013/10/21/big-data-data-mining-with-hive-what-is-hive-what-is-hiveql-hql-day-15-of-21/

https://www.sisense.com/blog/beginners-guide-to-mysql-storage-engines/

--

--