SQL Server Architecture Explained

Ravindra Raj Kanitkar
Canadiv’s Technology and Design
7 min read · May 22, 2023

Microsoft SQL Server is one of the world’s most widely used relational database management systems, with numerous features developed in close cooperation with its users over more than three decades.

We all know that databases are used to store and retrieve data. However, this is a very abstract statement: what we see is an interface such as SQL Server Management Studio or just a terminal, while many more things go on behind the scenes to execute a query. But don’t worry; I strive to simplify the process so that everyone can grasp it.

Firstly, when a query is executed in a terminal, we get results within a fraction of a second, but that query is processed through various building blocks of the SQL Server backend architecture. Let’s have a look into these processes, but first please go through the following process diagram.

Step 1: Write

When we write and run a query, it is handed to the PROTOCOL LAYER, which is in charge of establishing and maintaining a client-server connection. A client-server connection can be established in three ways: shared memory, TCP/IP, and named pipes.

Shared Memory:- Assume you’re in a library and want to borrow a book from the librarian. This is a fantastic example of the shared memory concept. In this scenario, you are the client, the librarian is the server, the library is your machine, and the request protocol is vocal communication.

TCP/IP:- Assume you wish to buy groceries from a nearby supermarket that offers online delivery. You call the vendor, request some fruits, pay for them, and receive the delivery. You are the client in this situation, and the vendor is the SQL Server. The remote locations are your home and the marketplace, and the cellular network plays the role of the TCP/IP protocol.

Named Pipe:- Assume you contacted your neighbor Rashmi and asked for a cup of sugar (like we all do 😜). Rashmi hands you a cup of sugar and requests a cup of coffee in return. In this scenario, you are the client, Rashmi is the server, and the communication over the local network (intranet) plays the role of the named pipes protocol.
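
If you are curious which of these protocols your own connection ended up using, SQL Server exposes it through a dynamic management view. A minimal sketch (run from any query window; the reported values include Shared memory, TCP, and Named pipe):

    -- Which protocol is the current session using?
    SELECT session_id, net_transport, auth_scheme
    FROM sys.dm_exec_connections
    WHERE session_id = @@SPID;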

TDS Concept: TDS stands for Tabular Data Stream. All three protocols use TDS packets, which are encapsulated in network packets; this enables data transfer from the client machine to the server machine. TDS was developed by Sybase and is now owned by Microsoft.

Step 2: Optimization

Once the client-server connection is established, the query is handed to the Relational Engine, which is responsible for optimizing the query and preparing a plan to execute it. There are three major components of the Relational Engine: the CMD Parser, the Optimizer, and the Query Executor.

CMD Parser:- Once the query is received, the CMD Parser is the first component to check it for syntactic and semantic errors. Finally, it generates a query tree.

Syntactical check:- It finds all the syntax errors in our raw query. Different programming languages have different keywords and rules that need to be followed. Similarly, SQL has predefined keywords, and when a user writes SELECTX * FROM “users” instead of SELECT * FROM “users”, it will throw a syntax error because SELECTX is not a keyword.

Semantic check:- Semantic errors are logical errors. For instance, suppose our database has a single users table that contains the id, name, and age of each user, and the user executes the query SELECT address FROM “users”. The address column is absent from the users table, so it will produce a semantic error even though there is no syntactic one.
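
To make the two checks concrete, here is a small sketch against a hypothetical users table (the table and column names are only for illustration):

    -- Hypothetical table used only for these examples
    CREATE TABLE users (id INT, name VARCHAR(50), age INT);

    SELECTX * FROM users;         -- fails the syntactic check: SELECTX is not a keyword
    SELECT address FROM users;    -- fails the semantic check: invalid column name 'address'
    SELECT name, age FROM users;  -- passes both checks and moves on to the optimizer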

After ensuring that there are no syntactic or semantic problems, the CMD Parser generates the query tree. Please be aware that a query tree can be executed in several different ways, and the required output is the same for every alternative; choosing among them is the optimizer’s job.

Optimizer:- The work of the optimizer is to create an execution plan for the user’s query. The plan will determine how the user query will be executed. There are some points to be noted:

  1. Optimization is done only for DML commands (SELECT, INSERT, UPDATE, DELETE); DDL commands are not optimized.
  2. The job of the optimizer is to discover a good, optimal plan, not necessarily the cheapest one. For example, a book you want is offered at a local bookstore for Rs 1000 and on Amazon for Rs 700. You need it both quickly and affordably. If you buy it from Amazon, the book that the local retailer could have handed you in a single day takes five days to arrive. The cheapest choice is not automatically the best one, and in the same way we want the best or optimal plan, balancing cost against time, to execute a query.
  3. All optimizers work on inbuilt, cost-based heuristic algorithms whose main goal is to minimize query run time. Optimization has 3 phases:

Phase 0:- Search for a trivial plan, which is the most basic: a plan used when the query has only one obvious, workable way to be executed. If no trivial plan is found, phase 1 starts.

Phase 1:- Search for a transaction processing plan. The search covers both simple and more complex plans; a simple plan is chosen based on an analysis of the data (statistics) involved in the query.

Phase 2:- In this phase, parallel processing and full optimization are done. More complex algorithms are executed, which are completely proprietary to Microsoft.
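
If you want to see which plan the optimizer finally settled on, you can ask SQL Server to return the plan instead of executing the query. A minimal sketch, reusing the hypothetical users table from earlier:

    -- Return the estimated execution plan as XML instead of running the query
    SET SHOWPLAN_XML ON;
    GO
    SELECT name, age FROM users WHERE age > 30;
    GO
    SET SHOWPLAN_XML OFF;
    GO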

Query Executor:- The query executor calls the access methods and hands over the execution plan containing the data-fetching logic required for execution.

Step 3: Identify

The query plan is then transferred to the Access Methods block, which is part of the Storage Engine. The work of the Storage Engine is to store data in a storage system like a disk or SAN and retrieve the data when needed. Before we deep dive into the Storage Engine, let’s have a look at how data is stored in the database.

An extent is 64 KB in size and consists of eight physically contiguous pages of 8 KB each, which means SQL Server databases have 16 extents per megabyte. Each page starts with a 96-byte section called the page header, containing information about the page such as the page type, page number, amount of used space, a pointer to the next page, etc.
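
If you want to look at a page header yourself, the widely known (though undocumented) DBCC PAGE command can dump it. A rough sketch, with the database name and page number chosen arbitrarily:

    -- Redirect DBCC output to the client session instead of the error log
    DBCC TRACEON (3604);
    -- DBCC PAGE (<database>, <file id>, <page id>, <print option>)
    -- Print option 0 shows only the 96-byte page header
    DBCC PAGE ('master', 1, 1, 0);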

Step 4: Execute

Let’s now talk about the access methods.

Access Methods:- This block acts as an interface between the query executor and the buffer manager / transaction manager. A few points will help you understand access methods better:

  1. Access methods don’t execute anything.
  2. It detects the query type, i.e., whether it is a SELECT command or a non-SELECT (DDL/DML) command.
  3. When a SELECT command is detected, the query is passed to the buffer manager.
  4. When a non-SELECT command is detected, the query is passed to the transaction manager.

Buffer Manager:- It manages the following core functions:

  1. Plan Cache:- It checks whether an execution plan for the query already exists in the plan cache memory. If yes, that query plan and its associated data are reused.
  2. Data Parsing:- It looks for the required data in the buffer cache. If the data is present, it is used directly by the query executor, which saves time and improves performance; this is also known as soft parsing. The first time a query runs, however, its plan and data still have to be cached.
  3. When the data we are looking for is not present in the cache, it has to be read from disk and then stored in the buffer cache for further use. This process consumes more time and is called hard parsing.
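
Both caches can be inspected through dynamic management views. A minimal sketch (the TOP (5) limit is arbitrary):

    -- Most frequently reused plans currently in the plan cache
    SELECT TOP (5) cp.usecounts, cp.objtype, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    ORDER BY cp.usecounts DESC;

    -- Number of pages of the current database held in the buffer cache
    SELECT COUNT(*) AS cached_pages
    FROM sys.dm_os_buffer_descriptors
    WHERE database_id = DB_ID();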

Transaction Manager:- This block is invoked when the access methods detect a non-SELECT query. It relies on two sub-components:

  1. Log Manager:- It keeps track of all updates done in the system via log records kept in the transaction log. Each record carries a log sequence number (LSN) together with the transaction ID and the data modification record, and it is used to keep track of committed transactions and to roll transactions back.

  2. Lock Manager:- While a transaction runs, the associated data in storage is kept in a locked state. This process is handled by the lock manager, which enforces data consistency and isolation (two of the ACID properties).
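
Both sub-components can be observed from T-SQL. A rough sketch, reusing the hypothetical users table (note that fn_dblog is an undocumented function):

    BEGIN TRANSACTION;
    UPDATE users SET age = age + 1 WHERE id = 1;   -- placeholder modification

    -- Lock manager: locks held or requested by this session
    SELECT resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID;

    -- Log manager: recent log records for the current database
    SELECT TOP (10) [Current LSN], Operation, [Transaction ID]
    FROM fn_dblog(NULL, NULL);

    ROLLBACK TRANSACTION;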

Execution Process

  1. The log manager starts logging the change and the lock manager places locks on the associated data.
  2. A copy of the data is maintained in the buffer cache, and it is this in-memory copy that gets updated. Pages that have been modified in memory but not yet written to disk are known as dirty pages.
  3. Write-ahead logging:- Log records describing the change are written to the transaction log file before the corresponding dirty pages are flushed to disk. A checkpoint process runs periodically (approximately once per minute) and writes the dirty pages to disk; the pages themselves remain in the buffer cache.
  4. Lazy writer:- Dirty pages can also remain in memory. When SQL Server observes a heavy load and buffer memory is needed for new transactions, the lazy writer frees up pages from the cache, writing dirty pages to disk and evicting the least recently used (LRU) pages first.
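
A small sketch of how this can be observed: sys.dm_os_buffer_descriptors marks dirty pages with an is_modified flag, and CHECKPOINT forces them to disk for the current database:

    -- Count dirty (modified) pages currently sitting in the buffer cache, per database
    SELECT DB_NAME(database_id) AS database_name, COUNT(*) AS dirty_pages
    FROM sys.dm_os_buffer_descriptors
    WHERE is_modified = 1
    GROUP BY database_id;

    -- Flush the current database's dirty pages to disk
    CHECKPOINT;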

Step 5: Final Result

After the results have been fetched, the result data is passed back through the protocol layer to the interface from which we issued the query.
