Distributed SQL Database Internals (2) — Design a Distributed SQL Database

Li Shen
4 min readOct 12, 2023

--

This is the second article of this series blog. You can find the previous ones here:

Foreword

Designing a distributed SQL database is not easy work. There are a lot of details to be taken care of. In this series, we are not going to boil the whole ocean, like covering all the picese and and talking about all the possible solutions. We will focus on a few key considerations and components.

Understand your target workload

The first thing that should come to your mind is the targeted workload of your database. Around that are the design goals which are the essential properties to meet the requirements of the targeted workload.

What does TiDB’s targeted workload look like? As a distributed SQL database, TiDB is designed for modern applications and supports business growth. We have two observations for the modern applications.

First, the world is more transactional. In today’s dynamic digital landscape, transactions have become notably more granular. For instance, contemporary Software as a Service (SaaS) applications typically adopt a Pay-as-You-Go pricing model, which significantly increases the frequency and specificity of transactions. In contrast to traditional pricing models, which might only process a singular, annual transaction, the Pay-as-You-Go approach necessitates handling a myriad of fine-grained transactions. This reflects a broader shift in global transaction patterns, favoring more frequent, detailed transactional interactions to accommodate the evolving demands of modern consumers and businesses alike.

Second, in today’s business landscape, data has become the lifeblood of organizations, driving informed decision-making and fueling innovation. Businesses are increasingly data-driven, recognizing that the insights derived from data are instrumental in gaining a competitive edge. Moreover, data itself has transitioned from being a byproduct to a valuable asset, with companies monetizing it by offering data-as-a-service (DaaS). DaaS not only provides scalability and cost-efficiency but also allows businesses to focus on their core competencies while outsourcing the complexities of data management. This approach opens up new avenues for revenue generation through data monetization, creating a dynamic ecosystem where data is both a means and an end in the business world.

We realized that modern applications need a better storage solution. Which can provide the essential features, as built-in ones instead of working around, to accelerate business innovation.

Design Goals of TiDB

Based on the understanding of the targeted workload, TiDB has the following design goals:

Horizontal Scalability

Enabling a database to handle growing data and transaction volumes by adding more nodes to the system, ensuring that it can scale out to meet demand without sacrificing performance.

High Availability and Reliability

Ensuring that the database remains operational and accessible even in the face of node failures or network issues, often through mechanisms like data replication and automatic failover.

Strong Consistency

Maintaining data accuracy across all nodes in the distributed system, ensuring that every read receives the most recent write and adhering to ACID properties even in a distributed environment.

Operational Simplicity

Facilitating ease of management despite the inherent complexity of distributed systems, providing tools and features that abstract complexity and make tasks like setup, scaling, and recovery straightforward and efficient.

These design goals aim to provide a robust, reliable, and operationally straightforward database that can scale to meet the needs of large, demanding, and critical applications.

High-Level Architecture and Considerations

There are various architecture designs for databases. But at a high level, most of them would be like the following diagram. From the top to the bottom, these parts are the key considerations:

  • Transport Layer / Wire Protocol
  • Query Planner
  • Query Execution Engine
  • Transaction Model
  • Data storage structures

For a distributed database, there are more things:

  • Data replication and consensus model
  • Data placement / workload balancing module
  • Distributed execution engine

We will not deep into details about what are those concepts. There are lots of good materials to learn about that. We also will not elaborate on all the possible options for each consideration but focus more on the path that TiDB goes with. It is easy to design your own system by replacing some of the design decisions according to your own target workload and experience. There is not a single best choice for everyone or every system.

Here is the architecture diagram of TiDB:

TiDB Architecture

In the following chapters, we will go through the details of those key components/modules in the order from bottom to top. Because the storage layer is the foundation of the query processing layer. The design of storage will affect the design of the query processor.

--

--

Li Shen

Author of TiDB, Focus on Modern Infrastructure Software, Opinions are my own