TiDB — SQL at Scale!

Ferdinand Jason Gondowijoyo
6 min read · Mar 8, 2020


TiDB (“Ti” stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

source: https://github.com/pingcap/tidb

PingCAP describes TiDB as a “one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads”. TiDB is inspired by the design of Google F1 and Google Spanner, and it supports horizontal scalability, strong consistency, and high availability. TiDB is written in the Go programming language. In this post, I will explain what TiDB is and share some hands-on results.

NewSQL

NewSQL is a new approach to relational databases that aims to combine the ACID (atomicity, consistency, isolation, durability) transaction guarantees of traditional RDBMSs with the horizontal scalability of NoSQL.

The main problem with SQL/RDBMS systems is that they are difficult to scale: performance deteriorates quickly as the database grows, and sharding is problematic as well. That is why people sometimes move to NoSQL. NoSQL systems can provide impressive read/write speeds, but they lack built-in ACID guarantees, which prevents full adoption by systems that require strong data integrity. NewSQL databases come as the solution: they achieve their scalability by focusing their efforts on solving the issues of one problem domain.

TiDB Architecture

The TiDB architecture consists of three key components: the TiDB server, the PD server, and the TiKV server. In addition, TiDB provides the TiSpark component to support OLAP needs and TiDB Operator to simplify management and deployment in the cloud. I'll explain the three key components below.

TiDB Server
The TiDB server is stateless: it does not store data and is only for computing. It is horizontally scalable and provides a unified interface to the outside world through load-balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. Because it speaks the MySQL protocol, any MySQL client or driver can talk to it (see the connection sketch after this list). The responsibilities of the TiDB server are as follows.

  1. Receiving the SQL requests
  2. Processing the SQL-related logic
  3. Locating the TiKV address for storing and computing data through Placement Driver (PD)
  4. Exchanging data with TiKV
  5. Returning the result
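
Since the TiDB server exposes the MySQL wire protocol, connecting to it looks exactly like connecting to MySQL. Below is a minimal sketch in Go using the go-sql-driver/mysql driver; the local address, default port 4000, stock root user, and a "test" database are all assumptions about a default deployment.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // any MySQL driver works, since TiDB speaks the MySQL protocol
)

func main() {
	// Assumptions: a local TiDB server on its default port 4000,
	// the default root user with no password, and a "test" database.
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to:", version) // prints a MySQL-compatible version string
}
```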

PD Server (Placement Driver)
The PD server ensures redundancy by using the Raft consensus algorithm. The Raft leader handles all operations, while the remaining PD servers exist for high availability only. It is highly recommended to deploy an odd number of PD nodes, and more than one, so that Raft can maintain a quorum (three is a typical choice). The responsibilities of the PD server are as follows.

  1. Storing the metadata of the cluster, such as which Region a specific key is located in.
  2. Scheduling and load balancing regions in the TiKV cluster, including but not limited to data migration and Raft group leader transfer.
  3. Allocating the transaction ID that is globally unique and monotonically increasing.
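
Item 3 deserves a closer look. PD's timestamp oracle hands out 64-bit values that combine a physical millisecond clock with a logical counter, which is what makes them both globally unique and monotonically increasing. The sketch below only illustrates that shape; it is not PD's actual implementation, which persists leased time windows through etcd and batches allocations.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const logicalBits = 18 // low bits reserved for the logical counter

// tso illustrates the shape of PD's timestamp oracle: the high bits carry a
// physical millisecond clock, the low bits a logical counter, so successive
// calls yield globally unique, monotonically increasing values.
// Overflow of the logical counter within one millisecond is not handled here.
type tso struct {
	mu       sync.Mutex
	physical int64 // last observed physical time, in milliseconds
	logical  int64 // counter within the current millisecond
}

func (t *tso) next() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if now := time.Now().UnixMilli(); now > t.physical {
		t.physical, t.logical = now, 0
	} else {
		t.logical++ // same millisecond (or clock went backwards): keep counting up
	}
	return t.physical<<logicalBits | t.logical
}

func main() {
	var t tso
	fmt.Println(t.next() < t.next()) // true: strictly increasing
}
```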

TiKV Server
The TiKV server is responsible for storing data. From an external view, TiKV is a distributed, transactional key-value storage engine: a huge distributed ordered map that offers high performance and reliability. The Region is the basic unit of data storage: each Region holds the data for a particular key range, a left-closed, right-open interval from StartKey to EndKey. Each TiKV node hosts multiple Regions. TiKV uses the Raft protocol for replication to ensure data consistency and disaster recovery.
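
To make the left-closed, right-open interval concrete, here is a hypothetical sketch of locating the Region that owns a given key; the region type and locate function are illustrative, not TiKV's API.

```go
package main

import (
	"fmt"
	"sort"
)

// region covers the key range [startKey, endKey): left-closed, right-open.
type region struct {
	id       int
	startKey string
	endKey   string // "" here stands for "+infinity" on the last region
}

// locate returns the region owning key, assuming regions are sorted by
// startKey and cover the whole keyspace without gaps.
func locate(regions []region, key string) region {
	i := sort.Search(len(regions), func(i int) bool {
		return regions[i].startKey > key
	})
	return regions[i-1] // the last region whose startKey <= key
}

func main() {
	regions := []region{
		{id: 1, startKey: "", endKey: "b"},
		{id: 2, startKey: "b", endKey: "m"},
		{id: 3, startKey: "m", endKey: ""},
	}
	// "b" belongs to region 2: EndKey is exclusive, StartKey inclusive.
	fmt.Println(locate(regions, "b").id) // 2
}
```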

Differences from MySQL

TiDB is an open-source NewSQL database released under the Apache 2.0 License. Because it speaks the MySQL protocol, your existing applications will be able to connect to it using any MySQL connector, and most SQL functionality remains identical (joins, subqueries, transactions, etc.).

TiDB natively distributes query execution and storage
With MySQL, it is common to scale out via replication. Typically you will have one MySQL master with many slaves, each holding a complete copy of the data. Using either application logic or technology like ProxySQL, queries are routed to the appropriate server.

TiDB works a little differently: query execution is handled by a layer of TiDB servers, and scaling out SQL processing is as simple as adding new TiDB servers. Table data is automatically sharded into small chunks (the Regions described above) and distributed among the TiKV servers.

TiDB’s storage engine is RocksDB
MySQL’s default storage engine has been InnoDB since 2010. Internally, InnoDB uses a B+tree data structure, which is similar to what traditional commercial databases use.

To compare, TiDB uses RocksDB as the storage engine with TiKV. RocksDB has advantages for large datasets because it can compress data more effectively and insert performance does not degrade when indexes can no longer fit in memory.

TiDB gathers metrics in Prometheus/Grafana
Tracking key metrics is an important part of maintaining database health. MySQL centralizes these fast-changing metrics in Performance Schema. Performance Schema is a set of in-memory tables that can be queried via regular SQL queries.

With TiDB, rather than retaining the metrics inside the server, a strategic choice was made to ship them to a best-of-breed external service. Prometheus+Grafana is a common technology stack among operations teams today, and the graphs included with TiDB make it easy to build your own dashboards or configure thresholds for alarms.

TiDB handles DDL significantly better
TiDB implements online DDL using the protocol introduced by the Google F1 paper. In short, a DDL change is broken up into smaller transition stages (roughly: delete-only, then write-only, then public) so that data-corruption scenarios are prevented, and the system tolerates an individual node being behind by up to one DDL version at a time.
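
As a rough illustration, those intermediate states can be sketched as a simple enumeration; the names below are illustrative, not TiDB's actual identifiers.

```go
package main

import "fmt"

// schemaState sketches the intermediate states that online DDL walks a
// schema element through, after the F1 schema-change protocol.
type schemaState int

const (
	stateNone        schemaState = iota // element does not exist yet
	stateDeleteOnly                     // deletes see it; reads and writes do not
	stateWriteOnly                      // writes see it; reads still do not
	stateWriteReorg                     // existing rows are backfilled (e.g. for a new index)
	statePublic                         // fully visible to every transaction
)

func main() {
	// Adjacent states differ by one step, so two nodes that are at most one
	// schema version apart can never disagree in a data-corrupting way.
	for s := stateNone; s <= statePublic; s++ {
		fmt.Println("state:", s)
	}
}
```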

TiDB is designed for HTAP workloads
The MySQL team has traditionally focused its attention on optimizing performance for online transaction processing (OLTP) queries. TiDB is designed to perform well across hybrid transaction/analytical processing (HTAP) queries.
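
As a hypothetical illustration of what HTAP means in practice, the same connection can serve both a transactional point lookup and an analytical aggregate. The orders table and its columns below are made up, and db is a handle like the one from the connection sketch earlier.

```go
package tidbdemo

import "database/sql"

// htapQueries illustrates HTAP on a single connection; the "orders" table
// with its id/status/amount columns is entirely hypothetical.
func htapQueries(db *sql.DB) error {
	// OLTP side: an indexed point lookup, the low-latency transactional path.
	var status string
	if err := db.QueryRow("SELECT status FROM orders WHERE id = ?", 42).Scan(&status); err != nil {
		return err
	}
	// OLAP side: a full-table aggregate over the same, always-fresh data,
	// served by the same SQL layer (or offloaded to TiSpark for heavy jobs).
	var n int
	var total float64
	return db.QueryRow("SELECT COUNT(*), SUM(amount) FROM orders").Scan(&n, &total)
}
```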

Hands-on Experience

After using TiDB for approximately one week with the architecture referenced above (diagram not included here), here's what I got:

  • MySQL Syntax Related
    - Cannot add/alter/drop PRIMARY KEY and FOREIGN KEY
    - Cannot use CREATE TEMPORARY TABLE syntax
    - The output format of EXPLAIN is different
    A complete list of TiDB's compatibility with MySQL can be seen at the following link [here]. A quick way to probe these gaps is sketched after this list.
  • Performance Test by JMeter
    Using JMeter, TiDB with the above architecture handled 1000 data-retrieval requests at a throughput of 3.6 requests/sec, with an average response time of 6 ms
  • Performance Test by Sysbench
    The experiment ran a single-row point select against clusters with a varying number of PD servers (1 to 3) and obtained the following data
    - 1 PD : 7741.62 transaction/sec
    - 2 PD : 8456.82 transaction/sec
    - 3 PD : 9235.65 transaction/sec
    So it can be concluded that increasing the number of PD servers improves TiDB's performance (strictly speaking, scaling out rather than up), but only modestly: going from 1 to 3 PD servers yields roughly a 19% gain in throughput
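
A quick, hypothetical way to probe the syntax gaps listed above is simply to run the statements and see what the server rejects; the table names here are made up, and db is the same kind of handle as in the connection sketch.

```go
package tidbdemo

import (
	"database/sql"
	"fmt"
)

// probeCompat runs statements that MySQL accepts but (at the time of this
// post) TiDB rejects, and prints the server's error for each.
func probeCompat(db *sql.DB) {
	for _, stmt := range []string{
		"ALTER TABLE t ADD PRIMARY KEY (id)",
		"CREATE TEMPORARY TABLE tmp (id INT)",
	} {
		if _, err := db.Exec(stmt); err != nil {
			fmt.Println("rejected:", stmt, "->", err)
		}
	}
}
```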

For more information about TiDB, visit the GitHub repository or the official page here
