NewSQL — The Next Evolution in Databases

Some Similarities and Differences Between TiDB, CockroachDB, FaunaDB and Vitess

Gokul Prabagaren
Capital One Tech
Oct 30, 2019

You may be wondering what this new buzzword, NewSQL, means. To understand it, we have to briefly go through the history of database evolution.

SQL and RDBMS Era

In the early 1970s, IBM introduced SQL (Structured Query Language) for manipulating data. The idea took off, and many companies adopted SQL and introduced their own RDBMS (Relational Database Management System) implementations. These included Oracle Corporation’s Oracle, IBM Corporation’s Informix and DB2, and the open source MySQL.

In an RDBMS, data is stored in structured tables of rows and columns, and the system is designed to satisfy the ACID (Atomicity, Consistency, Isolation, Durability) properties. RDBMS/SQL systems could achieve ACID properties because the data stayed on one big database server, so keeping data consistent across machines was not an issue. But with the explosion of the Internet in the 1990s, the amount of data being generated grew exponentially. This proliferation of data led to a new paradigm — NoSQL.

NoSQL Era

NoSQL databases were designed around the concept of scaling data in a distributed environment. Data is typically replicated, so a copy is available on a different, remote machine. These databases were designed with the CAP theorem as the backbone. The theorem states that of Consistency, Availability, and Partition Tolerance, a distributed system can guarantee only two at any given time. Many NoSQL databases therefore rely on eventual consistency: replicas converge over time, so a read is guaranteed to see a recent write only after the update has propagated.
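
To make eventual consistency concrete, here is a minimal in-memory sketch. It is an illustration only and does not model any particular NoSQL product; the class names and the explicit `replicate()` step are assumptions made up for this example.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the key-value data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class EventuallyConsistentStore:
    """Writes land on the first replica and are copied to the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # replication messages not yet delivered

    def write(self, key, value):
        # Acknowledge as soon as the first replica has the value.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def replicate(self):
        # Deliver every queued update; afterwards all replicas agree.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.replicas[2].read("balance")  # this replica has not caught up yet
store.replicate()
fresh = store.replicas[2].read("balance")  # replicas have converged
print(stale, fresh)
```

The read before `replicate()` misses the write, and the read after it sees the write. That window of stale reads is what strongly consistent systems are built to avoid.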

The evolution of data and maturation of distributed systems led NoSQL databases to slowly penetrate into the RDBMS market. MongoDB, Cassandra, and Redis are all popular NoSQL databases you may be familiar with.

Introducing NewSQL

While many variants of NoSQL databases continue to be used, there is another paradigm arising in parallel to NoSQL — NewSQL. NewSQL promises to combine benefits from RDBMS (strong consistency) with benefits from NoSQL (scalability); it mainly achieves this through new architecture patterns and efficient SQL storage engines.

Most current NewSQL databases are based on Google’s Spanner database and on the theories in academic papers such as “Calvin: Fast Distributed Transactions for Partitioned Database Systems” from Yale. Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It was the first system to distribute data at global scale and support externally-consistent distributed transactions. The Calvin paper describes how to leverage determinism to guarantee active replication and full ACID compliance for distributed transactions without two-phase commit.
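
The determinism idea can be sketched in a few lines: if every replica receives the same pre-ordered log of transactions and executes each one deterministically, all replicas reach the same state without coordinating at commit time. The code below is a simplified illustration under that assumption, not the Calvin protocol itself; the transfer format and function names are hypothetical.

```python
def apply_transaction(state, txn):
    """Apply one transfer; the outcome depends only on state and txn."""
    source, target, amount = txn
    if state.get(source, 0) >= amount:  # deterministic abort rule
        state[source] -= amount
        state[target] = state.get(target, 0) + amount


def run_replica(initial_state, ordered_log):
    """Replay the agreed log in order on a private copy of the data."""
    state = dict(initial_state)
    for txn in ordered_log:
        apply_transaction(state, txn)
    return state


initial = {"alice": 100, "bob": 50}
# Hypothetical log that a sequencer has already ordered and replicated.
log = [("alice", "bob", 70), ("alice", "bob", 70), ("bob", "alice", 20)]

replica_a = run_replica(initial, log)
replica_b = run_replica(initial, log)
print(replica_a == replica_b, replica_a)
```

The second transfer is rejected on both replicas for the same reason (insufficient funds), so neither replica needs to ask the other whether to commit.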

TiDB, CockroachDB, FaunaDB, and Vitess are a few of the leading NewSQL databases. Each implementation has its own take on how to ensure strong consistency with a scalable architecture. Let’s dive into them.

TiDB

TiDB is an open source database that supports distributed HTAP (Hybrid Transactional and Analytical Processing) and is compatible with MySQL. PingCAP is the company that backs TiDB. The initial version was released in Oct 2017 and its current stable version is 2.1.9.

TiDB’s salient features are:

  • Hybrid — TiDB supports both analytical processing (OLAP) and transaction processing (OLTP) workloads. This means there is no need to do ETL from application transaction database to analytical database. TiDB’s storage layer TiKV is accessed by TiDB clusters for OLTP and by natively supported TiSpark for OLAP.
  • Cloud Native — TiDB is designed to operate in the cloud (public, private, and hybrid) and its storage layer TiKV has been accepted as a sandbox project by the Cloud Native Computing Foundation.
  • MySQL Compatible — Applications can treat TiDB as a MySQL server and connect using their existing MySQL client libraries without any change on the application side.
  • Less ETL — Since TiDB operates as both OLTP and OLAP, there’s no need to do ETL from OLTP to OLAP.

CockroachDB

“CockroachDB is a distributed SQL open source database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal disruption and manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.” From the Cockroach GitHub.

Cockroach Labs is the company backing CockroachDB. The initial version was released in Sept 2015 and its current stable version is 19.1.1.

CockroachDB’s salient features are:

  • SQL Compatible — Though CockroachDB has a distributed, strongly-consistent, transactional key-value store underneath, its external API is compatible with standard SQL.
  • Multi-Active Availability — CockroachDB calls its availability model “multi-active availability”. It lets applications read from and write to every node in a cluster without conflicts. Multiple replicas run identical services, and traffic is routed to all of them. If any replica fails, the others simply handle the traffic that would’ve been routed to it.
  • Online Schema Changes — CockroachDB provides built-in online schema changes, a simple way to update a table schema without imposing any negative consequences on an application. Changes to the table schema happen while the database is running. The schema change runs as a background process without holding locks on the underlying table data. This allows application queries to execute normally, with no effect on reads or writes.
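
One common way to let a replicated cluster keep serving traffic after a node fails, while still avoiding conflicting writes, is to require a majority of nodes to acknowledge each write. The sketch below shows that generic majority rule. It is an assumption-laden illustration, not CockroachDB's actual replication code, and the `Node` and `Cluster` classes are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class Cluster:
    """Generic majority-quorum replication across a fixed set of nodes."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def write(self, key, value):
        live = [node for node in self.nodes if node.alive]
        # Commit only if more than half of all nodes can acknowledge.
        if len(live) <= len(self.nodes) // 2:
            return False
        for node in live:
            node.data[key] = value
        return True


cluster = Cluster(["n1", "n2", "n3"])
cluster.nodes[0].alive = False       # one node fails
one_down = cluster.write("k", "v1")  # 2 of 3 still form a majority
cluster.nodes[1].alive = False       # a second node fails
two_down = cluster.write("k", "v2")  # 1 of 3 is not a majority
print(one_down, two_down)
```

With three nodes the cluster tolerates one failure. Refusing the write once the majority is lost is what prevents two isolated halves of a cluster from accepting conflicting values.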

FaunaDB

“FaunaDB is a modern distributed operational database for cloud and container-centric environments. It is the world’s first commercial database that is inspired by Calvin, a strictly serializable transaction protocol for multi-region environments.” From the FaunaDB website.

Fauna is the company backing FaunaDB and they have on-prem, cloud, and serverless offerings of FaunaDB.

FaunaDB’s salient features are:

  • Active-Active — FaunaDB supports a masterless, multi-cloud, active-active architecture, which helps applications target 100% database uptime.
  • Multiple Models — FaunaDB can manage multiple data models such as relational, graph, and document.
  • Data Temporality — FaunaDB provides a snapshot-based storage engine that retains historical data for a configurable period and permits correction of data errors in snapshots.
  • Horizontal Scalability — FaunaDB supports horizontal scalability allowing you to add and remove nodes without interrupting application service within the same site or across global data centers.
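
The data temporality feature can be pictured as a store that keeps every version of a value together with its write time, so a read can ask what the data looked like at an earlier moment. The following is a minimal sketch of that idea with hypothetical names; it is not FaunaDB's storage engine or API, and it leaves out the configurable retention period.

```python
import bisect


class TemporalStore:
    """Keeps every version of a key so reads can target a past timestamp."""

    def __init__(self):
        self.timestamps = {}  # key -> sorted list of write timestamps
        self.values = {}      # key -> values aligned with timestamps

    def write(self, key, timestamp, value):
        # Assumes writes for a key arrive in increasing timestamp order.
        self.timestamps.setdefault(key, []).append(timestamp)
        self.values.setdefault(key, []).append(value)

    def read_at(self, key, timestamp):
        # Find the latest version written at or before the timestamp.
        index = bisect.bisect_right(self.timestamps.get(key, []), timestamp)
        return self.values[key][index - 1] if index else None


store = TemporalStore()
store.write("price", 10, 4.99)
store.write("price", 20, 5.49)
print(store.read_at("price", 5), store.read_at("price", 15), store.read_at("price", 25))
```

A read at time 15 returns the first price even though a newer one exists, which is the behavior that makes historical queries and after-the-fact corrections possible.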

Vitess

“Vitess is an open source database clustering system for horizontal scaling of MySQL through generalized sharding.” From the Vitess GitHub.

Vitess was born out of YouTube’s scaling needs and currently supports its backend. PlanetScale is the company backing the open source project. The current stable version of Vitess is 3.0.

Vitess’ salient features are:

  • Scalable MySQL — Vitess combines the features of SQL (JOINs, indexing, aggregation, etc.) with the scalability benefits of NoSQL.
  • Lightweight Connections — Vitess connections are much lighter than native MySQL connections, which allows it to scale easily.
  • Topology Service — The Topology Service is a metadata store (etcd or ZooKeeper) that contains information about running servers, the sharding scheme, and the replication graph. Because of the Topology Service, the cluster view is always up-to-date and consistent for different clients.
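
The sharding idea behind Vitess can be illustrated with a small router that hashes a sharding key and picks the shard whose range contains the result. This is a sketch under stated assumptions: the two-shard layout, the one-byte keyspace ID, and the SHA-256 hash are choices made for this example and are not Vitess's actual hashing or routing code. Only the range-style shard names ("-80", "80-") follow the naming Vitess uses.

```python
import hashlib

# Hypothetical two-shard layout using Vitess-style range names:
# "-80" covers keyspace IDs below 0x80, "80-" covers the rest.
SHARDS = [("-80", 0x00, 0x80), ("80-", 0x80, 0x100)]


def keyspace_id(user_id):
    """Hash the sharding key to one byte (illustrative, not Vitess's hash)."""
    return hashlib.sha256(str(user_id).encode()).digest()[0]


def shard_for(user_id):
    """Return the shard whose range contains the key's keyspace ID."""
    kid = keyspace_id(user_id)
    for name, low, high in SHARDS:
        if low <= kid < high:
            return name
    raise ValueError("no shard covers keyspace ID %#x" % kid)


for user_id in (1, 2, 3, 4):
    print(user_id, shard_for(user_id))
```

Because the shard is derived from the key alone, every client routes a given row to the same shard, and splitting a shard later only means dividing its range.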

The Future of NewSQL

Just as NoSQL gained momentum earlier in the internet era, the NewSQL databases discussed in this blog are gaining momentum and have a lot of potential in the public cloud era. I hope this blog provides a high-level overview of NewSQL databases to help get you started on this journey!

***

DISCLOSURE STATEMENT: © 2019 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.

Gokul Prabagaren
Capital One Tech

Master Software Engineer @ CapitalOne. Developing and maintaining code and infrastructure for the Apache Spark applications behind CapitalOne’s Credit Card Earn Engine.