The data architecture of most businesses today is built out of multiple specialized data systems — relational databases, NoSQL data stores, OLAP data processing engines, each serving specific product features and business teams.
With the explosion of NoSQL products in the market, it has become increasingly difficult for businesses to make buying choices. Each NoSQL technology is meant to handle specific categories of workloads, and the NoSQL market is a far cry from the “one-size-fits-all” relational database era, which was dominated by only a few RDBMS products.
All NoSQL and NewSQL products have a distributed architecture — data is sharded across machines to improve performance. Each NoSQL data store optimizes for specific categories of workloads through its design choices:
- Indexing scheme & Disk layout (part of their storage engine), and
- Data modeling semantics — key-value, document, column-family, graph, time series, etc.
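To make the sharding idea concrete, here is a minimal sketch of hash-based shard routing. The shard count, the key names, and the `shard_for` and `group_by_shard` helpers are illustrative assumptions, not the scheme of any particular product; real data stores usually combine range or hash partitioning with replication.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumption: a small, fixed cluster for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(keys, num_shards: int = NUM_SHARDS) -> dict:
    """Group keys by the shard that owns them."""
    groups = defaultdict(list)
    for key in keys:
        groups[shard_for(key, num_shards)].append(key)
    return dict(groups)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10)]
    for shard, owned in sorted(group_by_shard(keys).items()):
        print(f"shard {shard}: {owned}")
```

A write that touches keys in more than one group spans several shards, which is the case that calls for a distributed transaction.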
The NoSQL data stores brought high scalability (through their distributed architecture) and workload-specific non-relational schemas, but they gave up transactions across shards (aka distributed transactions), because guaranteeing atomicity and isolation (the A and I of ACID) across shards is tricky and carries performance costs. The burden of handling distributed transactions fell onto the application tier.
The NewSQL databases aim to combine the benefits of horizontal scalability (found in distributed NoSQL data stores) and relational semantics, with distributed transactions and SQL. The NewSQL products include Google Cloud Spanner, YugaByte DB, CockroachDB, TiDB, etc.
Another emerging distributed data store category is multi-model: these products support a variety of data modeling semantics (relational, document, key-value, graph, etc.) using multiple query engines and APIs on top of a common underlying storage engine. They include Azure Cosmos DB, FaunaDB, ArangoDB, FoundationDB, etc.
The NoSQL and NewSQL products have some common headline features:
- Performance — Multiple Read Consistency Levels, Globally Distributed Read Replicas, Multi-Master Replication.
- Availability — Automatic Failover & Self-Healing.
- Scalability — Automatic Sharding & Rebalancing/Resharding.
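As an illustration of what rebalancing involves, the sketch below uses consistent hashing with virtual nodes, one common technique for limiting how much data moves when a node joins. This is an assumption chosen for illustration: the products mentioned here may use range-based or other schemes, and the `HashRing` class, node names, and key names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Stable 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 64):
        self.vnodes = vnodes   # virtual nodes per physical node
        self._tokens = []      # sorted ring positions
        self._owners = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node's virtual nodes on the ring."""
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            self._owners[t] = node
            bisect.insort(self._tokens, t)

    def owner(self, key: str) -> str:
        """Return the node at the first ring position after the key's token."""
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


if __name__ == "__main__":
    keys = [f"order:{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.owner(k) != before[k]]
    print(f"{len(moved) / len(keys):.1%} of keys moved to the new node")
```

Only keys whose ring range is taken over by the new node change owner; everything else stays where it was, which keeps resharding traffic proportional to the capacity added.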
In addition, the NewSQL products support distributed transactions. To guarantee atomicity and isolation (the A and I of ACID), they incorporate additional mechanisms:
- Atomicity: Two-Phase Commit (2PC).
- Isolation: Snapshot Isolation level via globally consistent snapshots, and Serializable Isolation level (aka Global Serializability) via pessimistic concurrency control such as Strong Strict Two-Phase Locking (SS2PL) or optimistic concurrency control such as Serializable Snapshot Isolation (SSI).
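The two-phase commit mechanism listed above can be sketched in a few lines. This is a simplified in-memory model under stated assumptions: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` coordinator are hypothetical names, and the sketch leaves out durable logging, locking, timeouts, and coordinator failure, all of which a real implementation has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One shard taking part in a transaction (hypothetical in-memory stand-in)."""
    name: str
    can_commit: bool = True  # simulates a local validation or lock failure
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        # Phase 1: stage the writes and vote. A real shard would also take
        # locks and persist the staged state before voting yes.
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2 (commit): make the staged writes visible.
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        # Phase 2 (abort): discard anything that was staged.
        self.staged = {}


def two_phase_commit(work) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes.

    `work` is a list of (participant, writes) pairs.
    """
    votes = [participant.prepare(writes) for participant, writes in work]
    if all(votes):
        for participant, _ in work:
            participant.commit()
        return True
    for participant, _ in work:
        participant.abort()
    return False


if __name__ == "__main__":
    shard_a, shard_b = Participant("shard-a"), Participant("shard-b")
    ok = two_phase_commit([(shard_a, {"alice": 50}), (shard_b, {"bob": 150})])
    print(ok, shard_a.data, shard_b.data)
```

Either every shard applies its writes or none does, which is the atomicity guarantee; isolation between concurrent transactions needs the separate concurrency control mechanisms listed above.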
But common headline features don’t mean these products are all the same. All NoSQL and NewSQL products are cloud-native, and most of them offer DBaaS (database-as-a-service) options backed by SLAs (service level agreements).