Blockchain technology — a very special kind of Distributed Database

After almost 2 years of hype, some people still ask me when to use “distributed ledgers” rather than standard “distributed databases”. Other (vicious) people ask me what is the difference between “blockchain technology” and “distributed ledger technology”. So let’s clarify the conceptual & vocabulary issues that we have here.

Centralized relational databases

Relational databases (RDBMS) organize data in tables and use the SQL query language. They became the norm in the 1980s. Even if their architecture evolved in complexity over time (n-tier, distributed processing, etc.), they remain essentially centralized, i.e. located, stored, and maintained in a single location. This category represents more than 90% of the database market in terms of revenues and includes the most well-known vendors and systems: MySQL, Oracle, Microsoft SQL Server, IBM DB2, SAP, PostgreSQL, SQLite, Teradata, etc.

Distributed databases

Databases are distributed (DDBMS) when the storage devices are not all attached to a common processing unit such as the CPU, but are spread across a network. With the development of the internet, businesses needed solutions that could process huge amounts of structured & unstructured data, and that could scale across networks. DDBMS use consensus mechanisms to ensure fault-tolerant communications, and provide concurrency control through locking and/or time-stamping mechanisms. They come in different technology forms:

1. Peer network node data stores are systems allowing users to replicate and share files across a network leveraging peer-to-peer protocols such as: BitTorrent, NNTP, Freenet, Mnet, etc.

2. Distributed SQL data warehouses are systems designed by the major vendors (Microsoft, Oracle, SAP, IBM, etc.) to allow for the massively parallel processing of analytics-oriented tasks.

3. Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks.

4. NoSQL databases are non-relational DDBMS, horizontally scalable, designed for real-time web applications. The most well-known solutions are: MarkLogic, MongoDB, Datastax, Apache Cassandra, Redis, Riak, Google BigTable and CouchDB.

5. NewSQL databases are relational DDBMS designed to combine the best of relational databases & NoSQL databases properties (horizontal scalability & distributed processing). Examples: Google Spanner, Clustrix, VoltDB, MemSQL, Pivotal’s GemFire XD, NuoDB and Trafodion.

6. Distributed Ledgers (DL) are DDBMS that leverage cryptography to provide a decentralized multi-version concurrency control mechanism and to maintain consensus about the existence and status of shared facts in trustless environments (i.e. when the participants hosting the shared database are independent actors that don’t trust each other). Consensus is not a unique feature of DL per se: other distributed databases also use consensus algorithms such as Paxos or Raft. The same goes for immutability: immutable data stores exist outside DL (HDFS, Zebra, CouchDB, Datomic, etc.). In my opinion, DL have two differentiators: (a) the control of the read/write access is truly decentralized and not logically centralized as for other distributed databases, and, as a corollary, (b) the ability to secure transactions in competing environments, without trusted third parties. Some people call this category “shared ledgers” but I prefer the term “distributed” because shared can mean “divided/split”.

6.1. The Bitcoin system was the first instance of DL, designed for one purpose: peer-to-peer bitcoin (cryptocurrency) payments. To avoid double spending, Bitcoin uses chained blocks of data (hence the “block chain”) and a proof-of-work consensus, among other mechanisms. Bitcoin is censorship resistant. Its key features are Byzantine fault tolerance, pseudonymity, public auditability, immutability, accountability (time-stamping) and non-repudiation (signatures) at the transaction level.
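To make “chained blocks” and “proof of work” a bit more concrete, here is a minimal sketch in Python. It is not Bitcoin's actual implementation (Bitcoin uses double SHA-256, Merkle trees and a numeric difficulty target, among other things). The block layout, the `DIFFICULTY` value and the function names are all assumptions made for illustration.

```python
import hashlib
import json

DIFFICULTY = 3  # hypothetical target: number of leading zero hex digits required


def block_hash(block: dict) -> str:
    """Hash the block's contents deterministically with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(prev_hash: str, transactions: list) -> dict:
    """Try nonces until the block hash meets the difficulty target."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "transactions": transactions, "nonce": nonce}
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1


def is_valid_chain(chain: list) -> bool:
    """Check every block's proof of work and its link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * DIFFICULTY):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


# Toy usage: two blocks, the second one pointing at the hash of the first.
genesis = mine_block("0" * 64, ["coinbase -> alice: 50"])
second = mine_block(block_hash(genesis), ["alice -> bob: 10"])
chain = [genesis, second]
```

Because each block stores the hash of its predecessor, editing an old transaction changes that block's hash and breaks the link stored in the next block. An attacker would have to redo the proof of work for every block that follows, which is what makes rewriting history expensive.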

6.2. Some systems are inspired by or somehow close to the Bitcoin system. They usually implement most of its features, but not all or with different characteristics. For instance:

  • Other cryptocurrencies implement privacy mechanisms (Zcash), or different consensus protocols such as Proof of stake, Proof of burn, etc.
  • Ethereum shares many of Bitcoin’s features but is designed to execute programmable transactions (smart contracts)

6.3. Some systems differ significantly from Bitcoin:

  • The DL envisioned by Accenture is not immutable
  • R3 Corda is designed to operate in regulated environments with a limited number of known participants (e.g. Financial Institutions, Regulators) where BFT is not required (security is achieved differently), auditability is based on the “need to know” and consensus about a transaction is basically reduced to its validation by the two contracting parties
  • Disledgers: Distributed Concurrence Ledger is tailored for financial institutions dealing in capital markets and payments. Concurrence is an alternative to seeking consensus in distributed ledger systems and does not use cryptocurrencies, chained blocks, or proof of work [Note: to me this approach looks similar to Corda]
  • HashGraph Swirlds: HashGraph is based on a “gossip protocol” where blocks are “events”: each member repeatedly chooses another member at random, and gives them all the events that they don’t know yet. As the local copy of the hashgraph grows, the member runs an algorithm to determine the consensus order for the events (and the consensus timestamps). The data structure is a directed acyclic graph, where each vertex contains the hash of its two parent vertices
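The gossip step described for HashGraph can be sketched as follows. This is only an illustration of the data structure (events pointing to two parent hashes) and of the “tell a random member everything they don't know yet” exchange. It does not implement the consensus ordering or timestamp algorithm, and the `Event`, `Member` and `gossip_round` names are hypothetical, not taken from the Swirlds code.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    creator: str
    payload: str
    self_parent: Optional[str]   # hash of the creator's previous event
    other_parent: Optional[str]  # hash of the latest event of the member who gossiped

    def digest(self) -> str:
        raw = f"{self.creator}|{self.payload}|{self.self_parent}|{self.other_parent}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class Member:
    def __init__(self, name: str) -> None:
        self.name = name
        self.events: Dict[str, Event] = {}  # local copy of the graph, keyed by hash
        first = Event(name, "init", None, None)
        self.head = first.digest()
        self.events[self.head] = first

    def sync_to(self, other: "Member", payload: str) -> None:
        """Give `other` every event it lacks; `other` records the sync as a new event."""
        for event_hash, event in self.events.items():
            other.events.setdefault(event_hash, event)
        new_event = Event(other.name, payload, other.head, self.head)
        other.head = new_event.digest()
        other.events[other.head] = new_event


def gossip_round(members: List[Member], rng: random.Random) -> None:
    """Each member picks another member at random and syncs to it."""
    for sender in members:
        receiver = rng.choice([m for m in members if m is not sender])
        sender.sync_to(receiver, payload=f"sync from {sender.name}")


# Toy usage with three members and a fixed seed for repeatability.
rng = random.Random(0)
members = [Member(name) for name in ("alice", "bob", "carol")]
for _ in range(5):
    gossip_round(members, rng)
```

Every event created during a sync references two parents by hash, so each member's local store is a directed acyclic graph that also records who talked to whom. In the real system, that history is the input to the algorithm that derives the consensus order.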

6.4. BigchainDB aims to provide scalable distributed data storage, by adding blockchain characteristics (decentralized control, immutability, and creation & movement of digital assets) to a standard distributed database. BigchainDB inherits characteristics of modern distributed databases: linear scaling in throughput and capacity with the number of nodes, a full-featured NoSQL query language, efficient querying, and permissioning.

Here is the landscape summarized (simplified):

The distributed ledgers that are “double permissionless”, such as Bitcoin, are the most decentralized ones and resist censorship. The less decentralized they are (e.g. permissioned DL in a semi-trusted environment), the closer they get to being “regular” distributed databases using cryptography. In this latter case, cryptography is used as a new mechanism to enforce auditability & accountability between peers:

— — — — — — — — — — — — — — — — — — — — — — — — — -

Now what is “blockchain technology” you might ask? Ironically, there is no consensus on the definition:

  1. Minimalists will argue it is only Bitcoin
  2. Some people think it should include any DL with chained blocks
  3. Some experts think it should include any DL with some key features: chained blocks, immutability & consensus protocol
  4. Maximalists say “blockchain technology” equals “distributed ledger technology” equals “cryptographically enabled DDBMS”. Also it is easier to use the term “blockchain” for marketing & communication purposes, even if it can be misleading…

The final equation is: bitcoin blockchain ⊆ blockchain technology ⊆ distributed ledger technology ⊆ distributed databases.

If you like blockchain debates, let me finish by sharing 4 other interesting questions:

  1. Is Bitcoin truly BFT?
  2. How do Bitcoin & other DL handle the CAP theorem?
  3. Is it possible for a DL to be at the same time decentralized, scalable and secure? [talking about trilemmas, I recommend this great post about the DCS triangle]
  4. Is a private blockchain without a token really more efficient than a centralized system?

Sources of inspiration for this post: Gideon Greenspan, Richard Gendal Brown, Pascal Bouvier (who uses a different taxonomy), Dave Birch, Colin Platt.