Logs with Rules

A TL;DR on blockchains

Disclaimer: I work at Gem — all opinions are my own.

People often ask me what a blockchain is. That’s not surprising; as diverse industries gain interest in the topic, blockchain technology has continued its meteoric rise in awareness and popularity into 2016.

‘Blockchain’ interest over time (Google Trends)

What is really surprising is that most of the people who ask me that question have asked it before. More than with comparable emerging technologies, the blockchain conversation seems muddy and confusing. Eager, intelligent people can spend a week researching the topic and still feel like they have to ask, “So, what is it?”

We don’t have a good way of talking about “blockchain” yet.

The term is new, inconsistently defined, and under constant re-appropriation. This confusion around what exactly “blockchain” means is understandable for a couple of reasons:

  1. The origin of the term is questionable and (as has often been noted) is not from the Bitcoin white paper.
    There is no authority to which we can appeal for a clear definition, which isn’t a problem in itself, but has contributed to widespread equivocation around its use.
    For an examination of “blockchain” as interpreted from the perspective of Satoshi Nakamoto’s paper on Bitcoin (the first modern blockchain and the project that popularized the term), check out David Hudson’s well-done post on the topic.
  2. The concept of a blockchain is easily conflated with related components of blockchain-based systems.
    Unlike unclear etymology, this problem is difficult to dismiss and is largely a matter of fuzzy semantics — for example:
    Blockchain networks are peer-to-peer networks of computers (like BitTorrent) that seek to collectively agree on what data belongs in a given blockchain.
    Blockchain nodes on that P2P network typically keep a full copy of the blockchain in a local database, but importantly:
a blockchain is neither a network nor a database.

Or more precisely, it’s not useful to talk about blockchains that way. A technical term is “useful” if its definition is clear, unambiguous, and distinct from that of other terms. Any terminology failing to meet any of these requirements will be limited in its ability to convey meaning by its tendency to introduce confusion.

I propose a simple, concise, and useful definition that clarifies what a blockchain is and what a blockchain needs:

Blockchains are logs with rules.

That probably doesn’t make sense right away, but that’s okay. Keep reading.


A blockchain is a linear, append-only data structure.

That’s pretty concise and pretty simple. For readers unfamiliar with the term, a data structure is just a way of organizing data; different data structures are useful in different situations.

This is distinct from the concept of a database, which is concerned with the storage of data. A blockchain can be stored anywhere it can fit: BerkeleyDB, PostgreSQL, IPFS, a JSON document, a roll of toilet paper, whatever you want; regardless of where it sits, it’s still a chain of blocks.

All that said, this definition isn’t useful; it fails the requirement that a definition be distinct from the definitions of other terms — it lacks specificity, so let’s specify.

A blockchain is a linear, append-only data structure in which state is represented by an exhaustive record of discrete events.

This is better; it conveys the idea that the state of a blockchain is a function of everything that has ever happened on it. It’s still not very specific; in fact, it looks very much like a description of how data is structured in a log. That’s still not useful, so to expand:
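First, a quick aside on what “state as a function of everything that has ever happened” can look like in a plain log. This is a minimal Python sketch, not taken from any real blockchain: the event format (an account name and a signed amount) and the names `apply_event` and `replay` are assumptions made up for illustration.

```python
from functools import reduce

# Hypothetical event format: (account, amount). Positive amounts are
# deposits, negative amounts are withdrawals.
events = [
    ("alice", 50),
    ("bob", 20),
    ("alice", -30),
]

def apply_event(state, event):
    """Return a new state with one event applied; the input is not mutated."""
    account, amount = event
    new_state = dict(state)
    new_state[account] = new_state.get(account, 0) + amount
    return new_state

def replay(log):
    """Derive the current state by folding over every event ever recorded."""
    return reduce(apply_event, log, {})

print(replay(events))  # {'alice': 20, 'bob': 20}
```

Nothing here stores a balance directly; the balances exist only as the result of replaying the log. With that picture in mind, the expanded definition reads as follows.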

A blockchain is a linear, append-only data structure in which

  1. state is represented by an exhaustive record of discrete events
  2. new events are introduced in batches (called blocks)
    This introduces the concept that events (i.e. modifications to the state of the chain) are grouped together and applied simultaneously. This feature gives us a flexible ordering model and is important for achieving distributed consensus.
  3. each block contains a reference to a single parent block
    This refers to the idea of cryptographic commitments. I’ll skip the gritty details, but this constraint is what enforces the ‘chain’ part of blockchain and helps us make guarantees about the immutability (or ‘unchangeable-ness’) of the data in a given chain.
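The three properties above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular blockchain encodes its blocks: the dict layout, the JSON-plus-SHA-256 hashing scheme, the all-zero `GENESIS_PARENT` placeholder, and the helper names are all hypothetical.

```python
import hashlib
import json

GENESIS_PARENT = "0" * 64  # assumed placeholder parent for the first block

def block_hash(block):
    """Hash a block's full contents, including its parent reference."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def make_block(parent_hash, events):
    """Batch a list of events into a block that commits to its parent."""
    return {"parent": parent_hash, "events": list(events)}

def build_chain(batches):
    """Turn a list of event batches into a list of linked blocks."""
    chain = []
    parent = GENESIS_PARENT
    for batch in batches:
        block = make_block(parent, batch)
        chain.append(block)
        parent = block_hash(block)
    return chain

def links_intact(chain):
    """Check that every block references the hash of the block before it."""
    parent = GENESIS_PARENT
    for block in chain:
        if block["parent"] != parent:
            return False
        parent = block_hash(block)
    return True

chain = build_chain([[["alice", 50]], [["bob", 20], ["alice", -30]]])
print(links_intact(chain))  # True

chain[0]["events"][0][1] = 5000  # rewrite history in the first block
print(links_intact(chain))  # False
```

Because each block's hash covers its parent reference, changing an old block changes its hash and breaks the link held by its child. That is the sense in which the parent reference supports immutability: tampering is detectable, not impossible.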

This definition is better — kind of simple, kind of concise, and kind of useful. It describes a cryptographically-enforced event log, but fails to explain one of the most interesting properties of blockchains: the way new data is introduced.

  • blocks are valid if and only if they satisfy some arbitrary set of constraints (called consensus rules)

Oh yeah, blockchains have rules!

The rules of Bitcoin state that blocks are invalid if they contain any transactions without valid signatures; the rules of Ethereum state a transaction is only valid if the sending account has sufficient ether to pay for it. There are many other rules in every blockchain system — but the rulesets can be thought of as arbitrary. All that matters is to note that some ruleset exists (even if it’s the empty set).

This feature is essential for a blockchain to be anything but an audit log, but it doesn’t really fit with our description of blockchain as a data structure. Consensus rules are usually enforced by nodes on a peer-to-peer network, but even if they were enforced by stored procedures in a chain’s database, the enforcement mechanism is an implementation detail, not an aspect of the data or its organization.
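One way to picture consensus rules is as a set of predicates that a block must satisfy. The sketch below is a toy under loud assumptions: the block layout, the `signed` flag (a stand-in for real signature verification), and the rule names are hypothetical and are not Bitcoin's or Ethereum's actual rules.

```python
# Hypothetical simplified block: a dict with a "transactions" list.

def all_signed(block):
    """Toy stand-in for signature checks: every transaction is flagged signed."""
    return all(tx.get("signed", False) for tx in block["transactions"])

def amounts_positive(block):
    """Every transaction must move a positive amount."""
    return all(tx["amount"] > 0 for tx in block["transactions"])

def is_valid(block, rules):
    """A block is valid if and only if it satisfies every rule in the ruleset."""
    return all(rule(block) for rule in rules)

good = {"transactions": [{"amount": 5, "signed": True}]}
bad = {"transactions": [{"amount": 5, "signed": False}]}

rules = [all_signed, amounts_positive]
print(is_valid(good, rules))  # True
print(is_valid(bad, rules))   # False
print(is_valid(bad, []))      # True: the empty ruleset accepts any block
```

Note that `is_valid` knows nothing about where blocks are stored or who calls it. The ruleset is separate from the data and from whatever mechanism enforces it.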

To describe this thing precisely, we need to zoom out. Luckily, there already exists a concept in software design of an abstract, repeatable solution to a generic class of problems: it’s called a design pattern.

A blockchain is a design pattern based on a linear, append-only data structure in which:

  1. state is represented by an exhaustive record of discrete events
  2. new events are introduced in batches (called blocks)
  3. each block contains a reference to a single parent block, and
  4. blocks are accepted as valid if and only if they satisfy some arbitrary set of constraints (called consensus rules)

This definition could refer to nothing else but a blockchain and accurately describes any example of a blockchain that the industry has seen yet; it finally meets our usefulness requirement. On the other hand, we’ve really gone off the rails on conciseness and simplicity, so a quick reduction is necessary:

Earlier we noted the similarity between a blockchain’s core data structure and log files: both are append-only event records. Considering the requirement that each block must reference its parent as one of the rules that all-or-most blockchains share, we can distill the entire definition down to the deceptively simple, concise, and (I would argue) useful title of this post.
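To illustrate that reduction, here is a hedged Python sketch in which the parent reference is not a special feature of the data structure but just one more rule handed to an append-only log. The class name `RuledLog`, the rule signature `(log, block)`, and the hashing helper are assumptions for illustration only.

```python
import hashlib
import json

def digest(block):
    """SHA-256 over a canonical JSON encoding of the block (assumed scheme)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

class RuledLog:
    """An append-only log that accepts a block only if every rule passes."""

    def __init__(self, rules):
        self._blocks = []
        self._rules = list(rules)

    def tip(self):
        """Hash of the latest block, or None for an empty log."""
        return digest(self._blocks[-1]) if self._blocks else None

    def append(self, block):
        """Append the block and return True, or reject it and return False."""
        if not all(rule(self, block) for rule in self._rules):
            return False
        self._blocks.append(block)
        return True

    def __len__(self):
        return len(self._blocks)

def references_parent(log, block):
    """The 'chain' constraint, expressed as an ordinary rule."""
    return block["parent"] == log.tip()

def non_empty(log, block):
    """An arbitrary extra rule: blocks must contain at least one event."""
    return len(block["events"]) > 0

log = RuledLog([references_parent, non_empty])
print(log.append({"parent": None, "events": ["a"]}))       # True
print(log.append({"parent": "bogus", "events": ["b"]}))    # False
print(log.append({"parent": log.tip(), "events": ["c"]}))  # True
print(len(log))                                            # 2
```

Swap in a different list of rules and the same log behaves like a different system, which is the point of the reduction.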

TL;DR Blockchains are just

Logs with Rules