Do All Roads Lead to Rome?
Evaluating complex systems characteristics is always a hard task, because the more complex a system is, the more different it will be from any other system, even if their stated goal is the same. In the distributed systems’ world the word du jour is distributed ledger (often represented as DLT, where T stands for Technology).
DLTs come in a plethora of flavours. For one we have the distinction of permissioned vs permissionless to differentiate between those systems where there’s a guardian that lets people join the network vs the ones where everyone can join. Then we have the blockchains and DAGs to differentiate between the systems where each new state (block) has a single parent vs the systems where a state can have multiple parents (Directed Acyclic Graph)
Note: if we want to be really pedantic we would differentiate between Public and Private systems (i.e. where authentication is needed) as well as Permissionless and Permissioned systems (i.e. where authorization is needed). Furthermore entities involved in various distributed systems have different roles, some of these roles requiring authorization (thus we would talk about Public Permissioned systems — Ripple and Stellar are prime candidates for this; anyone can join and send transactions, but the validators are ‘handpicked’) Most of the times, unless specified, for brevity we will consider all Public systems Permissionless and all Private systems Permissioned
In a distributed system, by definition, there are multiple components/entities connected through a network — this connection is also a separation point as information requires time to travel between them. The consequence of this delay in information propagation is that not everyone has access to the same ‘facts’ at the same time. Besides the latency, another important factor is that the information does not originate in the same point, thus information propagation is a rather entropic process.
Because of these different world views people have come up with algorithms to ensure that everyone has eventually the same view of the facts and maybe even the order in which they occurred (for independent events the order in which they happen is not important, but for interdependent ones, it is). As these algorithms try to establish a consensus of the view of the world, they are aptly named consensus algorithms.
Aspects to be taken into consideration when designing these consensus algorithms should be different when we talk about permissioned systems from the ones in permissionless systems. In permissioned systems, there can be, in theory, a vetting procedure for nodes to join the network, thus we can weed out the bad entities that are trying to join just to wreak havoc. And in permissionless systems the capability of one actor to behave contrary to everyone’s common interest (maybe even contrary to its own interest) is an aspect that can (and should) be taken into consideration.
There are quite a few consensus algorithms that have been engineered to handle non-adversarial setups (where all the actors involved behave correctly, but sometimes maybe their network connection is unstable or maybe their client hard-drive crashes). These algorithms are widely used in the design and operation of the first embodiment of the distributed technology — distributed databases — amongst which we’ll mention Paxos and Raft. A lot of the existing distributed databases in use today employ, under the hood, one variant of Paxos, Raft, ZooKeeper, Chubby and a few others.
These consensus algorithms take into consideration one or more aspects of the constituent elements — time they take to respond, how many of them responded, what they responded to.
In an adversarial setup (most of the times called byzantine), entities participating in the distributed system will want to change everyone’s view of the current state, some correctly (I’ve sent some money to this other entity, my balance decreases by the amount sent), some falsely (I’ve sent some money to this other entity, now I have nothing, but let me pretend to this 3rd entity that I still have some). Moreover there can be cartels of malicious actors that each support other members of the cartel’s false claims. To complicate things further it might not be obvious which of the entities are ‘the bad guys’. This is why reaching consensus has to take into consideration some other aspects, as well as make assumptions about the number of malicious actors in relationship with the rest of the network (e.g. at least 50% + 1 of the entities in the network are correct or, for some other algorithms, less than ⅓ entities are ‘bad behaving’ actors).
In almost all instances the consensus algorithm employed involves at least one (possibly more) entity being tasked to propose the next state of the system (that is, to propose a new block that includes valid transactions not already included in previous blocks) and a rule to decide in case there are conflicting proposals. Another common feature of these algorithms is that they rely on a pseudo-random value to decide who that entity is going to be.
In some cases this algorithm is a simple lottery (finding a particular, hard to find, hash). In this instance the more hashes one entity can crunch (with the associated energy expenditure) the more likely is that it will stumble upon the ‘correct’ value. The rule can be ‘the longest chain wins’ or ‘chain with most expended work’. When the algorithm is this mathematic lottery, called PoW (proof of work) and the rule is ‘longest chain with most difficulty is the canonical one’ we call it Nakamoto consensus.
One other approach is to have a random draw from a list of eligible entities, where each entity’s chance of being chosen is proportional to some factor (most often the number of tokens or money it possesses) — the bigger the factor, the more likely for the entity to be chosen. We call this approach PoS (proof of stake)
Entering the Beauty Contest
We’ve done this long exposé on blockchain and DAGs to set the stage and outline reasons why we won’t compare apple and oranges. It makes no sense to compare performances of a permissioned system with a permissionless one because, by design, a permissioned system probably does not have to deal with adversaries. Consensus algorithms in permissioned systems do not need to take into account the fact that some entities in the network might lie about their (or their ‘friends’) actions. That’s why we won’t talk about performance and capabilities vs systems like Hyperledger or Ripple.
Likewise DAGs are a particular set of systems. They tend to be somewhat centralized in nature, in absence of an overly mature and large network required for them to function. Although unproven yet at a large scale, they have the potential to outdo any blockchain. At the same time the current uses and designs do not consider them appropriate for large value transactions. These are the reasons why we will not talk about capabilities or performance vs systems like IOTA, Raiblocks, Nano or Byteball. In this category we also find Hedera Hashgraph, which, on top, is partially patented.
Lots of blockchain projects were announced in the last year, some just available on (white)paper and maybe Github, others processing live transactions. We won’t be considering tokens as these use the infrastructure of an underlying blockchain (mostly Ethereum) to operate. That leaves us with
- Ethereum 2.1 -> Shasper (Casper + Sharding)
This list can be further broken down in two based on the type of consensus algorithm used. Bitcoin, Litecoin, Ethereum 1.0, Zcash use PoW, while the rest use PoS based consensus.
While from a functional point of view, PoW is a reliable approach (Bitcoin has been operational for almost 10 years now), the common critique is that it consumes (some would say waste) lots of energy (currently — Oct 2018 — more than Austria). Adding the other PoW coins/blockchains probably results in an energy consumption that’s close to 50% that of UK. Putting our environmentalists hats on, we have to favor blockchains that consume less energy, thus let’s run our comparison mostly on the PoS chains. In this way a common feature base might emerge.
Because randomness and determinism do not go well hand in hand, some PoS systems have chosen to either completely eliminate the randomness or employ randomness only once in a while, in periods (called eras or epochs).
The systems where the determinism (in terms of who is deciding on the next state) is almost complete and, most important, unchanged for large periods of time, are usually relying on a ‘vetted’ list of proposers, each of them chosen through votes of the ‘general population’ (that is, each node votes for one or more proposers from a list, their votes being weighed by the amount of tokens they possess). These systems are called Delegated Proof of Stake (DPoS) and probably their most known representatives are EOS and Tron (an interesting hybrid is Tezos which allows delegation, but does not mandate it). Currently operating DPoS systems rely on these delegates (usually between 20 and 30) to propose new blocks and forge a single canonical chain. Their downside is that it’s a pretty centralised setting and, as already speculated, quite easy to ‘cartelize’ (e.g. currently 12 of the 21 EOS block proposers -that’s how these validators are called in EOS- are from China; this could give an upper hand to the Chinese government should they wish to censor transactions).
Safety is the property of a distributed system that guarantees that nothing bad can happen. In other words safety is the property that ensures participants that no two conflicting transactions or states can ever be considered valid — it’s either one or the other, never both. But safety also ensures that, for instance, there are no deadlocks — where two entities wait for each other. This is why safety is the most important property of a distributed system.
Other aspects of safety that do not hold so much importance in distributed systems are those of ordering — safety requires first-come-first-serve, which might not be always achievable in a distributed system as time is a relative concept, particular to each participant.
Liveness is the property of a blockchain to produce new blocks regularly (in most cases even if there are no new transactions — this is a major difference from DAGs where lack of transactions might mean some things remain in limbo). It can be seen as a heartbeat that proves that the chain is still alive, hence ‘liveness’. Provided that the blockchain’s protocol is designed to ‘spit’ a new block every S seconds, this is easy to achieve in absence of transactions. However producing blocks at regular intervals in presence of (potentially conflicting) transactions requires agreement between entities involved and some of the consensus algorithms are requiring several network round-trips that can affect this S. In PoW environments, through embedded protocol parameters, if the ‘pulse’ of the network decreases the chances of winning the lottery is increased for each participant.
It’s worth mentioning that from a crypto-economic standpoint nodes are incentivised to keep this pulse up (produce blocks) by network granted block rewards and/or fees from the included transactions.
Finality (and time to finality)
Finality is the ‘point of no return’ — once a final state is reached, the immutability kicks in and no change is possible. While safety can be interpreted as ‘only good/expected things happen’, finality is ‘an expected thing will eventually happen’. Time to finality is the amount of time (or maybe the number of blocks/new states) required for a block included in the history to become final.
It’s said that finality is needed in order to have double spend protection. And that’s true, but only when thinking about a blockchain transaction in a commerce setting. That is, by itself a blockchain transaction (i.e. one included in a block) will never conflict with another blockchain transaction (i.e. they can’t both spend the same money or tokens), but two transactions, one of which is not yet included in a block, might conflict. Even worse in this case, should a reorg happen (i.e. a chain — that’s been proven based on the consensus algorithm used to be the canonical one — is being published), the roles could change and the transaction included in a block can become ‘free-floating’, while the one that was previously ‘floating’ is included in the new version of the chain.
Here’s a fact that people don’t realize: PoW chains do not offer finality, the ‘comfort’ on the receiver’s side is just a probability function — the more blocks (confirmations) there are, the more unlikely it is for a transaction to be reversed. But unlikely does not mean that the probability is 0. So, in theory, even the best equipped (in terms of hashing power) chain — BTC — can have reorgs, but the probability of winning the state-run lottery 3 times in a row is perhaps bigger than the probability of having more than a few blocks being invalidated.
Depending on the cryptographic primitives involved, the bootstrapping of the network (the genesis) or some intermediate phases, might require arrangements done ‘out-of-band’ to ensure that bad actors can’t influence the operation or the behaviour of the chain. This is particularly important for zk-SNARKS based chains. This trusted setup phase is also employed by other chains usually at the beginning of a period or an era, to eliminate the need for the participants to interact during some of the steps in the respective era.
Interactive vs non-interactive protocols
In a live system, assuming that there’s agreement on the current state, reaching agreement on the next state means that some nodes need to send some messages and some others to receive and act on them. In a non-interactive protocol, responses (beyond acknowledgements that a message has been received) to a message do not impact the next round of messaging (i.e. there’s no calculation of any kind expected from the receiver of the message in order for the next step to be kicked-off), while in an interactive protocol they do.
For instance asking participants (nodes) to sign a message is an interactive protocol, while broadcasting a block is non-interactive. Interactive protocols are tricky because on the one hand they might be required (e.g. validating a block by signing it), but on the other hand they are a perfect tool for byzantine participants, should they want to impede new state agreement, by refusing to act or by acting with a long delay.
Because permissionless distributed systems are subject to adversarial conditions, one consensus algorithm design decision is around protection of the system from bad actors trying to influence outcome (such as discarding valid blocks or transactions) by bribing consensus participants. All algorithms have thresholds above which the correctness is ensured regardless of the efforts of these bad actors and they range in general from 50% + 1 to 66.66% + 1, with the latter figure more common in PoS systems.
It’s important to note that in PoS systems this threshold can refer to either percentage from the total number of nodes or, in many cases, percentage from the total available stake in the system (that is — total circulating supply; how many tokens — which are equal to votes — are there). Resilience usually refers to this threshold of bad (or good, if seen from the other side) actors in the network, above which they can impact various functions (liveness, safety, finality etc).
Static / Adaptive corruption model
As outlined above, adversarial systems, being subject to bad actors, need to employ mechanisms that reduce capability of these actors to influence other, fair, nodes. This influence can be involuntary (e.g. DDoS-ing so that it can’t respond, eclipsing — giving it a distorted view of the network) or voluntary (e.g. bribing). Knowing in advance who is going to be in charge of a particular decision (assembling a block or vouching for it), gives an edge to bad actors and the earlier in advance they know, the more time they have to prepare for the attack.
To use a terminology borrowed from “Formal Barriers to Longest-Chain Proof-of-Stake Protocols” we would say that in D-Locally/D-Globally predictable chains, the smaller ‘D’ is (D represents the number of blocks in advance one knows who will be appointed to propose or to validate a block), the more protected from actions of adversaries. In this framework global predictability is preferred to reduce network communication (and thus latency) and to overcome any (or most) adaptive corruption models it’s also preferred that D is 1.
Sybil resistance / Identity
Fundamentally both staking (of tokens) or mining (which amounts to staking computing power) are mechanisms which linearly increases anyone’s expected output, be it rewards or fees or other incentives, with their input (that is, put twice as much computing power in, get approximately twice as much rewards; put twice as much stake, get picked to propose blocks roughly twice as much) — as such they can be considered sybil resistance schemes. Furthermore there is plenty of literature that analyses all this setup from a game-theoretic point of view showing that collusion between participants does not pose a threat (and might even be dis-incentivised through use of slashing conditions or other ‘punishments’) as long as the safety thresholds are not exceeded.
A distributed system can make assumptions (or even require) that the clocks of all participants are synchronised. While few blockchains have strong synchronicity requirements, the vast majority requires partial synchrony, while just a couple work well with totally asynchronous environments.
Divide et Impera
All the aspects mentioned thus far tend to deal with functional characteristics, but what about non-functional ones, such as throughput?
In all distributed systems the throughput is limited by the time required for the whole network to agree on the next state. Moreover, the number of participants in the network directly impacts the time required to reach an agreement. Thus it would make sense to split the whole network into smaller groups, assigning each participant to a particular group.
In this scenario, as long as interactions (transactions) happen only within each group, the things are straightforward: the agreement on the next state can be reached in each group and then the global state is just the union of all, smaller, group states. But when we also consider instances where two groups need to interact (by the virtue of one participant in group A wanting to send tokens to another participant in group B), things elevate in complexity and it’s required to either introduce a higher order entity or develop an algorithm that allows this (essentially simulating a two phase commit in the “classical” database management systems). In distributed systems these groups are often called shards and the technique of splitting the participants into shards is called (of course) sharding.
To complicate things more, we can establish three types of sharding, based on each of the three constituents of a distributed system — storage, computing and communication. As a result we would end up with state sharding, transaction sharding or network sharding.
State sharding means that each group (shard) is responsible of storing their own chain. Transaction sharding would mean that transaction validation is done in each group. Finally, network sharding means that the communication of participants is limited to peers in their own group.
We’ve gone through all this discussion about aspects to take into consideration when looking at the new paradigm pushed forward by blockchains, so that we have a few common reference points. In a next instalment in this series we will look at the PoS chains through this lens and try to compare them.
 https://arxiv.org/pdf/1809.06528.pdf — “Formal Barriers to Longest-Chain Proof-of-Stake Protocols” by Jonah Brown-Cohen, Arvind Narayanan, Christos-Alexandros Psomas, S. Matthew Weinberg