Dealing with failure in cryptocurrency

Vlad Zamfir
Mar 27, 2017

There are many ways that a blockchain can fail. It’s always good to think about what failure looks like because it lets us see 1) how bad it might be, 2) how it’s possible to recover from failure, and 3) how failure can be prevented.

Knowing how things can fail and what it is possible to do to recover puts us in a position where we can keep our cool — both during failure and while contemplating the possibility of failure. I feel like this might be important generally as a life lesson, but here is a more economic example: If you’re buying insurance (which mitigates failure, in a way) or paying for security (which prevents it), you may be willing to pay more if you have more fear of the outcome being insured against (or supposedly being prevented).

There are (at least) two classes of failures in cryptocurrency:

  1. Governance failures
  2. Technical failures

I’m going to talk about the “governance layer” as a kind of a catch-all term for anything that may influence how the protocol changes over time. It’s also the governance layer’s job to make sure the assumptions required to show (in theory) that a protocol behaves as advertised are actually true (irl).

This blog post is mostly about dealing with a technical failure in a blockchain by using the governance layer, which can (maybe) achieve consensus on a soft fork/hard fork that can be used to recover from a technical failure. Governance failures can be pretty bad because governance may be required to recover smoothly from technical failure.

As a prescription for governance failures, I suggest participation by members of the community in the governance process (just fix it), or a splitting from the community (give up and get a fresh start). Or you might want to do something else while you wait it out. You might make this decision as an individual, or as a coalition/cartel. But governance in practice is (maybe?) a technosociopolitical problem. It’s not necessarily easy!


How can a blockchain fail?

  1. Reversion of blocks expected to be consensus (safety failure)
  2. Consensus on invalid blocks (safety failure)
  3. Unavailability of consensus on new blocks (liveness failure)
  4. Censorship of transactions/blocks (liveness failure — selective use of 3.)
  5. Block unavailability (liveness failure)

I’m going to go over each of these failures one at a time, and explain what it is, how to recover from it, and how to prevent it for a proof-of-work blockchain architecture. There are going to be similarities and differences in proof-of-stake, but I want to keep it short.

Technical note: for 1) to be general, it should be written/understood as something closer to “if two nodes have distinct competing blocks and are each confident that their block is consensus, then there is a consensus safety failure. We additionally will allow these two nodes to be the same node at two (not necessarily immediately) consecutive states of the protocol”. A definition fitting this description should (I’m guessing) be able to capture consensus safety in both proof-of-work and proof-of-stake blockchain protocols, and also in traditional consensus protocols.
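The definition in the technical note above can be sketched in a few lines. This is only a toy model: the block tree is a hypothetical dictionary mapping each block id to its parent, and the names `is_ancestor`, `competing` and `safety_failure` are made up for illustration. Two blocks "compete" when neither one is on the other's chain.

```python
from typing import Dict, Optional

# Hypothetical toy block tree: block id -> parent id (None for genesis).
Parents = Dict[str, Optional[str]]

def is_ancestor(parents: Parents, a: str, b: str) -> bool:
    """Return True if block a is b itself or an ancestor of b."""
    cur: Optional[str] = b
    while cur is not None:
        if cur == a:
            return True
        cur = parents[cur]
    return False

def competing(parents: Parents, a: str, b: str) -> bool:
    """Two blocks compete when neither lies on the other's chain."""
    return not is_ancestor(parents, a, b) and not is_ancestor(parents, b, a)

def safety_failure(parents: Parents, confident_a: str, confident_b: str) -> bool:
    """confident_a and confident_b are blocks that two nodes (or the same
    node at two different times) each believe to be consensus."""
    return competing(parents, confident_a, confident_b)

# G is genesis; A1 -> A2 is one chain, B1 is a fork from genesis.
parents = {"G": None, "A1": "G", "A2": "A1", "B1": "G"}
print(safety_failure(parents, "A1", "A2"))  # False: same chain
print(safety_failure(parents, "A2", "B1"))  # True: distinct competing blocks
```

Because the check only needs an ancestry relation and a notion of "confident", it does not depend on how that confidence was reached, which is why the same shape of definition could plausibly cover proof-of-work, proof-of-stake and traditional consensus.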

Reversion of consensus blocks (51% double-spend)

What is it?

Imagine: After waiting for more than 6 (or 200) confirmations, your confirmed transaction becomes unconfirmed.

This is normally understood to be the result of an attack by an adversary with a majority of the hashrate* because the only way to reverse a block in proof-of-work is to present a heavier chain without that block.

*Technical note: It can also occur due to network asynchrony.
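To make the mechanism concrete, here is a minimal sketch of a heaviest-chain rule, assuming each block is a hypothetical `(id, work)` pair and every block carries the same amount of work. The function names are invented for this example and are not taken from any client.

```python
from typing import List, Tuple

Block = Tuple[str, int]  # (block id, work contributed by this block)

def total_work(chain: List[Block]) -> int:
    """Total work of a chain: the sum of the work of its blocks."""
    return sum(work for _, work in chain)

def fork_choice(current: List[Block], candidate: List[Block]) -> List[Block]:
    """Heaviest-chain rule: switch only to a strictly heavier chain."""
    return candidate if total_work(candidate) > total_work(current) else current

def reverted_blocks(old: List[Block], new: List[Block]) -> List[str]:
    """Ids of blocks that were in the old chain but are not in the new one."""
    new_ids = {block_id for block_id, _ in new}
    return [block_id for block_id, _ in old if block_id not in new_ids]

# Honest chain: genesis plus 6 blocks. Attacker chain: genesis plus 7 blocks.
honest = [("G", 1)] + [(f"h{i}", 1) for i in range(1, 7)]
attack = [("G", 1)] + [(f"a{i}", 1) for i in range(1, 8)]

head = fork_choice(honest, attack)
print(reverted_blocks(honest, head))  # ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
```

A transaction in `h1` had 6 confirmations on the honest chain, yet it is unconfirmed once the node follows the heavier chain.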

<insert picture of 51% attack>

How does a blockchain recover from that?

If a large number of blocks are reverted, then perhaps the damage would be high enough that it justifies attempting to recover the original blockchain’s transaction history.

If a small number of blocks are reverted, then perhaps the cost was not too high and the network won’t mind the reversion.

The simple proposal:
A hard fork can introduce a checkpoint in the blockchain above the block where the new (presumably attacking) heaviest chain forked the original history. This would let all clients who install this hard fork remain on the original chain.

This proposal can be hard to implement if the community cannot come to broad agreement about which fork appeared first.
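A rough sketch of what the checkpoint does to a client's chain selection, assuming chains are plain lists of block ids indexed by height and that chain length stands in for chain weight. `satisfies_checkpoint` and `fork_choice_with_checkpoint` are hypothetical names for illustration.

```python
from typing import List, Optional

def satisfies_checkpoint(chain: List[str], height: int, block_id: str) -> bool:
    """A chain is acceptable only if it has the checkpointed block at that height."""
    return len(chain) > height and chain[height] == block_id

def fork_choice_with_checkpoint(
    chains: List[List[str]], height: int, block_id: str
) -> Optional[List[str]]:
    """Pick the longest chain among those that contain the checkpoint.
    Length is an assumption standing in for total work."""
    eligible = [c for c in chains if satisfies_checkpoint(c, height, block_id)]
    return max(eligible, key=len) if eligible else None

original = ["G", "h1", "h2", "h3"]
attack = ["G", "a1", "a2", "a3", "a4"]  # longer, forked from G (height 0)

# The checkpoint sits just above the fork point, on the original history.
chosen = fork_choice_with_checkpoint([original, attack], height=1, block_id="h1")
print(chosen)  # ['G', 'h1', 'h2', 'h3']
```

The sketch also shows where the difficulty lies: someone has to decide that `h1`, and not `a1`, is the block to checkpoint.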

The more ambitious proposal:
Add a “non-reversion rule” to the protocol, which has nodes go to the longest chain starting at the block 6 (200) confirmations from now. Now an attacker cannot ever cause any protocol-following nodes to revert more than 6 (200) blocks. This would provide a PoW blockchain with subjective finality.

Here, an attacker won’t be able to double spend, but if they can make sure that two nodes each see a different 6 (200) block fork before seeing any other, they can make these clients permanently diverge. However, waiting for enough PoW confirmations is meant to prevent this problem, and as long as the network isn’t too asynchronous, it does work.
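Both the rule and the divergence scenario can be sketched as follows. This is one possible reading of the non-reversion rule, not a specification: chains are lists of block ids, length stands in for weight, `update_head` is a hypothetical name, and `K = 2` is used instead of 6 or 200 to keep the example short.

```python
from typing import List

def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading blocks that two chains share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def update_head(current: List[str], candidate: List[str], k: int) -> List[str]:
    """Longest-chain rule, except that a switch reverting more than k
    of the node's own blocks is refused."""
    if len(candidate) <= len(current):
        return current
    reverted = len(current) - common_prefix_len(current, candidate)
    return current if reverted > k else candidate

K = 2
fork_a = ["G", "a1", "a2", "a3"]
fork_b = ["G", "b1", "b2", "b3"]

# Each node sees a different 3-block fork first.
node_x = update_head(["G"], fork_a, K)
node_y = update_head(["G"], fork_b, K)

# Later each node sees the other fork, now longer, but switching
# would revert 3 > K blocks, so both refuse.
node_x = update_head(node_x, fork_b + ["b4"], K)
node_y = update_head(node_y, fork_a + ["a4", "a5"], K)
print(node_x[-1], node_y[-1])  # a3 b3
```

Neither node will ever switch, whatever gets built on the other fork, which is the permanent divergence described above. A shallow fork that reverts at most `K` blocks is still followed as usual.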

(Vitalik’s solution of having a subjective discount on a blockchain’s score is a more “continuous” or “smooth” alternative to the non-reversion rule given above.)

How do we prevent this from happening?

Conclusion:

It seems like it would likely be more efficient to recover from a reversion attack than to prevent one, particularly in settings with well-financed adversaries and low governance costs.

Consensus on invalid blocks

What is it?

How does a blockchain recover from that?

If the community uses enough full nodes to provide enough services, then there will be an economic incentive for the miners to mine on a chain that is accepted by full nodes. This incentive must be high enough for miners not to willingly and knowingly mine on an invalid blockchain. It is up to the governance layer to make sure that this is the case — if it fails, then we may see consensus on invalid blocks.

Light clients do not validate blocks, and so must be helped extra-protocol. This can be done with checkpoints. However, the ideal thing is for the governance layer to cause miners to produce a valid longest chain.
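The difference between the two kinds of client can be illustrated with a small sketch. The `valid` flag is a hypothetical stand-in for full block validation, and the function names are invented for this example.

```python
from typing import List, NamedTuple, Optional

class Block(NamedTuple):
    block_id: str
    work: int
    valid: bool  # hypothetical flag standing in for full validation

Chain = List[Block]

def chain_work(chain: Chain) -> int:
    """Total work of a chain."""
    return sum(b.work for b in chain)

def light_client_choice(chains: List[Chain]) -> Chain:
    """A light client follows the heaviest chain without validating blocks."""
    return max(chains, key=chain_work)

def full_node_choice(chains: List[Chain]) -> Optional[Chain]:
    """A full node discards any chain containing an invalid block first."""
    valid_chains = [c for c in chains if all(b.valid for b in c)]
    return max(valid_chains, key=chain_work) if valid_chains else None

valid_chain = [Block("G", 1, True), Block("v1", 1, True), Block("v2", 1, True)]
invalid_chain = [Block("G", 1, True), Block("x1", 1, False),
                 Block("x2", 1, True), Block("x3", 1, True)]

print(light_client_choice([valid_chain, invalid_chain])[-1].block_id)  # x3
print(full_node_choice([valid_chain, invalid_chain])[-1].block_id)     # v2
```

Services run on full nodes keep following `v2`, so miners building on `x3` are producing blocks those services will not accept. The light client is the one that needs a checkpoint or some other outside help.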

How do we prevent this from happening?

It may be realistically impossible to prevent 100% of miners from ever mining invalid blocks, but this might be a case where recovery due to good full node behaviour is as good as prevention.

Conclusion:

Unavailability of consensus on new blocks

What is it?

This can be because no new blocks are mined (difficulty too high, perhaps), or because no one can mine on top of the longest chain (network asynchrony, perhaps).

How does a blockchain recover from that?

The governance layer has to make sure that there is enough mining power mining on the same chain, and that the chain is being propagated to clients (that blocks are being found, and being propagated fast/well enough to allow them to be chained).

How do we prevent this from happening?

Conclusion:

As with the “invalid blocks” failure, prevention and recovery mechanisms are the same: making sure that there is enough honest hashpower on a well-enough-connected network.

Censorship of transactions/blocks

What is it?

How does a blockchain recover from that?

After censorship is detected, the governance layer needs to decide on a course of action: ensure that a majority coalition which does not censor takes power, or do nothing and accept the censorship (wait it out, maybe).

If a majority coalition doesn’t censor, then miners who refuse to mine on blocks that don’t follow the censorship policy will have their blocks orphaned. If a majority does censor, then miners who are not following this strategy will have their blocks orphaned.

There are many ways that the governance layer can consider making sure that there is a non-censoring majority coalition. They can add honest hashpower to the network. They can persuade or bribe existing miners to stop censoring. They can change the hashing algorithm in a way that obsoletes ASICs, in an attempt to make it easier to dislodge the censoring cartel.

How do we prevent this from happening?

Conclusion:

Block unavailability

What is it?

An unavailable block is scary because we don’t know whether or not an unavailable block is valid.

How does a blockchain recover from that?

If the community uses enough full nodes to provide enough services, then there will be an economic incentive for the miners to mine on a chain that is accepted by full nodes. This incentive must be high enough for miners not to willingly and knowingly mine on an unavailable blockchain. It is up to the governance layer to make sure that this is the case — if it fails, then we may see consensus on unavailable blocks.

Light clients do not ensure that blocks are available, and so must be helped extra-protocol. This can be done with checkpoints. However, the ideal thing is for the governance layer to cause miners to produce a valid longest chain.

How do we prevent this from happening?

Conclusion:

By the way, this becomes more challenging + interesting in blockchain sharding.

Closing thoughts:

Of the other failure modes, two can be prevented/recovered from by enough use of full nodes (validity and unavailability). For “unavailability of consensus on new blocks”, it is hard to see how an attacker would pull it off (in a well-enough-behaved network) or benefit from it. Preventing and recovering from censorship, by contrast, is relatively very difficult (the “governance layer” has to make sure a majority of miners don’t collude to censor, even though it may be in their incentive to censor).

If we know we can rely on the governance layer to benefit from this kind of extra-protocol finality, then perhaps we shouldn’t pay miners so much that we can be convinced that 51% attacks never happen no matter what reasonable cost. The cost of a 51% attack to the community is mostly known in this case: it is the cost of making a governance decision on which chain came first, and the cost of disruption to business-as-usual until this decision is made. This is hopefully much less than the cost to the 51% attacker (although that can’t necessarily be guaranteed).

I hope blockchain communities will not be intimidated by threats of 51% attacks, or by the uncertainty around what happens if there is a 51% attack. It is very possible to recover from 51% attacks. I think we could all be less impressed, in this context, about what security PoW mining provides against 51% attacks relative to a context where the governance layer absolutely cannot be counted on for extra-protocol finality.

I think miner entrenchment is a big problem in the public blockchain space. As far as I’m concerned, our collective fear of 51% attacks and willingness to collectively pay ridiculous amounts of money for “security” in order to prevent attacks is a much bigger threat to the success of cryptocurrency than 51% attacks themselves actually happening. I think our fear is giving miners more clout in governance than they would have if we were more informed about what failure in cryptocurrency looks like, how it can be recovered from, and how it can be prevented. I hope this blog helps!
