What makes blockchains secure? (2/5)

Tarun Chitra
Gauntlet
17 min readMay 22, 2018

--

In the first post of this series, we discussed the definition of Nakamoto consensus and how the addition of a block to a blockchain could be viewed as a voting mechanism, in which the proof of work or proof of stake mechanism ‘elects a leader’ who is allowed to add a block to everyone else’s chain. This led to a natural connection to Byzantine agreement algorithms and a discussion of how some of the security analysis of Byzantine agreement algorithms could apply to blockchains that use Nakamoto agreement. Today, we are going to look at how the formal analysis of blockchains and Byzantine protocols involves constructing an idealized simulation of an instance of a blockchain system. These simulations are represented by environments, which correspond to the state of all participants, adversaries, and hash functions needed to simulate a blockchain. One can think of an environment as a class that contains all of the state needed to simulate a blockchain from the beginning, as well as functions that mutate this state to advance blockchain growth within the simulation. The simulation itself involves constructing an environment and then repeatedly calling the ‘increment time step’ function, which will either produce a block and/or disseminate it through the miner network. These idealized simulations are similar to discrete event simulations, such as Markov Chains or simulations of generative probabilistic graphical models, allowing one to transfer mathematical tools for analyzing convergence from these models to idealized blockchains.

The recent Avalanche paper by ‘Team Rocket’ (with some help from Emin Gün Sirer, et al.) made it clear to me that the methods used for formally analyzing blockchains are basically all the same. If you dive into the Bitcoin Backbone Protocol paper, Algorand, or dfinity-network, you will find that all authors end up constructing what we will define below to be an environment. As the academic literature can be contentious, with authors arguing that their methodology for analyzing idealized blockchains is superior to others, it can be easy to assume that the analysis of blockchain security is model dependent. By instead focusing on their similiarities, we might be able to construct a new protocol that is able to take advantage of the best pieces of the aforementioned protocols. I want to leave the reader of this post with a question to think about: Is it possible to combine environments (akin to boosting in machine learning) and their associated simulations, such that a system can provide the “best” subset of axioms possible? This question stems from Team Rocket’s usage of metastability to change the type of consensus mechanism used depending on the state of the environment, which I view as a first step toward ‘boosting’ the security and performance of blockchains.

Team Rocket, awaiting an Avalanche

In order to make the jump to understanding environments, we will first have to understand what the different requirements for security are and how one constructs Byzantine adversaries. The discussion below aims to be generic enough to accomodate the security guarantees and adversaries defined in all of the aforementioned protocols. Buckle up!

Most of the academic literature on blockchain security is focused on proving properties or axioms that a protocol must satisfy to be considered secure. The literature on Nakamoto consensus has converged on two sufficient (but perhaps not necessary) properties for a blockchain to be secure:

  1. Persistence: Once a transaction has been accepted in the blockchain by a large number of miners, then the probability of it being reverted in a future fork decreases exponentially
  2. Liveness: Any transaction created by an honest node will eventually be accepted by every honest miner

The Team Rocket paper was able to construct an algorithm that is a hybrid of Byzantine consensus algorithms (such as Algorand) and Nakamoto consensus algorithms by replacing the persistence axiom that other protocols require for security. The advantages of this algorithm, Avalanche, are:

  • It is much simpler to implement and reason about than Byzantine agreement and Byzantine-Nakamoto hybrid agreement
  • It appears to best Algorand by an order of magnitude in terms of transaction latency — time until the network confirms a transaction — and transaction throughput, which is the number of transactions that can be processed per second.
  • From a probabilistic perspective, there is an easy-to-describe phase transition in the parameters of protocol. This transition corresponds to points at which probabilities that decay exponentially (such as those implied within persistence and liveness) begin to have non-exponential decay.

Avalanche achieves this by replacing persistence with a seemingly weaker axiom called safety, which states that:

P1. Safety. No two correct nodes will accept conflicting transactions.

The appearance of a qualitatively different phase transition than that found in Snow White or the Bitcoin backbone protocol suggests that the choice of axioms can lead to very different and possibly non-universal behavior [0]. Plus, while it is clear that persistence is a stronger axiom to insist upon, it is isn’t totally clear why one would choose one axiom over the other. I don’t have an answer to this question and I suspect that most academics and practitioners in the field are also in the same boat. This is because the axioms that one requires a distributed system to satisfy are often quite disconnected from results that actually guarantee security. Security of distributed and financial systems is guaranteed in many ways; here are a couple examples to give you a flavor for what people would ideally aim to prove:

  1. Honest behavior by most participants leads to an 𝝴-approximate Nash Equilibrium
  2. In the theory of auctions, one often aims to prove that an auction is dominant strategy-incentive compliant (DSIC), which means that you cannot increase your utility by behaving in a dishonest and/or unexpected manner [1]

In practice, these are very hard properties [2] to prove of a distributed system, and one is left with the following roadmap for provable security:

  1. Define the algorithms involved in a protocol
  2. Construct a model of an adversary that is aiming to subvert this protocol
  3. Construct a simulation that includes a game played between honest players and the model adversary
  4. Show that the axioms that you desire (e.g. persistence, liveness, safety, agreement, etc.) hold with high probability for most instances of this simulation
  5. Argue that the axioms that you have chosen lead to an approximate Nash Equilibrium and/or that there is not a unique Nash Equilibrium (but it is okay!) [3]

As you can see, the last step is where we transition from speaking about the formal security of a protocol (e.g. proofs that show that a protocol satisfies some axioms) to a more qualitative argument that says that these axioms are ‘good enough.’ [4] The rest of this post is going to ignore step 5 and focus on providing an exposition about steps 2 through 4. We will come back to step 5 in later blog posts, but for now we will try to answer the following questions:

  1. How are adversaries constructed? What properties do they need?
  2. How does one construct a game in this setting?
  3. How do we ‘simulate’ an instance of a protocol? For instance, what does Bitcoin look like if the genesis block was different? Do I have to average over all genesis blocks?

In the first post of this series, we documented the similarities between Byzantine and Nakamoto agreement. However, in order to understand adversary construction, we need to focus on their differences. The main differences are:

  • Byzantine Agreement can only tolerate dishonest behavior from at most 1/3 of nodes, whereas Nakamoto Agreement can tolerate misbehavior from up to 1/2 of nodes.
  • As Byzantine Agreement involves an explicit leader election with a fixed number of participants, it is clear how to construct models for adversaries. In particular, we can assume that some percentage of voters are controlled by or misled by a single adversary. [5]
  • Nakamoto Agreement, in the permissionless setting with an unknown number of participants, makes it harder to define an adversary, since one would have to figure out how to adapt the adversary over time.

Before we talk about how the analysis of Bitcoin and other distributed ledgers gets around this problem, let’s dive a bit deeper into the notion of an adversary, as it is fundamental to understanding blockchain simulations.

In theoretical CS and cryptography, one tries to model adversaries as entities that at a minimum do the following:

  • Replicate known mechanisms for security breaches / selfish behavior
  • Affect the set of actions that an honest participant can take
  • Contain independent random number generators

The entity that is trying to be protected (and is under attack from the adversary) is known as the challenger. As an example, let’s look at how one defines an adversary for a property that is desired in standard public key cryptography. Recall that the plaintext in an encryption system is an allowable unencrypted input value and the ciphertext is the encryped output value. Informally, a public key system with key size λ is said to be resistant to the chosen-ciphertext attack (CCA)[6] if no one can choose a “small” set of plaintexts, query for the associated set of ciphertexts, and use this to ‘invert’ a random ciphertext with “significant probability” [7]. This means that if I have intercepted a number of encrypted texts and the associated public key, there isn’t any non-trivial information leaked about the secret key or the underlying text itself from statistical properties of this sample of encrypted texts. More formally, the adversary’s goal is to take two plaintexts, P⁰ and P¹, and a ciphertext C such that C = Encrypt(P⁰) or C = Encrypt(P¹), and guess whether C is an encrypted version of P⁰ or P¹ with probability significantly greater than 50%.

We next have to construct the game that the adversary and challenger play against each other. In the above description of the CCA attack, there was an informal mention of a “query of an associated set of ciphertexts.” We formalize this a bit more via a simple three-step game (adapted from Craig Gentry’s thesis on Homomorphic Encryption):

  • Query Step: The adversary makes a list of plaintexts of a fixed size
    n =O(Poly(λ)) and requests that the challenger provide ciphertexts for each of these plaintexts
  • Challenge Step: The adversary sends the challenger two plaintexts, P⁰ and P¹. The challenger randomly chooses either P⁰ or P¹, encrypts it as a ciphertext C and sends it back to the adversary
  • Guess Step: The adversary guesses whether the challenger encrypted P⁰ or P¹ and tells the challenger its guess. The adversary wins if it picks the correct plaintext

The adversary can use the information collected from the query step to generate the two plaintexts in the challenge step in a manner that favors the adversary at the last step. The idea here is that if the adversary can win more than 50% of the time, then it is able to use a polynomial number of samples to get non-trivial information about an exponentially-large space (as the number of ciphertexts is exponential in λ). The nice part about framing the problem with adversaries and games is that we have made clear the trade-offs and where the randomness of the system lies. For instance, in the CCA-resistance game, we have randomness in the challenge step and implicitly, we have randomness in the mechanism by which the adversary chooses the two plaintexts to issues as a challenge. However, the example of CCA-resistance is much simpler than a Byzantine system or the permisionless setting of Nakamoto, as we have well-defined adversaries and challengers and their identities are known and unchanging. How do we extend the adversarial framework to these systems?

In blockchain systems, we have a fluctuating number of participants and adversaries. As pointed out in the famous Byzantine Generals Problem, it is often impossible for an honest network participant to decipher if another participant is a malicious adversary or is malfunctioning (e.g. dealing with large network latency or hardware issues). In this setting, we generalize the single adversary, single challenge setting to an environment that manages the state of all users, adversarial or not. The environment precisely encodes how we can run a simulation of our distributed system and is a random variable that represents all possible instantiations of a distributed system. Let’s be a bit more concrete — an environment corresponds to an system that has access to:

  • The state — local copy of the blockchain, network delay — of all participants
  • Access to a probabilistic oracle that represents the universal hash function used in constructing a blockchain
  • Holds all of the randomness needed for the system (e.g. all of the coins that determine the stake/PoW for each participant, any randomness needed for the network). This also includes any static and shared randomness that is common to all participants, such as the choice of Genesis block.
  • Holds the state required by an adversary. In this context, the adversary corresponds to a function that can take control of any participant and change their state — modify their blockchain, send a double-spend transaction, etc. — but can only affect some percentage, 𝛾, of the network’s participants. The environment allows the adversary ‘take control’ of some portion of the network before it evolves the system.

Given all of this state, the environment executes a finite loop that roughly does the following:

  • Allows the adversary to mutate the state and/or actions of up to 𝛾n participants. Participants whose state and/or actions are affected by the adversary are known as dishonest participants
  • Chooses some subset of participants (honest or not) in a potentially non-deterministic manner and have them perform an admissible action — mine a block, make a transaction, execute some number of queries to the universal hash function, or publish/forward a block
  • Updates the state of all parties that are affected by this action. For instance, if the previous step led to a new block mined, then the environment would send the block to the rest of the network, so that they can add this block to their local blockchains [8]

It is natural to be skeptical that such a loop can represent the complicated dependencies that are hidden with a blockchain protocol. One might have questions like:

  1. What happens if two blocks are produced simultaneously and cause a fork?
  2. What if our protocol constructs a directed acyclic graph (like Avalanche) and the relative ordering of two blocks is incorrect relative to a certain topological sort?

However, recall that this loop represents a discrete-event simulation that models the evolution of the environment as new blocks are published or new participants are added. This loop is more or less the same as a loop that samples from a directed graphical model via Markov Chain Monte Carlo [9]. In the graphical model case, we do the following:

  1. Start at an initial state
  2. Sample from the distribution at our current state, decide whether to move to another state, and repeat this step.
  3. Traverse through the graphical model until hit a halting condition and/or we believe that we are close enough to the stationary distribution of the model.

The graphical model can encode very complicated dependencies between seemingly unrelated states, but we can still sample from it via local algorithm that runs in a single loop (MCMC). The dependencies are implicitly encoded in our mechanism for allowing the adversary to mutate state, our methodology for sampling participants, and our block propagation mechanism. It can take many loop iterations for a change to occur, which corresponds to waiting for a dependency to be satisfied. The only difference between graphical models and blockchain environments and simulations is that the dependencies are implicitly encoded in distributions that we sample within a given protocol. This makes proving results similar to those of the previous section similar to proving properties about directed graphical models, albeit with the complication about having to learn implicit dependencies. Once we write out the loop for a given protocol, it becomes easy to analyze the algorithm as a randomized algorithm [10]. Instead of going into the details of how one analyzes a loops and the mechanics of a particular protocol’s loop, let’s instead look at some examples of these loops. Our future posts in this series will analyze the loops for different protocols using environments and simulations.

As a first example, let’s define the environment of the Byzantine Generals Problem:

  • Initial Conditions: The initial YES/NO state for all of the generals, the choice of the first elected leader
  • State: The network graph of the generals, all sent messages
  • Oracle: None
  • Adversary: Deterministic list of generals who are dishonest
  • Loop: Send messages with vote to all neighbors in the network graph until consensus is reached

The Byzantine Generals case is a relatively simple environment, since it deals with a purely deterministic consensus mechanism. Environments increase in complexity as we add in randomness, permissionless behavior, and synchronization primitives. We will first look at Avalanche, as it is a lot simpler than Algorand or the Bitcoin Backbone protocol and has relatively simple environment. The base algorithm that Avalanche is built off, Snowflake, has the following environment loop:

Snowflake is a very simple mechanism for sampling from a biased Pólya urn whose red and blue balls (e.g. the R and B in the above loop) are distributed across a number of participants. The loop does the following:

  1. Samples a single honest user whose color (e.g. a vote YES/NO) is going to be updated.
  2. Passes the chosen user and the set of honest users to the adversary, which can now update the colors of Byzantine users
  3. Has the honest user sample ‘votes’ from k users (who can be either honest or dishonest) and polls their colors
  4. Adjusts its color/vote based on parameters set in the protocol

The analysis of this environment, which Sirer calls a ‘global scheduler’, is unsurprisingly quite simple, as it ends up yielding a standard result from the theory of birth-death queueing processes. We will discuss Snowflake and Avalanche more closely in our next post. On the other hand, the Bitcoin Backbone protocol uses the following more complex environment loop:

This is a much more complicated environment, as there is a large amount of state to deal with and because the environment is performing a proof of work computation at each iteration. At a high level, this loops does the following:

  1. For each user, we choose the longest chain that we have received
  2. We determine what block to use for PoW
  3. We perform the PoW
  4. If we successfully mine the block, we diffuse it to the rest of the network

In this loop, the adversary is hidden in the “pow” function, as the goal of the protocol is to show that an adversary that has t% of the hashing power (for t < 0.5) isn’t able to forge a transaction. In this protocol, the functions R, I, and V more or less control how we sample users who are allowed to mine blocks and implicitly setup the adversary. We will dive into this in the fourth post of this series. Since Algorand is a bit more complicated to convert into this form, we will leave the mapping of Algorand into a single loop for the fifth post of the series. [11]

As you can see, formally setting up the probabilistic simulation of adversarial distributed systems can be complex, nuanced, and have many discordant moving pieces. The choices of axioms that one wants to satisfy affect the environment that one constructs and can dramatically simplify the analysis, if chosen correctly. Researchers are often left with a chicken-and-the-egg problem of figuring out whether to choose axioms first and then construct an environment or vice-versa. As the above environments and simulations show, there are a variety of opinions on what constitutes a “good enough” simulation. Perhaps we will one day be able to combine all of these opinions into a protocol that provides security, flexibility, and performance by taking advantage of all of the benefits of Algorand, Bitcoin Backbone, DFINITY, and Avalanche.

Thanks Abe Stanway and Uthsav Chitra for reviewing drafts of this post.

[0] The title of the Avalanche paper, Snowflake to Avalanche: A Novel Metastable Consensus Protocol Family for Cryptocurrencies, gives a hint as to why there are different qualia between this protocol and Bitcoin. In this protocol, a state is the bit-vector of votes (e.g. yes or no) on a certain block and a metastable state is a state such that a perturbation of even a single vote will cause the algorithm to go in the direction of the changed vote. In Avalache, the metastable states serve as the boundary between where the network uses Byzantine consensus versus where it uses Nakamoto consensus. In statistical physics, these points of equiprobability are called committer surfaces and are very important for finding efficient paths to local minima; see this article of E. Vanden-Eijnden for more information about this.

[1] The simplest example of this is the Vickrey Auction, where all users have a quasilinear/softmax utility function, u(x) = max(v-x, 0), where v is the private valuation that a user has for an auctioned good. In this case, betting for your private valuation is DSIC, since the second place auction price wins.

[2] Nash Equilibria have their own complexity class called PPAD; see the famous Papadimitriou paper for more details.

[3] While it might seem like non-unique Nash Equilibria are a negative for a protocol to have, this situation actually resembles neural networks or non-negative matrix factorization, where the existence of many minima makes it easier for a simpler algorithm to find a ‘good enough’ solution. See this paper from the authors of Ouroboros for more details about non-unique Nash Equilibria in Bitcoin.

[4] In many ways, this is similar in spirit to the choice of set theory axioms — do we use ZF or ZFC? — and one often is left without a concrete resolution.

[5] Distributed ledgers that use Byzantine Agreement, such as Algorand, extend standard Byzantine Agreement to the permissionless setting by randomly drawing a fixed-size set of voters and have this consortium vote on transaction validity. In these cases, one has to prove that both the Byzantine Agreement methodology and the consortium selection mechanism satisfy desired security properties.

[6] Technically, this is the CCA1 attack, in which an adversary has to query for all of the pairs before they attempt to make an attack that takes statistical advantage of the pairs. In the CCA2 world, the adversary is adaptive and can attempt to improve their advantage based on how well the set of pairs that they queried worked. See Chapter 2 of Craig Gentry’s thesis on Homomorphic Encryption for a nice explanation of this.

[7] The notion of ‘significantly greater’ probability corresponds to an increase that is polynomial in the key size — that is, the advantage must be Ω(1/Poly(λ)).

[8] Do note that in the synchronous world — no network latency — the environment communicates the chosen participants actions to the rest of the network immediately (e.g. all users end up performing some action). In the asynchronous world, each iteration of this loops corresponds to a ‘round’ in which no participants and/or actions can be chosen. This represents the idea that ‘nothing happens’ at some time points, which packets are traversing the miner network.

[9] One slight complication: We are usually attempting to perform a discrete-time simulation, but we do not have a bound on the frequency of events. In the synchronous setting, multiple blocks can theoretically be produced at the same round and immediately reach the rest of the network. If we assume that there is minimum time scale (e.g. fastest frequency of the system) that we care to simulate, then we can make the time step related to this size. This is exactly analogous to choosing the time step in molecular dynamics simulations, where one uses the fastest vibrational mode of the system (e.g. hydrogen oscillation time) to determine a time step (usually in femtoseconds). However, this can still be viewed as sampling a directed graphical model, albeit with some continuous time states; for instance, Hamiltonian Monte Carlo on a hierarchical potential resembles this situation.

[10] If you are worried about the correctness of the environment as a simulation at the level of non-deterministic Turing machines with access to random oracles, don’t worry! This paper gives the basic theoretical framework for justifying this setup; see the introduction of the Bitcoin backbone paper for more information.

[11] The Algorand paper doesn’t present their algorithm in a single loop; instead, they give a sequence of coroutines that they prove properties about. In the fifth post, I am going to convert their coroutines into a single loop and discuss how it is similar to the other algorithms that we have discussed in this series.

--

--

Tarun Chitra
Gauntlet

"[Tarun] begrudgingly believes in Occam's Razor" // I write about: Probability, Physics, Hardware, Trading, Crypto, Minimal Techno.