Sybil Attack and Byzantine Generals Problem
As I investigated bitcoin more, two different types of attack kept coming up in the related architecture. So, i thought i would go into a little bit more detail on these two. So i will cover two different papers, but i will cover them at a high level and without going into solutions. In the end of each paper, i have tried to highlight how they relate to the Bitcoin network. First one is the Sybil attack paper and the second one is by Lamport et al. on the Byzantine generals problem.
The Sybil Attack
In peer-to-peer system, nodes often replicate data for better security, availability etc. In order to replicate this data, it is important to place this data on unique/distinct nodes, with a majority of them being correct/honest/well-behaving nodes. But a local node cannot know if the remote node is honest or not. In addition, how would a local node know that the same remote node is not presenting multiple identities — to make the local node think that it is actually communicating with multiple remote nodes — hence the name, perhaps coming from the book Sybil.
The central question that this paper tries to ask is: In the absence of a central authority, can a correct node(c) establish uniqueness of identities presented by another remote node. This remote node can be an correct/honest node(c) or a faulty node(f) i.e. For the correct functioning of the system, it is not desirable for a remote node to be able to present multiple identities.
Traditionally, systems have relied on central authorities such as ICANN, IANA for IP address or Verisign for certificates or Keys embedded in the device hardware. These central authorities can ensure and vouch for uniqueness of identities. But how can this be established in a distributed system or a trustless environment?
This paper uses a fairly high level system model for setting up the environment of attack. Basically there are a bunch of nodes/entities, pipes to send messages to the cloud and cloud network that can carry messages to all the machines reliably. Here is the diagram from the paper:
Some nodes can be faulty(f), some will be correct/honest(c). Restating the problem: Can the local entity reliably verify the uniqueness of identities provided by remote entities? How can local entity(l) establish that identity i1, i2, i3 was not presented by a faulty node?
Key idea of using resource usage for Direct Validation
In an absence of a central authority, it seems like there are two ways to verify identities. One is by direct validation i.e. l has verified some remote entity before and know about it. Second is indirect validation: l verified some entities and they have verified some other remote entities. So those remote entities can be trusted.
So lets start with direct validation. How can l verify uniqueness of remote entities? The neat idea that this paper presents is: Local entity can give out a large enough work to all entities simultaneously and make sure that answers come back within a given time. Given that work consumes enough resources of each entity, a single entity cannot complete more than one piece of work in the given time. This can be done any of resource such as CPU, network or disk.
- So if CPU is a constrained resource, then send a puzzle — similar to proof of work, to all remote entities e.g. Given y, find x and z such that n least significant bits of ( hash(x || y || z) ) are all zero.
- For network, local entity can brodcast a message and expect a response in a certain time.
- For disk, l can send some incompressible and large message for storage and check for excerpts sometime later.
Since the solution relies on resource usage validation, there can be f nodes with enough resources that can present more than one identity successfully to l. This really boils down to the question of resource parity. In heterogeneous systems, this is a likely scenario. If f has G times more resources than l, it is trivial to assume at least presentation of G identities by f.
Another key item that was highlighted in the last section was “simultaneous verification”. If the local entity cannot verify concurrently answers of all the puzzles that were sent out, then the a single remote entity can just solve the same puzzle one after another and send out the responses. This again seems like something that is hard to achieve in practical distributed systems.
One of the services that a local node can delegate to remote nodes is validation of identities itself. So once a local node has established that the identity of remote node i is valid, it can then accept identity i1, i2 etc. vouched for by i. Obvious issue is that a bunch of faulty entities can collude and vouch for faulty identities. So let’s say a local entity l accepts identities vouched for by q other accepted entities, then faulty identities with more resources than q can collude and get more identities accepted. Also another impracticality is that “simultaneous verification” is needed — so all the a entities need to present verify challenges concurrently.
Bitcoin and Sybil Attack
I had read a lot of literature on authenticity, integrity of identity, i had rarely thought in deep detail about uniqueness of identities and their implications in a trustless distributed environment. This paper was an eye opener in that regard. It also clearly illustrates impracticality of establishing unique identities in the absence of a central authority. We rely on central authorities for IP address like unique identities because that seems like the most practical way to achieve unique identity. A brief explainer of bitcoin and sybil attack is here. Looks like bitcoin network relies on IP address based identity and then allows only one outbound connection to /16 subnet for outbound connections and thus reducing the probability of an honest node from connecting to peers that are completely dishonest all the time.
Byzantine Generals — Consensus in Trustless environments
Lamport’s fascination with story telling continues and i like his approach a lot. This makes for a very interesting reading indeed. Here is Lamport’s explanation on storytelling based approach and assessment of Dining-Philosophers vs Reader-Writer.
In the previous paper, i discussed the need to establish trust in a distributed environment. The example environment in that case was a set of servers controlled by a centralized bank — traditionally that is considered as an environment that is NOT hostile. But what if there was a need to establish consensus among a set of nodes that cannot be trusted? This is very relevant in the world of bitcoin, where a lots of nodes may not always cooperate and a lot of times may collude together to alter the behavior of the network for the benefit of the few.
Lamport sets this up using a war situation, where you have m generals of an army, each general controlling its own force. Generals are camped outside and need to reach consensus on the decision of attack/retreat. Since it is wartime, there might be spies around and some generals themselves may be traitors. So then the problem changes to establish consensus among loyal generals — as long as there are enough loyal generals. Since the environment is hostile, traitor generals will do what they want anyway and try to thwart the loyal general from reaching consensus. So two important constraints need to be followed by the algorithm:
- IC1: All loyal generals must agree upon a common decision i.e. Attack/Retreat
- IC2: If a loyal general asks another general to “attack”, then that loyal general obeys that order. Same for “retreat” command from a loyal general to another loyal general.
To solve this problem lets assume some terminology. Some of the terms will only be used later, but just go over it for now:
- Lets call the general sending the message a “Byzantine Commander” (BC)
- Lets call the receiving generals as “Byzantine Lieutenants” (BL)
- There is a proof by contradiction coming later — So lets use Albanian Commanders(AC) and Albanian Lieutenants(AL)
- Just to emphasize: these are different types of generals that need to reach consensus
The curious case of 3 generals
I will let the diagrams do the talking here. In case of three generals, it looks like there is no way to reach the consensus and satisfy IC1 and IC2 mentioned above if there is 1 traitor. The traitor can be the commander(BC) or the lieutenant(BL1/BL2)
In figure 1, BL1 must obey the “attack” command, even though BL2 is sending a confusing message of “retreat”. In figure 2, BL1 will obey the command of “attack” and BL2 will “retreat” — thus violating IC1. Because of this, BL1 doesn’t really know who to trust i.e. BC or BL2. That’s why in the Figure 1, it wouldn’t know what is the right order and can’t just blindly trust BC. Thus in both cases, BL1 and BL2 are a bit in the dark about who to trust. Hence it is NOT possible to reach consensus in case of 3 generals if there is at least 1 traitor. Apparently this generalizes well to: If there are m + 1 generals then at most there can be m/3 traitors in the group. Otherwise the consensus cannot be trusted.
The proof can be reached by using contradiction. The approach is to assume that solution exists for 3m Albanian Generals where there are at least m traitors. Use of Albanian generals is just for easier terminology. Then we can reduce that assumed solution to a case of 3 Byzantine Generals and reaching consensus in the presence of 1 traitor. But we just saw in the last section that it is not possible. So we have reached a contradiction.
Looking at the figure, what we are saying is that we have 3m Albanian generals. There are divided into three groups of approximately m generals each. We can assume the top group to have one general assume the role of both AC and BC. Other m-1 Albanian generals perform the role of AL. Two other groups consisting of m Albanian generals, each general performs the role of AL and we can say that there is a BL that simulates the roles of all these m ALs. Same message is sent to all the entities within the group as illustrated by arrows and the lines.
With our assumption of an existing solution that works in this scenario in the presence of ‘m’ traitors, we can assume the BC/AC group is the traitor group. But IC1 and IC2 are still obeyed because of assumed solution. Effectively this means a message that is being sent by BC to the two groups of BL, is obeying both IC1 and IC2 constraints. This we know not to be true and hence the assumed solution of Albanian generals doesn’t exist.
Bitcoin and Byzantine generals
This paper lands on an important result that shows that in case of misbehaving m nodes, at least 3m + 1 level of good nodes are needed to conclusively ensure the correct functioning of the system or to land a conclusive consensus. How does this affect the bitcoin network?
As we have seen in the past paper, there can be a number of nodes that can collude to double spend and form their own version of blockchain that becomes the final version used by the network. The best explanation i have read on this is from Satoshi and is here. The high level idea is that rely on the essence of blockchain: the hard proof-of-work and each new PoW extending the previous one. Since a block gets generated every 10 mins, after 2 hours, 12 blocks will be generated and the longest block will ensure that most honest nodes have expended the most effort and landed on this common linked chain i.e. consensus. Even after reading this explanation, it wasn’t too clear to me if majority was sufficient or if 2/3rd of the network needed to be honest, to maintain a consistent ledger. I have seen some other papers refer to these scenarios which i will dig into in some later posts.