SDBS #16 | Power: A Metric for Fork Participation

Stealth
stealthsend
Published in
8 min readMar 29, 2019

The Power metric is used by Stakers to decide whether they have joined a minority chain in a network fork

I spent this week debugging a small network of three nodes and three stakers with the challenge of keeping all nodes synchronized to the same chain. One improvement I developed in this process is a metric I call “power”. A chain must represent a majority of total network power in order to grow. Power prevents runaway chains that can lead to network fragmentation.

— — — — — — —

A Tough Network: Three Nodes and Three Stakers

My testnet currently consists of three nodes and three stakers. Although this may seem like a vastly simple network, it’s small size lends itself to frequent fragmentation such that a single node can easily be isolated from the other two. This fragmentation occurs as a result of a random churning of connections that is designed to improve connectivity of large networks of dozens, hundreds, or thousands of nodes.

To understand why, imagine that a network of nine nodes is “seeded” by a single node. Nodes connect the seed node and each new node creates one connection in the network. If connections are static, the topology of this network resembles the left panel of Figure 1.

Figure 1: The starting topology (left) and a possible ending topology (right) of a completely centralized network that undergoes reorganization by randomly reconnecting nodes. The single seed node is yellow, and other nodes are light blue. Connections are represented as lines.

Notice that the starting topology has a single point of complete failure. Take out the yellow, central node, and no node will be left with a connection.

Now, imagine that connections are periodically and randomly dropped and created between existing nodes, keeping the total number of connections (lines in the above figure) constant. After this reorganization, removing the yellow node can at worst split the network into two parts, making the reorganized network topology on the right much less fragile than the perfectly centralized topology on the left.

Of course, creating more total connections between nodes will result in a much more robust network, such that removing one or many nodes will not fragment the network. But the example in Figure 1 is kept simple, meaning the example network is simple, and consequently more fragile.

Unsurprisingly, the testnet of three nodes I currently run is highly fragile. Figure 2 shows how a node can easily be isolated, even if the network starts fully connected.

Figure 2: A simple network of 3 nodes (A, B, and C) easily fragments as a result of random reorganization. Network fragments are designated by their components. This example shows a network that splits into fragments “AC” and “B”.

In Figure 2, three nodes (A, B and C) begin fully connected, where “fully” means that every node is connected to every other node. In this example, B randomly drops its connection to A, with the possibility of randomly adding A back in the future. Before the A-B connection has a chance to be made again, C drops B, completely isolating B.

In this resulting network, where C and A are connected but B is isolated, the network fragment AC will produce blocks independently of B, creating a blockchain fork. Such forks are common with cryptocurrency networks, and practically all cryptocurrency consensus protocols have methods for dealing with these forks. In fact, the development of reliable protocols for dealing with forks is the hallmark of cryptocurrency technology.

Bitcoin, the first cryptocurrency, addressed such forks in two ways: (1) by having a true peer-to-peer network that features real-time communication between nodes, and (2) by the development of proof-of-work, which is one way to resolve conflicts in consensus. My goal is to develop a protocol for dealing with forks within the scheduled block production system used by qPoS.

The random drops represented in Figure 2 are typical of the network I have been running, creating difficult synchronization problems on the testnet. One way to address this fragility would be to force all nodes of the testnet to stay connected, simply by removing random drops from the peer-to-peer protocol.

Although it is tempting to force full network connectivity, I decided that if I could find some way to keep all three nodes synchronized, even as single nodes are frequently completely isolated from the rest of the network, I will have developed a very robust protocol for dealing with forks.

Last week, I discussed how I moved to an asynchronous network clock that relies on network events (namely block production). Briefly, in this system when a block producing node gets isolated from one or more other stakers, it would produce provisional blocks, even adding these provisional blocks to its own chain. The isolated node would continue to produce provisional blocks until another chain with higher trust is relayed by the network. If a chain with higher trust does come along, the isolated node would stop producing provisional blocks on its current chain, roll back to a block common to the chain with higher trust, then begin to produce blocks on this best chain.

One problem with this design is that, if left unchecked, the production of provisional blocks would quickly result in a chain that creates true consensus issues between nodes. The most severe problem is that the queues would become completely unsynchronized, and the two network fragments would ban each other, leading to a “permanent fork”. Figure 3 shows an example of a fork in this system.

Figure 3: A network of three nodes splits into Fork AC and Fork B. Queue slots that have blocks produced within them are represented as squares with solid borders, and skipped slots in the queue are represented by squares with dashed borders. Block numbers are indicated in the lower right of each square. The staker (node) assigned to the slot in the queue is indicated in the upper left of each square as a letter. Leftward arrows point to a block’s preceding block. Blocks are added to these chains from left to right, meaning newer blocks have higher numbers and are further right on the diagram. One fork (Fork AC) is on the top of the diagram, branched from block 6, and the other fork (Fork B) is on the bottom of the diagram, also branched from block 6. Rounds of the queue are inscribed by blue rectangles. Two rounds are represented, Round 2 and Round 3. Notice that the queue for Round 3 is different between Fork AC and Fork B.

Imagine what would happen if the two network fragments (AC and B) described in Figure 2 each continued to produce blocks, starting after block 6. As represented in Figure 3, Stakers A and C on Fork AC (top) add blocks in Round 2 of the queue. First, Staker C adds block 7, then Staker A adds block 8, then B is randomly assigned two consecutive slots, which it misses (grey squares). Then Staker C adds block 9, then Staker A adds block 10.

On Fork B (bottom), Staker B is the only block producer. From Staker B’s perspective (because it is isolated), Stakers A and C miss their blocks. So Staker B adds blocks 7 and 8, skipping slots belonging to Stakers A and C in the queue.

Figure 3 shows the consensus issue if provisional block production is unchecked on minority chains, where “minority” means only a minority of the total network stake is on the chain. The problem is that the ordering of Round 3 is different between both chains, meaning that when one network fragment discovers the other fragment’s fork, it will think that the stakers in the other fragment’s fork are producing blocks outside their assigned slots. Such “unscheduled” production is highly disruptive to the network and will lead to nodes’ banning each other, thinking that the unscheduled production represents the purposeful creation of forks.

With the example in Figure 3 in mind, it is clear that nodes that produce provisional blocks should stop before they create a new round of the queue, which would potentially lead to unresolvable consensus issues. But how will a node know that it should not make a new round?

What is needed for a node to answer this question is a metric that quantifies whether the node is producing on a chain that will never have a chance of being accepted by the full network. For qPoS, I call this metric “power”. Power is calculated by adding all the staker weight that has produced blocks in the previous and current rounds of the queue, up to and including the present block, then dividing that number by total staker weight possible for the slots spanned by these blocks. Figure 3 shows the results of this calculation for the two branches of a fork, assuming all stakers have the same weight of 1.

Upon the production of block 9 (which we can call “Block AC9”) in the top Fork AC, the power is 0.6. The current round is Round 3, which has had two slots, only the second of which resulted in a block (AC9), and the previous round is Round 2, where two of three slots resulted in blocks. Since there were three (3) blocks and five (5) slots, the power at Block AC9 is ⅗, or 0.6. Clearly this is a majority chain.

In the bottom fork, Fork B, the power at block 8, which we can call “Block B8” is 0.4. Here, the first two slots of Round 2, belonging to Stakers A and C, were missed and only the third slot produced a block, which we can “Block B7”. In Round 3, up to Block B8, Staker C missed its block and Staker B produced its block. For these two rounds we have five (5) total slots, with only two (2) blocks being produced, so the weight upon the production of Block B8 is ⅖, or 0.4.

With a weight of 0.4, which is less than 51%, it is clear that Staker B is producing on a minority chain and should not make any new rounds of the queue. In fact it should roll back to the end of Round 2, stop producing any blocks, and wait until it connects to the larger network, then add blocks onto the majority chain once this chain is discovered.

— — — — — — —

Hondo

— — — — — — —

Website / Telegram / Slack / Medium / Twitter / Reddit

--

--

Stealth
stealthsend

World’s first private high performance blockchain protocol