SDBS #17 | The Stealth qPoS Distributed Clock

Stealth
stealthsend
Published in
8 min readApr 13, 2019

The distributed clock used by qPoS is a type of Lamport clock that uses block timestamps to fully order network events

This week, I will explain the distributed clock used by qPoS. I have developed this clock over the last few weeks and have nearly fully integrated it within the qPoS codebase. The distributed clock uses asynchronous timekeeping similar to Lamport clocks, but has some differences. Notably, Lamport clocks do not fully order all events in a distributed system, while the type of blockchain used by qPoS requires full ordering.

To put it more plainly, let’s say that transaction A comes before transaction B, meaning A must be processed before B. Full ordering in qPoS means that the order of processing is known for any two transactions in the entire blockchain.

I developed the qPoS system of timekeeping to address an interesting challenge that I discovered in the private testnet. This network of three nodes has a propensity to fracture, completely isolating single block producing nodes. I discussed this problem in SDBS #16. With this very unstable network, I found it was impossible to maintain network consensus using synchronous timekeeping and began exploring asynchronous approaches.

Stealth qPoS has a unique combination of requirements where timekeeping must be asynchronous but blocks must be regularly spaced relative to “official time”. This presents a sort of paradox that the qPoS distributed clock resolves. From what I can tell, the qPoS solution I am developing uses a distinctive approach for distributed timekeeping, so today’s post will explain this timekeeping in detail. I will start with fundamental concepts and work up to the implementation.

— — — — — — —

Stealth qPoS Ordering and Timekeeping Requirements

Blockchain Ordering

All blockchains require transaction ordering. Simple ledgers like Stealth, Bitcoin, and others require ordering to prevent negative balances. For example, imagine an account called “Alice” that starts with a $0 balance, then executes only the following two actions:

  • Action A: Alice sends $5
  • Action B: Alice receives $10

Clearly, action B needs to come before action A to prevent Alice’s account from becoming negative.

In addition to transaction ordering, blockchains also have block ordering, where the order of processing for any two blocks is known. For example, Block 1 will be processed before Block 2, and Block 2 will be processed before Block 1000, etc.

Figure 1: Blockchain Ordering

Figure 1 shows how blockchains are physically ordered as data structures, and how the order of processing transactions can be determined by the structure. Here blocks are numbered sequentially with arrows pointing back to each block’s immediate predecessor. In this example, Block 1 has three transactions, Block 2 has two transactions, etc. The part of each number after the decimal specifies the ordering of each transaction, both physically and in terms of processing. For example, Transaction 1.2 is the second transaction of Block 1. In a blockchain, Transaction 1.1 is processed before Transaction 1.2, which is processed before Transaction 1.3, and so on.

Block Timing

Most blockchains circumvent the challenges of network time synchronization by allowing significant flexibility in the timing of blocks. This flexibility is generally coupled with (and necessitated by) competitive consensus protocols like PoW or PoS, meaning these protocols have a competition with no set ending time. Candidate block producers work or stake coins until they win the competition, whenever that may be. The result is unevenly spaced blocks with no real time dependability.

Stealth qPoS block production is scheduled rather than competitive. Scheduling allows qPoS to have five second blocks with highly regular spacings, with any deviation typically in the milliseconds. This combination of fast, regular blocks gives a much better user experience. Users can send coins and expect them to be confirmed and spendable within a few seconds, quicker than most credit card transactions. Compare this to Bitcoin where users may wait hours for a transaction to confirm. Reliable intervals means that blocks can be frequent, allowing many more transactions per second, which is key to scalability.

Figure 2: Competitive versus Scheduled Block Production

Figure 2 schematically illustrates the variance of block spacings in competitive and scheduled block production. In competitive block production (Bitcoin) blocks can be fast or slow, with no dependable interval between blocks. Scheduled block production (Stealth qPoS) gives dependable intervals that allows block production at high frequency, a key to scalability.

— — — — — — —

Implementation of the qPoS Distributed Clock

The qPoS Timekeeping Paradox

From the above discussion, it should be evident that Stealth must have a robust and accurate network clock that resolves what appears to be a paradox:

  1. The network clock should be asynchronous, allowing for clock drift between nodes, and allowing them to completely disconnect from the network.
  2. Blocks should be frequent and highly regular.

The Simplest Lamport Distributed Clock

To resolve this paradox, Stealth uses a strategy similar to Lamport clocks. A Lamport clock updates the time in response to network events using a counter to advance the clock. This means that each network event will carry a unique sequential number. For example, suppose a cryptocurrency is using a Lamport clock, and a node gets a block with counter 100. The node immediately wants to broadcast a transaction, creating a transaction with counter 101. Assuming the network is in good communication and latencies are very small, each event (blocks and transactions) will have a unique counter, ordering every event relative to the next.

However, this simple Lamport clock can not work for Stealth qPoS because the Stealth qPoS clock must allow for nodes to be completely disconnected from the network with the potential to reconnect to the network and synchronize with it. How can this be achieved?

Stealth qPoS Utilizes Synchronized Internal Clocks

First, all block producers carry an internal clock with reliable synchronization to the official time (i.e. time provided by dedicated government servers). This synchronization helps with the efficiency of the system, but is not necessarily perfect. Clock drifts of up to one second are tolerable, though they are expected to be much smaller than one second in practice.

Replay Mode Controls Block Production

Each block producer also has two modes, called “in replay” and “not in replay”. Nodes that are in replay do not produce blocks until they exit replay. The condition to exit replay occurs when a node’s staker queue is up to date with its internal clock, which is as close as possible to the official time. Nodes enter replay when the power of the chain on which they produce blocks falls below 51%. Power is a metric for stake participation, which I discussed in SDBS #16. If a network fractures, then some fragments of that network will have less than 51% of the staker power. These are termed “minority fragments” producing blocks on a “minority chain”. Minority fragments will stop advancing their chains. All stakers in that fragment will transition into replay mode waiting for new blocks from the network. This behavior prevents forks. I should mention that in cases of extreme fragmentation, a network could completely stall because no fragment would be a majority fragment. I am still working on such boundary cases.

Rollbacks Allow Error Correction

Nodes that enter replay because of a lack of chain power also rollback to the last block of the previous round of the queue. The reason for the rollback is that the ordering of each round of the queue depends on the content of the last block of the previous round. If a chain is not valid because of a fork, then the blocks will not be valid, and the ordering of the next round will not be valid. By rolling back, the nodes in a minority fragment return to a valid queue, removing any irresolvable consensus issues that arise from the presence of conflicting queues. Rollback is a type of local blockchain reorganization, and one of my main accomplishments this week was to author the rollback code and couple it to the distributed clock.

In very general terms, rollback is a type of error correction that is essential for a fully robust system.

Provisional Blocks Allow Constant Production

Underpinning this whole system is the fact that block producers not in replay mode maintain a synchronized queue that depends only on valid blocks. These block producers have reliable internal clocks and produce blocks according to the queue. Key to this scheduled block production is that nodes will continue to produce provisional blocks even if other block producers are missing their slots in the queue. Block producers can be thought of as machines that just keep going no matter what happens, as long as they are on a majority chain. The provisional nature of the blocks means that any producer will rollback if it finds itself on a minority chain.

Synchronization Is an Emergent Property of the qPoS Distributed Clock

A critical aspect of this protocol is the separation of the two timekeeping apparati, which are (1) the internal clocks of the block production nodes, and (2) the staker queue. Each block, produced asynchronously and provisionally, carries a timestamp from the producer’s internal clock. These blocks are broadcast to the network. As blocks are accepted, the staker queue is advanced according to the block timestamps.

An interesting and desirable consequence of this system is that the staker queue, which relies on asynchronous network events, should have all the properties of a globally synchronous timekeeping device. In other words, the staker queue is a global network clock that keeps reliable time. This property of the staker queue emerges from (1) the synchronous nature of network internal clocks, and (2) error correction that takes the form of rollbacks.

Because the staker queue is a reliable clock, any node that gets disconnected from the majority fragment of the global network can, upon reconnecting, simply utilize the staker queue as a network timekeeping device, using it to reliably synchronize to the majority chain.

— — — — — — —

Hondo

— — — — — — —

Website / Telegram / Slack / Medium / Twitter / Reddit

--

--

Stealth
stealthsend

World’s first private high performance blockchain protocol