“Objective and unambiguous 100% finality is a critical property for all blockchains that wish to support inter-blockchain communication. Absent 100% finality, a reversion on one chain could have irreconcilable ripple effects across all interconnected chains.” — Daniel Larimer
This will be a two-part series on finality and scalability in interconnected networks.
Finality, the irreversibility of a transfer of ownership, is something most of us don’t think about in our day-to-day dealings. We trust that when our paycheck arrives in our bank account, the money is ours to spend; that when we swipe our card for groceries and it says “Approved”, we can walk out the door with what we bought; and that when bitcoin shows up in our wallet, it is officially ours. Behind all of this, however, is a carefully considered judgment about the probability that the transaction that just occurred will be reverted. Like most imperceptible things woven into the fabric of daily life, finality is essential to the normal operation of every transaction-based system.
Finality is never absolute. Mistakes can be made, banks can burn down, glitches happen and code has bugs. Something can always happen, with some degree of probability, that takes away what was previously in your possession. In centralized systems of trust, like banks, it is up to the bank to make sure you have the funds and permissions to transact, and no one but the bank can decide on a valid history. In blockchains, however, the fact that a transfer was made doesn’t mean the network won’t switch to a different valid chain a moment later, wiping that transaction from history.
Why is finality important in interoperable networks?
If a network aims to be a network of interoperating chains, where contracts and actions taken inside it depend on data and transactions in multiple chains, finality becomes a crucial variable. Transferring anything of value based on information from a secondary chain, without the ability to guarantee that the information on that chain will not be reverted, is a recipe for an abandoned network.*
As laid out in this post, INT is looking to find the holy grail of interoperability by creating an IoT-focused blockchain network built on easy scaling and seamless interoperability.
As stated in INT’s white paper, and in greater detail here, INT has built a heterogeneous multi-chain framework to achieve this, in which subchains run in parallel to a main “Thearchy” relay chain. The Thearchy chain will “oversee” the many parallel subchains by acting as the network’s state transition machine, driving consensus and generating blocks for each subchain while leaving the subchains only to validate transactions. This allows higher transaction throughput in the subchains and enables complete interoperability between them, since the information from all subchains is centrally located. Supernodes within the Thearchy will serve as the masternodes of the subchains and act as the intermediaries for cross-chain communication between subchains.
Separating the state transition machine from transaction validation in this way requires two levels of consensus working in concert to reach network-wide agreement on state. With the Thearchy chain acting as the relay that passes information from subchain to subchain, finality of transactions must be assured. This drove the decision to use DBFT for the Thearchy chain’s main consensus and DPoS for transaction validation on the subchains.
As we will see, DBFT offers fast consensus with complete finality, along with the scalability and interoperability needed to support INT’s chosen application.
Finality in blockchains
Without a centralized authority, blockchains rely on distributed agreement about what constitutes a valid history. Because this agreement takes a finite amount of time and is not guaranteed, a transaction can end up in a block or chain that does not become the accepted chain. This is why transactions normally require a number of “confirmations” from the network before you can use what you received. With each additional block, and therefore each additional confirmation, the transaction is buried deeper and the likelihood of it being reversed shrinks.
But why does it matter? Why not just work out the number of blocks after which the chance of reversal is acceptably small, then set it and forget it? This certainly works for asset transfers in Bitcoin, Ethereum and other cryptocurrencies, and it is what exchanges do when they receive your deposit: they wait until the transaction reaches a certain “depth” in the blockchain before allowing you to trade it.
But what about a system that passes data of varying worth or significance, or a network that interoperates with another network? In an IoT network, it makes no sense for your weather sensor to send a reading of 70 °F and have it unusable for 10 minutes until it is “fully” confirmed. On the other hand, a complex contract that relies on an asset transfer on a chain that gets reverted after the contract was already verified could have large-scale consequences. Depending on the situation, finality becomes much more complex, and in a network of interoperating chains, being able to fully rely on the finality of a transaction on a chain is essential to operation.
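The “set-it-and-forget-it” depth rule can be sketched in a few lines of Python. This is a minimal illustration. It assumes a node that knows the height of the block containing the deposit and the height of its current chain tip. The function names and the default of 6 confirmations are hypothetical, not any particular exchange’s code.

```python
def confirmations(tx_block_height: int, tip_height: int) -> int:
    """Number of confirmations for a transaction included at tx_block_height.

    The block containing the transaction counts as the first confirmation,
    and every block built on top of it adds one more.
    """
    if tx_block_height > tip_height:
        return 0  # not yet part of this node's view of the chain
    return tip_height - tx_block_height + 1


def is_credited(tx_block_height: int, tip_height: int, required_depth: int = 6) -> bool:
    """Hypothetical exchange policy: credit a deposit only once it is deep enough."""
    return confirmations(tx_block_height, tip_height) >= required_depth


# A deposit mined in block 800_000, with the chain tip at 800_003:
print(confirmations(800_000, 800_003))  # 4
print(is_credited(800_000, 800_003))    # False, still 2 blocks short of 6
print(is_credited(800_000, 800_005))    # True
```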
Why the choice of consensus mechanism is important
The definition of settlement finality within a blockchain network depends entirely on the consensus mechanism, block generation time and node communication. Together, these three variables determine the points of failure and the time it takes to reach a given level of confidence in finality.
In PoW networks there is never 100% finality. As a transaction gets deeper in the blockchain, the probability of it being reverted decreases but is never eliminated. There is always the risk that miners switch to a chain that branches off from a block earlier than the one containing your transaction, and that therefore doesn’t include it.
The PoW consensus mechanism accepts the longest chain as the main chain, meaning the chain with the most work done on it and, consequently, the most mining power behind it. It is therefore possible for many alternate chains to exist at a given moment, each proposing a different timeline of events. If one of these forks keeps attracting miners, whether because of a software bug or malicious intent, two equally valid histories begin to form. This happened to the Bitcoin network in March 2013, when a bug affecting one version of the software but not another split the network into two competing chains. The split lasted about 6 hours, until miners coordinated to abandon one chain and the other conclusively became longer, nullifying the alternate chain and erasing those 6 hours of its history.
So how do we decide when we can trust the finality of a transaction in a PoW network? We can frame the problem by imagining someone deliberately trying to reverse a transaction in a double-spend attack, and then work through it mathematically. Let’s assume the attacker controls no more than 25% of the network hashpower (q = 0.25, leaving p = 0.75 to honest miners) and that we want to be more than 99% sure the transaction will not be reverted. At each step there is only a 25% chance that the attacker’s alternate chain adds a block before the main chain does. There is a 75% chance that it does not, and so loses ground in the race to become the longer chain. This works out mathematically to:
$$F = \left(\frac{q}{p}\right)^{x} = \left(\frac{0.25}{0.75}\right)^{x} = \left(\frac{1}{3}\right)^{x}$$
where x is the number of “steps”, or blocks, the alternate chain has to make up in order to become longer than the main chain, and F is the probability that it succeeds. Finality, f, is therefore defined as 1 - F.
For an attacker 1 block behind, F is 33%. In other words, if we waited only one block, or confirmation, a malicious actor with 25% of the network hashpower could revert the transaction with a 33% chance of success.
With each additional block we wait, the probability of the alternate chain taking over drops exponentially.
For 2 blocks it’s 11%, for 3 blocks it’s about 4%, and so on; after 6 blocks it is about 0.14%.
This is why the generally accepted finality for a PoW network is 6 blocks, or confirmations. At that point, the probability that your transaction is not reverted by a chain branching off before its block is roughly 99.86%.
If you would like higher confidence in a transaction’s finality, waiting 10 blocks gives 99.998% and 15 blocks gives 99.999993%. This also shows that F never reaches 0, so 100% finality is impossible.
This is why centralized control of 51% of the network hashpower is a concern. Once an attacker holds more than half of the hashpower, q exceeds p and the probability that the attacker’s chain eventually overtakes the main chain becomes 1. At that point a concerted mass double-spend attack becomes a real possibility.
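The double-spend model above fits in a short Python function. It is a sketch of the simplified gambler’s-ruin calculation used in this article, in which the attacker starts x blocks behind. The Bitcoin whitepaper’s fuller analysis also uses a Poisson distribution to model how far the attacker may have progressed while the recipient was waiting. The function name is illustrative.

```python
def catch_up_probability(q: float, x: int) -> float:
    """Probability F that an attacker with hashpower share q ever overtakes
    an honest chain that is x blocks ahead (gambler's ruin, p = 1 - q)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with half or more of the hashpower catches up eventually
    return (q / p) ** x


for blocks in (1, 2, 3, 6, 10, 15):
    F = catch_up_probability(0.25, blocks)
    print(f"{blocks:>2} blocks: F = {F:.6%}, finality = {1 - F:.6%}")

print(catch_up_probability(0.51, 6))  # 1.0
```

Printing F for 1, 2, 3, 6, 10 and 15 blocks matches the figures above up to rounding. Any attacker share of 50% or more returns 1.0, which is the majority case just described.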
So how long do we have to wait for this 99% finality? This is where block generation time determines time to finality. In Bitcoin, with its 10-minute block time, 6 confirmations take about 60 minutes. In Ethereum, with a block time of roughly 12 to 15 seconds, 6 confirmations take between 1 and 1.5 minutes.
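Time to finality is estimated in two steps: choose a depth for a target confidence, then multiply that depth by the block time. This sketch reuses the 25% attacker model. It assumes blocks arrive exactly on schedule, which real PoW chains do not. The helper names are hypothetical.

```python
def confirmations_needed(q: float, target_finality: float) -> int:
    """Smallest depth x with 1 - (q/p)**x >= target_finality (requires q < 0.5)."""
    p = 1.0 - q
    if q >= p:
        raise ValueError("no depth is enough against a majority attacker")
    x = 1
    while 1.0 - (q / p) ** x < target_finality:
        x += 1
    return x


def time_to_finality(confirmations: int, block_time_s: float) -> float:
    """Wall-clock wait, assuming a block arrives exactly every block_time_s seconds."""
    return confirmations * block_time_s


x = confirmations_needed(0.25, 0.998)  # depth for ~99.8% against a 25% attacker
print(x)
print(time_to_finality(x, 600) / 60, "minutes at Bitcoin's 10-minute blocks")
print(time_to_finality(6, 10) / 60, "minutes for 6 confirmations on a chain with 10 s blocks")
```

Under this model, 5 confirmations are enough to clear 99%, and 6 are needed to reach roughly 99.8% or better.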
In PoS networks, finality is defined not by the longest chain but by the number of validators signing a block. If you are not familiar with PoS, validators play the role of miners, with their mining “power”, or influence, proportional to the amount of coins they own. If you are a validator holding 2% of the total coins in the network, you would validate roughly 2% of the blocks.
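A toy simulation shows how stake-proportional selection works. Real PoS protocols derive the producer schedule from verifiable on-chain randomness or a fixed rotation. Here Python’s `random.Random.choices` stands in for that mechanism, and the validator names and stakes are made up.

```python
import random
from collections import Counter


def pick_block_producer(stakes: dict[str, float], rng: random.Random) -> str:
    """Choose the next block producer with probability proportional to stake."""
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]


# Hypothetical stake distribution: "alice" holds 2% of all staked coins.
stakes = {"alice": 2.0, "bob": 48.0, "carol": 50.0}
rng = random.Random(42)  # fixed seed so the run is reproducible
counts = Counter(pick_block_producer(stakes, rng) for _ in range(100_000))
print(f"alice produced {counts['alice'] / 100_000:.2%} of blocks")  # close to 2%
```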
The problem with PoS validators is that, unlike PoW with its longest-chain rule, nothing inherently protects against alternate chains. Nothing stops a portion of the validators from voting for one block and then also voting for a contentious competing block, creating two chains. This is an instance of the Byzantine Generals’ Problem, which every PoS system, and even trusted distributed systems, must protect against in some capacity.
In the Byzantine Generals’ Problem, the army’s action must be a common decision reached by all of its generals (the validators). The choice is whether to attack (validate a given block) or retreat (not validate it). To ensure the generals act in the best interest of the army they serve, a system has to be in place that nullifies or punishes malicious actors, in this case dishonest validators. Solutions to the Byzantine Generals’ Problem are said to provide Byzantine Fault Tolerance (BFT).
In the simplest case, tolerance to this kind of attack is achieved if honest validators hold the majority and any dissenting vote is disregarded. While this stops the network from splitting, it does nothing to dissuade validators from being dishonest, since dishonesty costs them nothing (the Nothing-at-Stake Problem). Many PoS networks address this by punishing validators who vote against a valid block and against the majority.
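Once votes are collected, detecting the double voting described above is mechanical. The sketch below assumes each vote has already had its signature checked and been reduced to a (validator, height, block hash) tuple. The function and validator names are hypothetical, and the penalty that follows a detection is left to the specific protocol.

```python
from collections import defaultdict


def find_equivocators(votes: list[tuple[str, int, str]]) -> set[str]:
    """Return validators that signed two different blocks at the same height.

    Each vote is (validator, height, block_hash). Signing conflicting blocks at
    one height is what lets validators support two chains at once.
    """
    seen: dict[tuple[str, int], set[str]] = defaultdict(set)
    for validator, height, block_hash in votes:
        seen[(validator, height)].add(block_hash)
    return {validator for (validator, _), hashes in seen.items() if len(hashes) > 1}


votes = [
    ("v1", 100, "0xaaa"), ("v2", 100, "0xaaa"), ("v3", 100, "0xaaa"),
    ("v3", 100, "0xbbb"),  # v3 also signs a competing block at the same height
    ("v4", 100, "0xaaa"),
]
print(find_equivocators(votes))  # {'v3'}
```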
The number of validators needed for consensus and the penalties for dishonest validators depend on the specific implementation. In general, the standard for Byzantine fault tolerance is agreement by 2/3 of the network’s validators.† This maximizes the number of validators an attacker needs in order to split the network or collude, while minimizing the number of validators needed for a majority decision.
Penalizing dishonest validators is a more nuanced matter. Some networks forfeit the validator’s rewards for a given time, some revoke validator privileges, as EOS does, and others burn the entirety of the validator’s staked coins, as in the proposed Ethereum Casper PoS implementation. As Vlad Zamfir put it, imagine a version of proof of work where, if you participate in a 51% attack, your mining hardware burns down. The point is to make honesty the economically wise choice, so that validators acting in their own best interest act honestly.
In addition to economic finality, where the cost of colluding to reorganize the chain assures finality, there is also subjective finality, where the client software itself defines a window within which history may change. Clients refuse to overturn anything beyond a predetermined block depth, making finality ironclad no matter what a group of validators proposes to the contrary.
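The classic BFT bound from the PBFT paper linked in the references makes the 2/3 rule concrete. Under that bound, n validators tolerate up to f faulty ones when n ≥ 3f + 1. This sketch computes that f and a quorum of strictly more than 2n/3 votes, which equals 2f + 1 when n = 3f + 1. It also checks why two quorums cannot finalize conflicting blocks. The function name is illustrative.

```python
def bft_thresholds(n: int) -> tuple[int, int]:
    """For n validators, return (max_faulty, quorum).

    max_faulty: largest f with n >= 3f + 1.
    quorum:     smallest vote count strictly greater than 2n/3.
    """
    max_faulty = (n - 1) // 3
    quorum = (2 * n) // 3 + 1
    return max_faulty, quorum


for n in (4, 31, 10_000):
    f, q = bft_thresholds(n)
    # Any two quorums share at least 2q - n validators. That overlap exceeds f,
    # so at least one honest validator sits in both, and two conflicting
    # blocks cannot both gather a quorum.
    overlap = 2 * q - n
    print(f"n={n}: tolerates {f} faulty, quorum={q}, quorum overlap={overlap}")
```

For 31 validators this gives a quorum of 21 and tolerates up to 10 faulty validators.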
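Subjective finality amounts to a client-side rule applied before the normal fork-choice rule. In this sketch the 100-block window, the function and its parameters are all hypothetical. Real clients choose their own windows and fork-choice logic.

```python
MAX_REORG_DEPTH = 100  # hypothetical client-side finality window, in blocks


def accept_fork(tip_height: int, fork_point_height: int, fork_is_heavier: bool,
                max_reorg_depth: int = MAX_REORG_DEPTH) -> bool:
    """Decide whether to switch to a competing fork.

    fork_point_height is the last block the fork shares with our chain, so a
    switch would roll back tip_height - fork_point_height blocks. A client
    enforcing subjective finality refuses any rollback deeper than its window,
    no matter how much support the fork claims.
    """
    rollback_depth = tip_height - fork_point_height
    if rollback_depth > max_reorg_depth:
        return False  # beyond the finality window: treat history as settled
    return fork_is_heavier  # otherwise defer to the normal fork-choice rule


print(accept_fork(tip_height=5_000, fork_point_height=4_990, fork_is_heavier=True))  # True
print(accept_fork(tip_height=5_000, fork_point_height=4_000, fork_is_heavier=True))  # False
```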
Okay, getting back to finality.
With all that said, in PoS networks the time to finality, f, is defined as:
$$f = \frac{n}{\omega}$$
where n is the number of validator nodes needed for consensus and ω is the protocol overhead (the number of messages per second that validators need to process). This is much like the classic physics relation speed = distance/time. If you want to process n messages (distance) in time f, the required number of messages per second (speed) is ω = n/f. In this three-variable problem you cannot optimize all three at once: improving any two comes at the expense of the third, or you settle for a mix of less-than-optimal values. Putting this into diagram form:
So you can’t be king in every domain. Do you want high decentralization and low overhead? Finality time will be very long. Do you want high decentralization and short finality time? Nodes will face high performance demands. Do you want low overhead and short finality time? You will have to limit the network to a small number of nodes. Of course, there is an infinite variety of in-betweens that yield a spectrum of compromises.
The number of messages per second, ω, depends on the mechanism chosen. In PBFT or DBFT, every node must send its response to a block proposal, whereas in chain-based PoS or DPoS the network selects a single validator to validate each block.
For BFT mechanisms, every node must send a message signing the block before the next block can be made. Each node therefore exchanges two messages per block time, B: one to it (the block proposal) and one from it (the block signature), giving ω = 2n/B. For chain-based PoS mechanisms, with only one validator per block, there is only one message per block time, so ω simplifies to 1/B.
Plugging in the numbers, let’s assume a PoS network with a 5-second block time (fast-ish), so ω = 1/5, and 10,000 validators (highly decentralized). That gives a finality time of 10,000/(1/5) = 10,000 × 5 = 50,000 seconds (~14 hours). If we want an equally fast block time but a quick finality time, we have to sacrifice decentralization by limiting the number of validators. With 30 validators, finality would take 30 × 5 = 150 seconds (2.5 minutes). Of course, you can take a middle ground: 1,000 validators communicating more, at 1 message per second, would yield a finality of 1,000 seconds (~17 minutes).
Because BFT mechanisms require a response from at least 2/3 of the nodes for consensus, they reach finality much faster for a given network setup, at the cost of overhead proportional to the number of validators. Take the example above with DBFT instead: a 5-second block time and 10,000 validators give an overhead of ω = 20,000/5 = 4,000 messages PER SECOND. That yields a finality time of 10,000/4,000 = 2.5 seconds, versus 50,000 seconds in the equivalent chain-based PoS setup.
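The arithmetic above can be checked with a few Python functions that encode f = n/ω, ω = 1/B for chain-based PoS and ω = 2n/B for BFT. `Fraction` keeps the results exact. The function names are illustrative. The model is the simplified one used in this article, which ignores network latency and message size.

```python
from fractions import Fraction


def omega_chain_pos(block_time_s: Fraction) -> Fraction:
    """Chain-based PoS/DPoS: one producer message per block time."""
    return 1 / block_time_s


def omega_bft(n: int, block_time_s: Fraction) -> Fraction:
    """BFT: one proposal to and one signature from each node per block time."""
    return Fraction(2 * n) / block_time_s


def time_to_finality(n: int, omega: Fraction) -> Fraction:
    """f = n / omega, in seconds."""
    return n / omega


B = Fraction(5)  # 5-second blocks
print(float(time_to_finality(10_000, omega_chain_pos(B))))    # 50000.0 (~14 hours)
print(float(time_to_finality(30, omega_chain_pos(B))))        # 150.0
print(float(time_to_finality(1_000, Fraction(1))))            # 1000.0 (1 message/s)
print(float(omega_bft(10_000, B)))                            # 4000.0 messages/s
print(float(time_to_finality(10_000, omega_bft(10_000, B))))  # 2.5
```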
INT has chosen to limit decentralization in favor of fast blocks and fast finality, with 31 validators and a 10-second block time. This requires validators to sustain high overhead in order to reach consensus in time.
To summarize some PoS projects out there:
EOS carefully selected a small number of validators (block producers) to support its DPoS BFT-based network.
Neo has an even smaller number of validators (bookkeepers), currently under more central control, in its DBFT-based network.
INT is something of a mix of both. A carefully selected small number of validators (meta nodes, selection in progress) will support its DBFT-based Thearchy chain, while another set of validators (supernodes, also in progress) will support its DPoS-based subchains. Finality within INT will be driven by the DBFT Thearchy chain.
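Plugging INT’s stated parameters (n = 31, B = 10 s) into the same simplified BFT model gives a back-of-the-envelope estimate, not a measured figure. In this model n cancels out, so finality time depends only on block time, while overhead grows with the number of validators:
$$\omega = \frac{2n}{B} = \frac{2 \times 31}{10\ \text{s}} = 6.2\ \text{messages/s}$$
$$f = \frac{n}{\omega} = \frac{n}{2n/B} = \frac{B}{2} = \frac{10\ \text{s}}{2} = 5\ \text{s}$$
This is consistent with the ~5-second full finality quoted for INT in the IoT section.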
Finality in IoT
IoT will be a network of varying uses: small data sets sent at high frequency, larger groups of analyzable data, and value transfer. It therefore doesn’t make sense to take a one-size-fits-all approach to finality. Many people believe blockchain-based IoT networks won’t work. They associate blockchains with long transaction processing times and see DAG-based coins as faster and therefore the “future”. The important thing to remember is that receiving a transaction, or seeing it included in a block, is different from being able to rely on it as final. Bitcoin and other blockchain transactions seem to take so long because you cannot use the coins transferred until there is a sufficiently high probability that they won’t be reverted. Transactions propagate within the network just as fast as, if not faster than, in IOTA or Nano, since they depend only on speed-of-light latency through the network (1–3 seconds on average). Blockchain-based IoT networks like INT will rely on this base latency for most low-value data transactions, making their most basic uses as fast as anything out there. Full finality will then be needed only for transactions of greater value, like asset exchange, and even then, full finality in INT will take ~5 seconds.
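Application code could express a tiered policy like this in a few lines. The categories, the set of “high-value” kinds and the function are hypothetical. The article only distinguishes low-value data, used at base latency, from higher-value transfers such as asset exchange, which wait for full finality.

```python
from enum import Enum


class Wait(Enum):
    """How long a consumer waits before acting on a transaction (hypothetical)."""
    PROPAGATION = "act once the transaction has been received"
    FULL_FINALITY = "act once the Thearchy chain's DBFT consensus has committed it"


# Hypothetical set of transaction kinds treated as high value.
HIGH_VALUE_KINDS = {"asset_transfer", "asset_exchange"}


def required_wait(kind: str) -> Wait:
    """Low-value data uses base network latency; high-value transfers wait for finality."""
    if kind in HIGH_VALUE_KINDS:
        return Wait.FULL_FINALITY
    return Wait.PROPAGATION


print(required_wait("sensor_reading"))  # Wait.PROPAGATION
print(required_wait("asset_exchange"))  # Wait.FULL_FINALITY
```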
Notes and References:
*To further emphasize the importance, imagine a smart contract that transfers ownership of real estate on one blockchain upon the successful transfer of a cryptocurrency. The payment takes place and ownership of the real estate is transferred, only for the payment to be reverted on the payment chain. This type of event, at any scale and of any value, is unacceptable.
†2/3 is standard for byzantine-fault-tolerance purposes. If you require 3/4, then 26% can collude to prevent finality, and if you require 51% then with 2% byzantine actors plus a network split you can create a scenario where one half of the network finalizes A and the other half finalizes B. A 2/3 threshold ensures that both of these attacks require 1/3 byzantine to pull off, which has been mathematically proven to be as safe as you can get. — Vitalik Buterin
More on Byzantine Fault Tolerance: http://pmg.csail.mit.edu/papers/osdi99.pdf