Consistency validation of the 0chain blockchain

Siva Dirisala
Zus Network
5 min read · Aug 29, 2018


Building a high-quality enterprise application is not simple. If the application happens to be distributed across multiple servers, it is even more challenging. Add Byzantine conditions and consensus to the mix, and it starts to require a rocket-scientist QA engineer.

0chain is being built from the ground up with an emphasis on enterprise use cases. Hence, it is important for us to make sure it meets enterprise customers' expectations in terms of functional quality, reliability, and scalability.

In traditional enterprise applications, a QA engineer typically writes a repeatable functional scenario as a test case, executes the transaction each time, and verifies the end result. For a blockchain, that would be akin to submitting a transaction, waiting for confirmation, and ensuring that the block is further extended. This is the bare minimum requirement.

In a blockchain, one of the key requirements is that each block extends from the previous block. This is easy if only one block gets generated in a given round. But what if many blocks are generated in a given round? This is where consensus algorithms come into the picture: they guide the system to agree on a single valid block. While the algorithm provides guidance, there is no reason a specific miner node has to follow it, because of Byzantine conditions in a decentralized blockchain. In addition, temporary network latencies can cause nodes to see only a subset of the blocks generated within a given time, and hence make decisions based on a partial view of the world.

So, if multiple blocks are being generated and each miner may be seeing only a subset of them, deliberately or otherwise, would several parallel chains build up? Again, the consensus algorithms provide guidance on how to pick the blocks and, as long as the number of Byzantine players stays below a certain threshold, guarantee that blocks reach deterministic finality.

Here is a list of things that can go wrong and prevent a blockchain from working perfectly from the get-go:

  1. The protocol design is still in progress.
  2. There are code bugs (a reality until the code has been hardened for a while).
  3. Nodes occasionally miss blocks due to network issues.
  4. A miner crashes and needs to rejoin; until it syncs the required data, it sees only a partial view of the blockchain that was created before it joined.

Building a complex blockchain technology is like assembling a large jigsaw puzzle. You make assumptions about the position and orientation of the pieces and put them in approximate locations on the board. As you start adding more pieces, you realize the mistakes and adjust them. Usually the edge and corner pieces provide the initial guidance from which even more pieces get to the right place. It’s the same with building blockchain software. There are several protocols — consensus, economic, governance, miner selection, punishment for bad behavior and so on. They are all moving to eventually fit together like that perfectly completed jigsaw puzzle.

One of our team members came up with the idea of putting the various consensus artifacts, such as block messages and verification messages, into a graph DB and visualizing the chain to ensure everything is good. While this worked, we had a few challenges. One was continuously extracting the data and loading it into the graph DB. The second was the performance of the visualization on the client, which tries to show thousands of nodes and their relationships. That's when I realized we needed something much simpler, more scalable, and faster to validate with.

Here is the key question, the problem statement of this blog post, along with the elegant solution we designed at 0chain. When you build lots of moving parts that are not 100% accurate (deliberately, to make progress in such a highly dynamic system), how do you ensure the algorithms all run fine under 100% happy-path conditions? And when you have sub-second finality and cannot pause the system to inspect it and then resume, how do you verify the sanity of the blockchain within seconds, at a single glance? And how do you do this without adding much overhead (CPU or memory) to the rest of the system?

Meet our "powers of 10" validation system. Every time a block is finalized, we take a snapshot of a few key data points, such as the block hash and chain weight, and keep track of these by powers of 10. Say the finalized block round is 1248; we take a snapshot of 1248 and store it. Then we get to 1250: we take a snapshot of 1250 for the units place and again for the 10s place, so there are two data points for the same block. When we eventually reach 2000, we save four data points for the four place values: units, 10s, 100s, and 1000s. This is what we mean by the powers of 10 validation system. For each place value (except the 1s), we store the last 10 values. That means if the current round is 12389, we would have 12389, 12380, 12370, 12360, 12350 … 12290, and then 12300, 12200, 12100, and so on. With this scheme, even if the blockchain runs continuously for a billion rounds, it stores fewer than 100 data points.
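To make the scheme concrete, here is a minimal Python sketch of it. The names (`PowersOfTenTracker`, `record`) are hypothetical, purely for illustration; 0chain's actual implementation differs in detail.

```python
class PowersOfTenTracker:
    """Keep, for each place value 10^1 .. 10^max_power, the last 10
    snapshots taken at rounds divisible by that power of 10."""

    def __init__(self, max_power=9):
        self.max_power = max_power
        # one small fixed-capacity bucket per place value
        self.buckets = {p: [] for p in range(1, max_power + 1)}
        self.latest = None  # snapshot of the most recent finalized round

    def record(self, round_number, block_hash):
        self.latest = (round_number, block_hash)
        for p in range(1, self.max_power + 1):
            if round_number % (10 ** p) != 0:
                break  # if 10^p doesn't divide the round, no higher power does
            bucket = self.buckets[p]
            bucket.append((round_number, block_hash))
            if len(bucket) > 10:
                bucket.pop(0)  # drop the oldest entry: storage stays bounded

# After 12389 finalized rounds, the 10s bucket holds rounds 12290..12380
# and the 100s bucket holds 11400..12300, matching the example above.
tracker = PowersOfTenTracker()
for r in range(1, 12390):
    tracker.record(r, f"hash-{r}")
```

Each bucket acts as a small ring buffer, so the total number of retained snapshots is bounded by 10 entries per place value regardless of how long the chain runs.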

The best part is that the memory storing this key information is reused, so there is no growing memory overhead. And a round that is a multiple of 10^n triggers at most n snapshots, hence low CPU usage.
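A back-of-the-envelope check of the storage bound (illustrative only, assuming 10 retained entries per place value up to 10^9):

```python
def snapshots_stored(current_round, max_power=9):
    """Upper bound on retained entries: up to 10 per place value 10^1..10^max_power."""
    total = 0
    for p in range(1, max_power + 1):
        multiples_seen = current_round // (10 ** p)  # rounds divisible by 10^p so far
        total += min(multiples_seen, 10)
    return total

print(snapshots_stored(10**9))  # 81 entries after a billion rounds, well under 100
```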

The resulting data can be displayed in a table; comparing this page for any two nodes of the mining network quickly reveals any discrepancy.

Powers of 10 validation report

As you can see above, the current round is 22045, and the last 10 values for the place values of 10, 100, 1000, and 10000 are shown. The best part is that the human eye is so good that glancing at two such reports in quick succession will immediately spot even a small difference. If there is a difference, you can also establish a conservative bound on the point up to which there is agreement and where the nodes diverged. I say conservative because you take the floor to the nearest power of 10 up to which the reports agree. This report has been very useful in ensuring that our blockchain code evolves with good quality in spite of all the complexity.
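That divergence bound can also be computed mechanically. A hypothetical sketch (the function name and the report shape, a mapping from snapshot round to block hash, are illustrative assumptions): the highest round at which both reports recorded the same hash is the conservative agreement point.

```python
def last_agreed_round(report_a, report_b):
    """Highest snapshot round where both nodes recorded the same block
    hash: a conservative bound on where the chains diverged."""
    common = report_a.keys() & report_b.keys()
    agreed = [r for r in common if report_a[r] == report_b[r]]
    return max(agreed, default=None)

# Two nodes agree through round 22030 and diverge afterwards
m2 = {22000: "aa1", 22030: "bb2", 22040: "cc3"}
m3 = {22000: "aa1", 22030: "bb2", 22040: "dd9"}
print(last_agreed_round(m2, m3))  # 22030
```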

Power of 10 validation report from two mining nodes

Here are links to the report for two of the miners, M2 and M3. Click these links, open each image in a separate tab, and then quickly switch between the tabs. You will notice that except for the topmost row, everything else remains the same (including timestamps at seconds resolution). Our eyes can detect this very fast.
