Deploying blockchain at global scale

Siva Dirisala
Published in Zus Network
Sep 14, 2018

At 0chain we emphasize fast finality. Achieving it requires careful architectural choices across several components of the blockchain. The key areas are:

  1. Design of the protocols
  2. CPU and Storage workload (read/write) optimization (storage capacity optimization is less important as long as the order of magnitude is not very high)
  3. Network level optimizations

As described in my previous post on our devops process, the development team runs a mini blockchain on their laptops. On a local machine, only the first two areas listed above can be experimented with and tested, since there is barely any network latency. Of course, those two are complex problems in themselves, and our focus on them paid off, as shown by our testnet launch, which reached sub-second finality with only two data centers (each zone with 3 miners and a sharder).

During our data center deployment of the testnet we uncovered multiple performance issues, which we addressed systematically. For example, block generation time and verification time are true bottlenecks, apart from the network latencies. Verification involves a lot of signature checks, and these crypto functions are CPU intensive, so we made sure they can run in parallel with a configurable concurrency based on the cores available to the node.
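As a rough illustration of verifying signatures in parallel with a configurable concurrency, here is a minimal sketch. It is not 0chain's implementation: `verify_signature` is a hypothetical stand-in that checks an HMAC-SHA256 tag, where a real chain would verify public-key signatures, and `verify_block` and `max_workers` are names made up for this example.

```python
import hashlib
import hmac
import os
from concurrent.futures import ThreadPoolExecutor


def verify_signature(key: bytes, payload: bytes, signature: bytes) -> bool:
    # Stand-in check (assumption): recompute an HMAC tag and compare it.
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


def verify_block(key, transactions, max_workers=None):
    """Verify every (payload, signature) pair using a bounded worker pool."""
    # Default the concurrency to the number of cores on this node.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda tx: verify_signature(key, tx[0], tx[1]), transactions
        )
        # The block is valid only if every signature checks out.
        return all(results)
```

In CPython a thread pool gives real parallelism only when the verification routine releases the interpreter lock, as native crypto libraries typically do. Otherwise a process pool would be the closer analogue.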

With the CPU and storage workload under control, we decided to shift our focus to the network-related issues. Our upcoming testnet will be truly global, spanning 9 data centers in the US, Europe, Asia and Australia.

In distributed consensus, the generated blocks need to be sent out to everyone and get their approval. This process of sending and receiving messages becomes the bottleneck for scaling globally. We started off with a simple multi-cast protocol but the goal is to make this a pluggable layer with alternate approaches that could be more performant.
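One way to picture a pluggable message layer is to separate "which peers get this message" from "how it is sent". The sketch below is only an assumption about what such a layer could look like. `Broadcaster`, `send_to_all`, and the `send(peer, message)` transport callback are hypothetical names, not part of the 0chain codebase.

```python
from typing import Callable, Iterable, List

# A strategy maps the full peer list to the peers that should receive a message.
PeerSelector = Callable[[List[str]], Iterable[str]]


def send_to_all(peers: List[str]) -> Iterable[str]:
    """Simplest strategy: every peer gets the message directly."""
    return peers


class Broadcaster:
    def __init__(self, peers, send, select: PeerSelector = send_to_all):
        self.peers = list(peers)
        self.send = send      # hypothetical transport: send(peer, message)
        self.select = select  # pluggable strategy, swappable per deployment

    def broadcast(self, message: bytes) -> int:
        """Send the message to the selected peers and return how many were sent."""
        targets = list(self.select(self.peers))
        for peer in targets:
            self.send(peer, message)
        return len(targets)
```

A different strategy, for example one that sends only to a subset of peers, can then be passed as `select` without touching the transport code.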

Below is real data on the network latencies of sending messages with respect to one of the nodes. That is, Miner 0 sees the network latencies given under Small Message Time and Large Message Time. We track these separately because the consensus protocol has payloads of different sizes, which we broadly classify as small or large. For example, an entire block is a large message, whereas a verification signature for a block is a small message.

Network timing of various mining and sharding nodes w.r.t Miner 0

The above nodes are in the datacenters at the following locations.

s00,m00: California
s01,m01: Ohio
s02,m02: Sydney
s03,m03: Oregon
s04,m04: Seoul
s05,m05: Singapore
s06,m06: Ireland
s07,m07: Frankfurt
s08,m08: Tokyo

Here are some important observations from the above network timings in a global deployment mode.

  1. There is a very large variance in the network timings.
  2. Any given node will see this kind of tail-end network latency. It is therefore best to choose the threshold signature count so that the required signatures can be collected within a bounded network latency. For example, requiring 6 signatures out of 9 lets us model the maximum network latency as 665 milliseconds rather than the absolute maximum of 826 ms. Note that we also have a signature policy weighted by stake, which does not allow this network optimization: it requires notarization from nodes holding sufficient stake, no matter where they are located relative to the generator of the block.
  3. There is no correlation between the latency and failure rate.
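The threshold idea in the second observation can be sketched in a few lines: if messages go out to all peers in parallel, the time to reach `required` of them is the `required`-th smallest latency, not the largest. The function name and the sample values below are hypothetical. The numbers are illustrative only and are not the testnet measurements.

```python
def latency_for_threshold(peer_latencies_ms, required):
    """Time until `required` peers have been reached, assuming parallel sends."""
    if not 1 <= required <= len(peer_latencies_ms):
        raise ValueError("required must be between 1 and the number of peers")
    # The k-th smallest latency bounds the time to reach k peers.
    return sorted(peer_latencies_ms)[required - 1]


# Hypothetical latencies in ms (NOT the measured testnet values).
example = [5, 60, 80, 150, 170, 200, 240, 300, 420]
print(latency_for_threshold(example, 6))  # 200
print(latency_for_threshold(example, 9))  # 420
```

This only models a count-based threshold. A stake-weighted policy would have to wait for specific nodes, so the slowest required node sets the bound instead.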

In our current protocol, the network latency from each node to each of its peers is explicitly tracked in real time, and this information is used for optimal routing, as appropriate, when executing the various blockchain protocols.
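The post does not say how the latencies are tracked, so the following is just one plausible sketch: an exponentially weighted moving average per peer, with a helper that returns the currently fastest peers. `PeerLatencyTracker`, `alpha`, and `fastest` are assumptions made for illustration.

```python
class PeerLatencyTracker:
    """Keeps a smoothed latency estimate per peer (assumed EWMA smoothing)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest sample
        self.estimates = {}  # peer id -> smoothed latency in ms

    def record(self, peer, sample_ms):
        prev = self.estimates.get(peer)
        if prev is None:
            # First sample for this peer becomes the initial estimate.
            self.estimates[peer] = float(sample_ms)
        else:
            self.estimates[peer] = self.alpha * sample_ms + (1 - self.alpha) * prev

    def fastest(self, count):
        """Return up to `count` peer ids, ordered by lowest estimated latency."""
        return sorted(self.estimates, key=self.estimates.get)[:count]
```

A smoothed estimate reacts to sustained changes while damping single outliers, which matters given the large variance noted above.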

Experimenting with just 9 zones is already proving quite useful for understanding the behavior of our protocol and putting appropriate measures in place to optimize performance. When we eventually scale this to 100 nodes across even more zones, we will be starting from a solid foundation, and hopefully there will be few or no challenges related to global scaling left to deal with at that time.

With a minimum of 665 ms to send a block and 139 ms to receive a verification ticket, we are already looking at a minimum network latency of 804 ms. This makes sub-second finality hard to achieve, so our upcoming global testnet will temporarily violate our sub-second finality goal. However, since 0chain offers both public and private chains, we will also study global deployment under a controlled network environment by running it within a VPC network. As long as we keep the CPU and storage workload timings as low as possible, network latency is the only remaining bottleneck, and it can be further optimized with a VPC in a private blockchain deployment. Even for our public blockchain, we will continue to investigate options for bringing down the time spent purely on sending messages over the network.
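The arithmetic behind that conclusion, written out as a small sketch. The two latencies are the figures quoted above. Treating "sub-second" as a 1000 ms target is an assumption, and the variable names are illustrative.

```python
SEND_BLOCK_MS = 665        # time to send a block (figure from the text)
RECEIVE_TICKET_MS = 139    # time to receive a verification ticket (from the text)
FINALITY_TARGET_MS = 1000  # assumption: "sub-second" read as a 1000 ms budget

# Network time alone, before any block generation or verification work.
network_floor_ms = SEND_BLOCK_MS + RECEIVE_TICKET_MS
# What would remain for all CPU and storage work within the target.
compute_budget_ms = FINALITY_TARGET_MS - network_floor_ms

print(network_floor_ms)   # 804
print(compute_budget_ms)  # 196
```

Under these assumptions, fewer than 200 ms would be left for generating, verifying and storing a block, which is why the network path is the next thing to optimize.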
