Traditional database systems achieve scalability by dividing the database states into independent shards (or partitions). By distributing the workload over multiple shards, the over- all capacity of the system increases. Sharding requires co- ordination to ensure ACID properties for transactions that access multiple shards. Two important coordination protocols are distributed commit such as two-phase commit (2PC) which ensures atomicity, and concurrency control such as two-phase locking (2PL) which achieves isolation.
The design of a highly scalable blockchain must meet the following three goals: (i) support a large network size equivalent to that of major cryptocurrencies like Bitcoin and Ethereum, (ii) achieve high transaction throughput that can handle the average workloads of centralized systems such as Visa, and (iii) support general workloads and applications beyond cryptocurrencies. The resulting blockchain will enable scale-out applications that offer the security benefits of a decentralized blockchain with a performance similar to that of a centralized system.
Building a blockchain system that achieves all three goals above at the same time is challenging. To have high trans- action throughput (second goal), it is necessary to build on top of a permissioned blockchain. But such a blockchain uses BFT protocols which do not scale to a large number of nodes, thus contradicting the first goal. As a result, one challenge is to reconcile the first two goals by making BFT protocols more scalable. We note that scalability here means fault scalability, which means that protocol’s performance degrades gracefully as the number of tolerated failures in- creases.
One of the ways to achieve the above goals together could be the implementation of database sharding technique to partition the network and the blockchain states into smaller shards randomly or based on their location. The process of assigning nodes to shards is called shard formation.
Forming shards in a blockchain system is more complex than in a distributed database. First, the nodes must be assigned to committees in an unbiased and random manner. Second, the size of each committee must be selected carefully to strike a good trade-off between performance and security. And finally, committee assignment must be performed periodi- cally to prevent an adaptive attacker from compromising a majority of nodes in a committee.