State of Sharding Blockchains — I

Gokul B Alex
The Capital
3 min read · Jul 11, 2023


Sharding in the world of databases has a rich history that spans several decades. From early attempts at distributed databases to modern implementations in cloud computing and blockchain technology, sharding has proven to be a powerful tool for achieving scalability and performance in data storage and processing.

During the early 2000s, companies dealing with large-scale data sets, such as Google and Yahoo, faced challenges with database scalability. They pioneered application-level sharding, where developers manually partitioned data across multiple databases based on specific data attributes or hash functions. This approach required extensive application modifications and lacked dynamic scaling. In the late 2000s, NoSQL databases such as Apache Cassandra and MongoDB introduced dynamic sharding as a built-in feature, allowing data to be distributed across nodes automatically. Dynamic sharding enabled elastic scaling and simplified data distribution, making it easier to accommodate growing data volumes.
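To make the application-level approach concrete, here is a minimal sketch of hash-based routing. The names `shard_for_key` and `ShardedStore` are hypothetical, and in-memory dictionaries stand in for the separate databases; this is an illustration under those assumptions, not how any particular company implemented it.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical application-level router; each dict stands in for one database."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        # The application, not the database, decides where the record lives.
        self.shards[shard_for_key(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for_key(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Asha"})
print(store.get("user:1"))
```

Because the shard index depends on `num_shards`, changing the number of shards remaps most keys. That is one way to see why this style of sharding did not scale dynamically.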

Many prominent blockchain protocols, such as Ethereum, Polkadot, Zilliqa, Harmony, Elrond, Avalanche, and NEAR, have adopted or explored sharding as a scalability technique. Recently, the Shardeum protocol has attracted attention with its sharding approach, known as Dynamic State Sharding. Let us understand the sharding techniques used in blockchain protocols from three fundamental vantage points: communication sharding, computation sharding, and storage sharding.

Image by Gerd Altmann from Pixabay

In communication sharding, participating nodes are divided into different shards, and the nodes in each shard need only internal communication most of the time. This is possible when each shard has a high degree of communication autonomy. It also opens up the possibility of interacting with nodes in a sharded manner in pre-consensus and post-consensus scenarios. Clients and nodes within a shard can obtain the current state of the blockchain by communicating with the intra-shard nodes that are responsible for maintaining it.
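A rough count shows why keeping communication inside a shard helps. The sketch below assumes a simple all-to-all message round, which is an illustrative simplification rather than the pattern of any specific protocol, and it ignores cross-shard traffic; the function names are hypothetical.

```python
def broadcast_messages(num_nodes: int) -> int:
    """Messages in one all-to-all round: every node sends to every other node."""
    return num_nodes * (num_nodes - 1)


def sharded_broadcast_messages(num_nodes: int, num_shards: int) -> int:
    """Total messages when nodes are split as evenly as possible into shards
    and each shard runs its own all-to-all round internally."""
    base, extra = divmod(num_nodes, num_shards)
    sizes = [base + 1] * extra + [base] * (num_shards - extra)
    return sum(broadcast_messages(size) for size in sizes)


print(broadcast_messages(100))             # 9900
print(sharded_broadcast_messages(100, 4))  # 2400
```

Under this assumption, splitting 100 nodes into 4 shards cuts one round from 9,900 messages to 2,400.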

In computation sharding, each shard is responsible only for processing its own defined set of transactions, which requires a mechanism for cross-shard computation. In storage sharding, the nodes of each shard need to store only the data related to that shard. State sharding, transaction sharding, computation offloading, proof verification, and multi-party computation are some of the prominent approaches from these vantage points. State sharding is used to manage data and storage, while transaction sharding is used to distribute the processing of transactions. A hybrid approach can offer a balance that addresses both data storage and transaction processing efficiently.
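The need for a cross-shard mechanism shows up as soon as transactions are routed by account. The following sketch assumes accounts are mapped to shards by hashing their address; `Tx`, `account_shard`, and `partition` are hypothetical names used only for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    receiver: str
    amount: int


def account_shard(address: str, num_shards: int) -> int:
    """Assumed mapping: hash the account address to pick its home shard."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(txs, num_shards: int):
    """Split transactions into per-shard local work and cross-shard work."""
    local = defaultdict(list)   # shard index -> transactions it can process alone
    cross = []                  # (sender_shard, receiver_shard, tx) needing coordination
    for tx in txs:
        s = account_shard(tx.sender, num_shards)
        r = account_shard(tx.receiver, num_shards)
        if s == r:
            local[s].append(tx)
        else:
            cross.append((s, r, tx))
    return local, cross


txs = [Tx("alice", "bob", 5), Tx("carol", "dave", 7), Tx("alice", "alice", 1)]
local, cross = partition(txs, num_shards=4)
print(len(cross), "of", len(txs), "transactions touch two shards")
```

Everything in `local` can be processed by one shard on its own; everything in `cross` is what a cross-shard protocol has to handle.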

Storage sharding allows nodes to store a fraction of the entire blockchain system data, reducing the storage burden of nodes. This characteristic can be harnessed by nodes to shard the storage space in heterogeneous ways. A coordination mechanism is needed to manage the distribution and retrieval of data across shards. This can involve assigning specific time ranges or data segments to individual shards to avoid data duplication and improve data access efficiency. Timestamp synchronization and conflict resolution mechanisms may be required to handle cross-shard data dependencies.
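As one illustration of assigning data segments to shards, the sketch below maps fixed-size ranges of block heights to storage shards in round-robin order. `RangeDirectory` is a hypothetical coordinator, and the round-robin rule is an assumption chosen for simplicity; it does not model replication, synchronization, or conflict resolution.

```python
class RangeDirectory:
    """Hypothetical coordinator: maps block-height ranges to storage shards."""

    def __init__(self, range_size: int, num_shards: int):
        if range_size <= 0 or num_shards <= 0:
            raise ValueError("range_size and num_shards must be positive")
        self.range_size = range_size
        self.num_shards = num_shards

    def shard_for_height(self, height: int) -> int:
        """Return the shard that stores the block at this height."""
        if height < 0:
            raise ValueError("height must be non-negative")
        # Consecutive ranges of `range_size` blocks rotate across the shards.
        return (height // self.range_size) % self.num_shards

    def stored_by(self, shard: int, heights):
        """Filter the heights that a given shard is responsible for storing."""
        return [h for h in heights if self.shard_for_height(h) == shard]


directory = RangeDirectory(range_size=100, num_shards=4)
print(directory.shard_for_height(250))           # 2
print(len(directory.stored_by(0, range(800))))   # 200
```

With 4 shards, each shard stores 200 of the first 800 blocks, and a lookup needs only the directory rule to find the right shard.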

Sharding partitions the data across different shards, but certain consensus algorithms depend on the entire dataset. In such cases, implementing sharding can introduce additional complexity and limitations. The consensus algorithm used in a sharded blockchain should be able to handle the unique challenges of cross-shard communication, state synchronization, and security. Depending on the consensus algorithm and the type of blockchain application, sharding approaches can be classified as transaction-based sharding, account-based sharding, or hybrid approaches. Transaction-based sharding divides the workload based on the transactions themselves, while account-based sharding partitions it based on account addresses.
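The difference between the two classifications comes down to which field selects the shard. This short sketch assumes a hash of the transaction ID for transaction-based sharding and a hash of the sender address for account-based sharding; both function names and the choice of hashed field are illustrative assumptions.

```python
import hashlib


def _bucket(value: str, num_shards: int) -> int:
    """Stable hash of a string into a shard index."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def transaction_based_shard(tx_id: str, num_shards: int) -> int:
    """Assumed rule: the transaction's own ID picks the shard."""
    return _bucket(tx_id, num_shards)


def account_based_shard(sender_address: str, num_shards: int) -> int:
    """Assumed rule: the sender's account address picks the shard."""
    return _bucket(sender_address, num_shards)


# (tx_id, sender) pairs; the values are made up for the example.
txs = [("tx-1", "alice"), ("tx-2", "alice"), ("tx-3", "bob")]
for tx_id, sender in txs:
    print(tx_id, transaction_based_shard(tx_id, 8), account_based_shard(sender, 8))
```

Under the account-based rule, all of a sender's transactions land on the same shard, which keeps that account's state in one place. Under the transaction-based rule, they may spread across shards, which balances load but does not keep an account's activity together.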

When we study sharding blockchains further, it is interesting to examine them in terms of their functional components: node selection, epoch randomness, node assignment, intra-shard consensus, cross-shard transaction processing, and shard reconfiguration.
