What is Sharding:

Nihar Pachpande
Published in Fifth P
8 min read · Aug 10, 2021

KEY TAKEAWAYS

  • Sharding is a database partitioning technique being considered by blockchain networks and being tested by Ethereum.
  • The more users that blockchain networks take on, the slower the network becomes, leading to significant latency.
  • Sharding can improve network latency by splitting a blockchain network into separate shards — each with its own data, separate from other shards.
  • Security concerns surrounding sharding include a hack or shard takeover, where one shard attacks another, resulting in a loss of information.

What Is Sharding?

Sharding is a database partitioning technique that blockchain networks use to improve scalability, enabling them to process more transactions per second. Sharding splits the entire network into smaller partitions, known as “shards.” Each shard holds its own data, which makes it distinct from and independent of the other shards.

Sharding can help reduce the latency, or slowness, of a network because it splits a blockchain network into separate shards.

https://www.youtube.com/watch?v=WI6pE5nFeFI

Understanding Sharding

Blockchain networks and their respective cryptocurrencies are gaining in popularity due to the widespread application of the technology, which includes supply chain management and financial transactions. As the popularity of blockchain grows, so too does the workload and transactional volume that is handled by the network. If we think of a blockchain as a shared database, as more and more data is added, the network needs to find new ways to be able to process all of that data efficiently and quickly, which is where sharding can help.

Distributed Ledger

The distributed ledger of blockchain technology makes it attractive since it allows the transactions to be consensually shared across multiple sites and geographies. As transactions are recorded, copies are sent to the shared network within seconds creating public “witnesses.” If one portion of the network falls victim to fraud or a malicious attack, the shared network’s participants can identify what was changed by the fraudsters since they all maintain a copy of the ledger’s transactions. As a result, blockchain technology and its distributed ledger system can help reduce fraud and limit the damage from cyberattacks, such as a hack.

Scalability

However, one of the major challenges with blockchain technology is that as additional computers are added to the network and more transactions are processed, the network can become bogged down. The resulting delay in processing transactions is called latency. Latency is a hurdle to blockchain being adopted for widespread use, particularly when compared to the current electronic payment systems that work quickly and efficiently. In other words, scalability is a challenge for blockchain since the networks may not be able to handle the increased amounts of data and transaction flow as more and more industries adopt the technology.

One of the solutions being considered for creating latency-free scalability is the process of sharding. Sharding is designed to spread out the workload of a network into partitions, which may help reduce latency and allow more transactions to be processed by the blockchain.

How Sharding Is Accomplished

Before exploring how sharding is accomplished within a blockchain network, it’s important to review how data is currently stored and processed.

Blockchain Nodes

Currently, in blockchain, each node in a network must process the entire transaction volume of the network. Nodes in a blockchain are independent and are responsible for maintaining and storing all of the data within a decentralized network. In other words, each node must store critical information, such as account balances and transaction history. Blockchain networks were established so that every node must process all of the operations, data, and transactions on the network.

Storing every transaction on all of the nodes keeps a blockchain secure, but this model slows transaction processing considerably. Slow speeds for processing transactions do not bode well for a future in which blockchain becomes responsible for millions of transactions.

Sharding can help since it partitions or spreads out the transactional workload from a blockchain network so that every node doesn’t need to handle or process all of the blockchain’s workload. In a way, sharding compartmentalizes the workload into partitions or shards.
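To make the idea of spreading the workload concrete, here is a minimal sketch in Python. It assumes a hash-based rule that routes each transaction to a shard by its sender address. The shard count, the `shard_for` and `partition` helpers, and the transaction fields are all hypothetical and chosen only for illustration; real networks use their own assignment rules.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(address: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an address to a shard with a stable hash (an assumed rule)."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(transactions, num_shards: int = NUM_SHARDS):
    """Group transactions by the shard of their sender."""
    shards = {i: [] for i in range(num_shards)}
    for tx in transactions:
        shards[shard_for(tx["sender"], num_shards)].append(tx)
    return shards


# Twelve made-up transactions from twelve made-up addresses.
txs = [{"sender": f"addr{i}", "amount": i} for i in range(12)]
workload = partition(txs)
for shard_id, shard_txs in workload.items():
    print(shard_id, len(shard_txs))
```

Each shard ends up with only its own slice of the transactions, so a node serving one shard no longer has to process all twelve.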

Horizontal Partitioning

Sharding can be accomplished through horizontal partitioning, in which a database is divided by rows: each shard holds a subset of the rows and keeps the same columns. Rows are assigned to shards based on a shared characteristic. For example, one shard might be responsible for storing the state and transaction history for a specific type of address. It might also be possible to divide shards based on the type of digital asset stored in them. Transactions involving that digital asset might be made possible through a combination of shards.
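Horizontal partitioning by a characteristic can be sketched in a few lines of Python. The ledger rows, asset names, and the `partition_rows` helper below are invented for illustration and do not describe any particular blockchain.

```python
from collections import defaultdict

# Hypothetical ledger rows: (address, asset, balance)
ROWS = [
    ("0xa1", "token_a", 50),
    ("0xb2", "token_b", 20),
    ("0xc3", "token_a", 75),
    ("0xd4", "token_c", 10),
]


def partition_rows(rows, key):
    """Split rows into shards by a characteristic; columns stay the same."""
    shards = defaultdict(list)
    for row in rows:
        shards[key(row)].append(row)
    return dict(shards)


# Shard by the type of digital asset (the second column).
by_asset = partition_rows(ROWS, key=lambda row: row[1])
for asset, rows in by_asset.items():
    print(asset, rows)
```

Passing a different `key` function, such as one that looks at an address prefix, would shard the same rows by address type instead.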

As an example, consider a rental real estate transaction in which multiple shards are involved. These shards correspond to different entities involved in the transaction, from customer name to digital keys configured into a smart lock that is made available to the renter upon rent payment.

Shard Sharing

Each shard can still be shared with the other shards, which preserves a key aspect of blockchain technology: the decentralized ledger. In other words, the ledger is still accessible to every user, allowing them to view all of the ledger transactions.

Sharding and Security

One of the main issues that has arisen with sharding is security. Though each shard is separate and only processes its own data, there is a security concern regarding the corruption of the shards, where one shard takes over another shard, resulting in a loss of information or data.

If we think of each shard as its own blockchain network with its own authenticated users and data, a hacker could take over a shard through a cyberattack. The attacker could then introduce false transactions or a malicious program.

Ethereum is on the front line of testing sharding as a possible solution to latency and scalability issues. Ethereum has combated the potential of a shard attack by randomly assigning nodes to certain shards and constantly reassigning them at random intervals. This random sampling would make it difficult for hackers to know when and where to corrupt a shard.
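The random sampling idea can be illustrated with a short Python sketch. This is not Ethereum's actual selection algorithm; the `assign_nodes` function, the node names, and the shard count are assumptions made for the example. The seeded generator is used only so the sketch is repeatable, and a real protocol would need a randomness source that attackers cannot predict or bias.

```python
import random


def assign_nodes(node_ids, num_shards, rng):
    """Shuffle the nodes, then deal them out to shards round-robin."""
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    shards = {i: [] for i in range(num_shards)}
    for position, node in enumerate(shuffled):
        shards[position % num_shards].append(node)
    return shards


# Seeded only for repeatability; NOT a secure source of randomness.
rng = random.Random(42)
nodes = [f"node{i}" for i in range(12)]

# Each call stands for one reassignment interval.
epoch_1 = assign_nodes(nodes, 3, rng)
epoch_2 = assign_nodes(nodes, 3, rng)
print(epoch_1)
print(epoch_2)
```

Because the assignment is redrawn at every interval, an attacker who wants to control one shard cannot know in advance which nodes will be serving it.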

Also, it’s important to note that sharding is still in the early testing phase of being used for blockchain networks.

Other blockchains using sharding

Zilliqa is one of the very few protocols that promises sharding, thus we followed it closely from the beginning.

The essence of Zilliqa can be summarized in several bullet points:

  • Execute all the single-shard transactions in parallel;
  • Do not execute transactions that affect the same smart contract in parallel;
  • Do not execute any transaction that affects more than one shard in parallel with any other transaction.
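The three rules above can be read as a scheduling policy. The Python sketch below is one possible interpretation, not Zilliqa's implementation: the `Tx` type, its fields, and the `plan_rounds` function are hypothetical. Transactions are grouped into rounds, and everything inside one round may run in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tx:
    tx_id: str
    shards: frozenset               # shards the transaction touches
    contract: Optional[str] = None  # None means a plain payment


def plan_rounds(txs):
    """Group transactions, in order, into rounds that may run in parallel."""
    rounds = []
    current = []
    busy_contracts = set()

    def flush():
        nonlocal current, busy_contracts
        if current:
            rounds.append(current)
        current = []
        busy_contracts = set()

    for tx in txs:
        if len(tx.shards) > 1:
            # Rule 3: a multi-shard transaction runs alone.
            flush()
            rounds.append([tx])
            continue
        if tx.contract is not None and tx.contract in busy_contracts:
            # Rule 2: same contract, so it must wait for the next round.
            flush()
        # Rule 1: single-shard transactions share a round.
        current.append(tx)
        if tx.contract is not None:
            busy_contracts.add(tx.contract)
    flush()
    return rounds


txs = [
    Tx("t1", frozenset({0}), "dex"),
    Tx("t2", frozenset({1}), "game"),
    Tx("t3", frozenset({0}), "dex"),
    Tx("t4", frozenset({0, 1}), "bridge"),
    Tx("t5", frozenset({1})),
]
rounds = plan_rounds(txs)
print([[tx.tx_id for tx in r] for r in rounds])
# prints [['t1', 't2'], ['t3'], ['t4'], ['t5']]
```

Only the first round runs two transactions side by side; the repeated contract and the multi-shard transaction each force a serial step, which is the scalability cost discussed in the sections that follow.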

Executing only single-shard transactions in parallel

Only executing in parallel transactions for which the transaction initiator and the smart contract are on the same shard might not be a big problem. In Fleta, the payments are entirely designed on the idea that shards can be treated interchangeably. It doesn’t quite work for Zilliqa, since in Fleta the shard is dictated by the sender, while in Zilliqa it is dictated by the shard of the contract, but it suggests that a similar idea might be applicable.

No state sharding

Not sharding the state makes our lives easier. For example, if the state is sharded, then even the very first example in Zilliqa’s blog post becomes obsolete: assigning the payment to the shard of the sender would not be enough, since the shard of the sender would not be able to update the state for the receiver. As a result, a task as simple as processing payments becomes very complex once the state is sharded. However, it is also worth noting that even in the absence of sharding by state, assigning payments to the sender’s shard only works if the accounts are represented as UTXO. If accounts store the accumulated amount, then two shards processing transactions with the same receiver will apply conflicting updates to the receiver’s account.

Nevertheless, not sharding by state, while it simplifies the system design, imposes a huge limit on the scalability of the system. The only reason why Ethereum nodes can still store the entire state is that Ethereum only processes 14 transactions per second. Once a system processes thousands of transactions per second, the state will explode, since transactions do leave a trace on the state. Introducing sharding by the state later will be as hard as introducing sharded processing into modern non-sharded blockchain protocols.
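The difference between the two account representations can be shown with a small worked example in Python. It is a deliberately simplified sketch: the names and amounts are made up, the two shards are simulated sequentially, and the spending of the senders' inputs is left out.

```python
# Account model: each shard reads the stored balance, adds its payment,
# and writes the result back.
receiver_balance = 100
snapshot_a = receiver_balance       # shard A reads 100
snapshot_b = receiver_balance       # shard B reads 100 at the same time
write_a = snapshot_a + 10           # shard A credits a payment of 10
write_b = snapshot_b + 20           # shard B credits a payment of 20
receiver_balance = write_a
receiver_balance = write_b          # overwrites shard A's update
account_result = receiver_balance   # 120, although 130 was paid in total

# UTXO model: each shard only adds new outputs, so nothing is overwritten.
utxos = [("bob", 100)]
outputs_a = [("bob", 10)]           # created by shard A
outputs_b = [("bob", 20)]           # created by shard B
utxos = utxos + outputs_a + outputs_b
utxo_result = sum(value for owner, value in utxos if owner == "bob")  # 130

print(account_result, utxo_result)  # prints 120 130
```

In the account model the two writes conflict and one payment is lost, while in the UTXO model the outputs from both shards simply sit side by side.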

Not executing transactions that affect the same smart contract in parallel

Similarly, not sharding smart contract processing, while making the implementation simpler, limits the scalability of a protocol. Ultimately, in any ecosystem only a few applications dominate the usage, and as Zilliqa scales to thousands of shards, the top five dApps will have to reside in five shards and be limited by each shard’s processing power (and by its storage once state sharding is introduced).

With the limitations described above and while also not processing contracts that by design affect multiple shards in parallel, Zilliqa will just make another incremental change in the landscape of scalable blockchains. They might outperform EOS, Thunder, and Algorand (or at least provide better decentralization than the former two), but are not future-proof, and such limitations will prevent them from scaling with the demand for the decentralized applications platform.

The area of research concerned with the execution of distributed transactions has a long history, and should not be ignored in the development of sharded blockchain protocols.

For example, implementations of Map-Reduce, or generally engines that involve parallel processing, shuffles, and aggregations, have been used for parallel execution of complex transactions for more than a decade.
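As a reminder of what such an engine does, here is a toy map, shuffle, and reduce pipeline in Python that computes the net balance change per account. It runs in a single process and uses made-up transfers and function names; a real engine would distribute each stage across many machines.

```python
from collections import defaultdict


def map_tx(tx):
    """Map: turn one transfer into a debit and a credit."""
    sender, receiver, amount = tx
    return [(sender, -amount), (receiver, amount)]


def shuffle(pairs):
    """Shuffle: bring all values for the same account together."""
    groups = defaultdict(list)
    for account, delta in pairs:
        groups[account].append(delta)
    return groups


def reduce_deltas(groups):
    """Reduce: sum the changes for each account."""
    return {account: sum(deltas) for account, deltas in groups.items()}


# Hypothetical transfers: (sender, receiver, amount)
txs = [("alice", "bob", 5), ("bob", "carol", 2), ("alice", "carol", 1)]
pairs = [pair for tx in txs for pair in map_tx(tx)]
net = reduce_deltas(shuffle(pairs))
print(net)  # prints {'alice': -6, 'bob': 3, 'carol': 3}
```

The map and reduce stages each work on independent keys, which is what lets these engines run them in parallel.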

Why then do we not see an emergence of sharded blockchain protocols that are powered by techniques proven in the industry? The primary reason is that building distributed systems in the presence of failures is an extremely complex engineering task. The number of production-tested distributed database systems that are not coming from engineering giants such as Amazon, Microsoft, Google or Facebook, who have access to the best distributed systems engineering talent, is very small.

From this perspective, Near Protocol, with its exceptional team of distributed systems engineers, is uniquely positioned to build a sharded decentralized applications engine.

At this stage, we do not have our sharding technical paper finished, but we will release it soon. The way we develop our approach is more practical in nature, where we first build a prototype to test all of our hypotheses. In a field as complex as distributed systems, writing a whitepaper before having a working implementation is often a rushed decision, although it seems to be a widely adopted approach for blockchain projects.
