Ethereum 2.0: A Complete Guide. Scaling Ethereum — Part Two: Sharding.
By Stan Levandovsky
Part One of this series explored some of the layer-two ideas that have been investigated for scaling Ethereum and other blockchains. This second installment discusses sharding, the layer-one scaling solution that the Ethereum community has chosen as the best option for achieving massive, long-term scalability. The term sharding comes from database terminology: a single large database is broken up into smaller, more manageable pieces so that the data can be accessed and processed more quickly and efficiently. Sharding a database also allows the total amount of data to exceed the capacity of any individual shard. Breaking a blockchain into shards to increase its scalability follows the same principles and goals, but it is much more complex to implement.
Along with Casper and Ewasm, sharding is one of the primary features of the much-anticipated Ethereum 2.0 update. Bitcoin and other blockchain networks have considered and experimented with partial sharding techniques that would shard either transaction processing or blockchain state, although most of these plans were never executed. The sharding envisioned for Ethereum 2.0 will address both of these major bottlenecks.¹ The official sharding FAQ on GitHub is very clear on this point: “We want to be able to process 10,000+ transactions per second without either forcing every node to be a supercomputer or forcing every node to store a terabyte of state data, and this requires a comprehensive solution where the workloads of state storage, transaction processing and even transaction downloading and re-broadcasting are all spread out across nodes.”¹ This all sounds great, but how will it actually work?
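To make the database definition of sharding concrete, here is a minimal Python sketch of hash-based sharding for a simple key-value store. The names `shard_for`, `ShardedStore` and `NUM_SHARDS` are hypothetical, and hashing keys into buckets is just one common way to shard a database; it is not a claim about how Ethereum 2.0 assigns accounts to shards.

```python
import hashlib
from typing import Dict, List, Optional

NUM_SHARDS = 4  # illustrative value; the Eth 2.0 spec cited in this article calls for 1024


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key (for example an account address) to a shard index by hashing it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """A toy key-value store split across several smaller stores (shards)."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self.shards: List[Dict[str, int]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: int) -> None:
        # Only the one shard responsible for this key is touched.
        self.shards[shard_for(key, self.num_shards)][key] = value

    def get(self, key: str) -> Optional[int]:
        return self.shards[shard_for(key, self.num_shards)].get(key)

    def total_size(self) -> int:
        # The data set as a whole can exceed what any single shard holds.
        return sum(len(shard) for shard in self.shards)
```

Putting 100 keys into a four-shard store leaves each shard holding only a fraction of them, while `total_size()` still reports all 100: the same "whole exceeds any one piece" property described above.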
Divide and Conquer
On the current Ethereum network, each node must verify each transaction. This is an important feature for ensuring the liveness of the network: even if eighty percent of Ethereum nodes went down simultaneously, the network could still function. Having many nodes perform the same operations does not actually make the Ethereum network slower, but it does not necessarily make optimal use of the network's resources either. A quick example shows how Ethereum's numerous nodes could be used more effectively. Say there are three nodes on the Ethereum network: node A, node B and node C. Currently, for a set of transaction data T to be confirmed, each node must individually verify all of T. This verification secures the network, but it creates a bottleneck through which every transaction must pass. The network must wait until every node verifies every transaction, so its total throughput in transactions per second can be no higher than the capacity of a single node. With the proposed sharding protocol, T would be broken up into, say, T1, T2 and T3. Nodes A, B and C each need to process just one of these smaller data shards for the whole data set to be verified. Even in this simple example, breaking the data into just three sets can greatly increase throughput and decrease transaction time. Applied well to a network as large as Ethereum, the improvement will be massive.
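The three-node example can be sketched in a few lines of Python. This is a toy model under stated assumptions: "verifying" a transaction is reduced to counting it, the transactions in T are dealt out round-robin, and nodes A, B and C correspond to indexes 0, 1 and 2. The function names are hypothetical.

```python
from typing import List, Sequence


def split_into_shards(txs: Sequence[str], num_nodes: int) -> List[List[str]]:
    """Deal the transactions in T out round-robin, one chunk (T1, T2, ...) per node."""
    chunks: List[List[str]] = [[] for _ in range(num_nodes)]
    for i, tx in enumerate(txs):
        chunks[i % num_nodes].append(tx)
    return chunks


def per_node_work(txs: Sequence[str], num_nodes: int, sharded: bool) -> List[int]:
    """How many transactions each node has to verify."""
    if not sharded:
        # Today: every node verifies the whole data set T.
        return [len(txs)] * num_nodes
    # Sharded: each node verifies only its own chunk of T.
    return [len(chunk) for chunk in split_into_shards(txs, num_nodes)]


T = [f"tx{i}" for i in range(9000)]
print(per_node_work(T, 3, sharded=False))  # [9000, 9000, 9000]
print(per_node_work(T, 3, sharded=True))   # [3000, 3000, 3000]
```

Each node's workload drops to a third, so if per-node capacity is the bottleneck, the three nodes together can in principle handle three times as many transactions. The sketch deliberately ignores the coordination and cross-shard costs discussed later in this article.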
At the time of writing, there are over 8,500 nodes on the Ethereum network.² The Ethereum team is working to find a number of nodes per shard that is high enough to ensure excellent security, yet low enough to provide high throughput.¹ Following this rule, the most recent spec calls for the Ethereum 2.0 blockchain to be divided into 1024 shards.³ Since each shard will be able to handle as much traffic as the current Ethereum network (and even more after Ewasm is implemented), sharding would increase Ethereum's throughput by a factor of over 1000.⁴ Layer-two solutions such as zk-rollups and Plasma will increase this number still further.
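The arithmetic behind the trade-off in the paragraph above fits in one function. This is a back-of-the-envelope sketch, assuming (as the article does) that each shard matches today's chain (`per_shard_capacity = 1.0`) and, purely for illustration, that the node count is spread evenly across shards; Eth 2.0 actually samples validators rather than whole nodes, so the per-shard figure is only indicative.

```python
from typing import Tuple


def sharding_tradeoff(total_nodes: int, num_shards: int,
                      per_shard_capacity: float = 1.0) -> Tuple[float, float]:
    """Return (throughput multiplier vs. today's chain, average nodes per shard).

    per_shard_capacity is one shard's throughput relative to the current
    Ethereum chain; the article's working assumption is 1.0.
    """
    if num_shards < 1:
        raise ValueError("need at least one shard")
    multiplier = num_shards * per_shard_capacity
    nodes_per_shard = total_nodes / num_shards
    return multiplier, nodes_per_shard


print(sharding_tradeoff(8500, 1024))  # (1024.0, 8.30078125)
```

Doubling the shard count doubles the multiplier but halves the average number of participants watching each shard, which is exactly the tension between throughput and security that the shard count is meant to balance.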
Sharding, Casper and the Beacon Chain
Before we get into the details of how sharding will work, it is important to mention Casper, the new Proof of Stake consensus engine that will replace the current Proof of Work engine in Eth 2.0. Casper will be delivered via the Beacon Chain, the system chain of Eth 2.0, which will ensure consensus and finality and will facilitate communication with and between the shards. Rather than relying on miners who spend energy and computation on hardware to reach consensus, Casper will rely on validators who stake their own funds. (My next article will explain Casper and the beacon chain in more detail.) Originally, sharding and Casper were being built by separate teams following separate development paths.⁴ Since the spring of 2018, however, the two teams have worked closely together so that the systems can develop simultaneously and benefit from one another's progress. Validators will use the Casper protocol to validate, and reach consensus on, both the shards and the beacon chain.
What Are We Dealing With?
Because of the size of Ethereum and the inherent complexity of blockchain technology, sharding Ethereum will be anything but simple. Curran explains the crux of the issue: “…by sharding the nodes into smaller subsets, these subsets need to be able to process specific sets of transactions while simultaneously updating the state of the network, all while ensuring it is valid.”⁵ This means that for a sharded Ethereum to function as the current network does, a complex system must be developed to determine the distribution of responsibilities between the beacon chain and the shards and to establish how the shards will communicate with each other and with the beacon chain. The system must also include a mechanism to make sure that all the network data and transactions remain valid and available to achieve finality. Ethereum researchers and developers have wrestled with several ideas about what sort of system could best achieve all of these functions.
Before going any further, it is worth pointing out that none of the proposed systems for sharding are actually in production. The teams developing Eth 2.0 clients are still working on Phase 0, which focuses on the beacon chain. Many details of sharding are still being researched and developed, but we can sketch the proposed model from the most recent specification and posts on Ethereum Research.
Consensus and Finality
The introduction of sharding and the transition from a PoW to a PoS system mean that achieving consensus and finality in Eth 2.0 will be quite different from the way it works on the current network. We will start with the basics and then get into some of the more complex functions that the system is supposed to achieve. Each user account will belong to a specific shard. Transactions within a shard will be relatively simple and similar to Eth 1.0 transactions, while transactions between shards will require an added layer of complexity (more on this shortly). Transactions will be grouped into “transaction packages” following an optimization process that is still being developed. These transaction packages must pass through a double verification process before they can be appended to the mainchain. First, validators are periodically and randomly assigned to a shard. Once assigned, the validators vote on the validity of each transaction package. If they vote yes, a separate committee on the beacon chain must verify this vote using a sharding manager smart contract. If the second vote also passes, the transaction package is appended to the mainchain and becomes part of the public ledger, establishing an immutable cross-link to the transaction group on that shard.¹ In later phases of sharding, the connection between the shards and the mainchain will be such that if the mainchain or any of its shards is invalid, the whole network will be considered invalid.¹ Just as in an unsharded blockchain, verifying a package changes the network state, which is reflected in things like storage and account balances.⁵
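The double verification process above can be sketched as a two-stage vote. This is an illustrative model only: validators and beacon-committee members are plain Python callables, the sharding manager contract is reduced to a function, and the "at least two-thirds yes" voting rule is an assumption, since the article does not state a threshold. All names (`TransactionPackage`, `MainChain`, `two_stage_verify`) are hypothetical.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, List


@dataclass
class TransactionPackage:
    shard_id: int
    txs: List[str]


@dataclass
class MainChain:
    crosslinks: List[TransactionPackage] = field(default_factory=list)


ShardValidator = Callable[[TransactionPackage], bool]
BeaconChecker = Callable[[TransactionPackage, List[bool]], bool]


def supermajority(votes: List[bool], threshold: Fraction = Fraction(2, 3)) -> bool:
    """Assumed voting rule: at least `threshold` of the votes must be yes."""
    return bool(votes) and Fraction(sum(votes), len(votes)) >= threshold


def two_stage_verify(package: TransactionPackage,
                     shard_committee: List[ShardValidator],
                     beacon_committee: List[BeaconChecker],
                     chain: MainChain) -> bool:
    # Stage 1: validators randomly assigned to the shard vote on the package.
    shard_votes = [validate(package) for validate in shard_committee]
    if not supermajority(shard_votes):
        return False
    # Stage 2: a separate beacon-chain committee checks the shard committee's vote.
    beacon_votes = [check(package, shard_votes) for check in beacon_committee]
    if not supermajority(beacon_votes):
        return False
    # Both votes passed: record a cross-link to this package on the main chain.
    chain.crosslinks.append(package)
    return True
```

A package only reaches the chain when both committees agree, so a single compromised committee cannot append an invalid package on its own; that redundancy is the point of the second vote.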
After a certain period, the validators on each shard are relieved of their duty and return to a larger pool. They are replaced by new validators drawn from the same pool and randomly selected by the beacon chain.¹ The beacon chain manages the validator registry, provides randomness (through a RANDAO and later a VDF), provides finality through Casper FFG, and keeps track of shard cross-links. In Phase 2, the beacon chain will also store execution environment contracts.⁷ The beacon chain combines a RANDAO mixing system, which provides unpredictability and unstoppability, with a VDF delay system, which provides unbiasability.⁶ Ensuring these three properties of randomness protects against attackers trying to predict which shard their node will be assigned to: if an attacker does not know which shard they will be assigned to until assignment occurs, they cannot coordinate an attack in advance. This system of randomly selected, rotating validators and double verification allows for consensus, helps keep the network secure from attackers, and ensures finality.
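Here is a simplified sketch of RANDAO-style mixing followed by a reshuffle of the validator pool into shard committees. Assumptions to flag: each validator's reveal is just a byte string, the mix is a running XOR of SHA-256 hashes, and the shuffle uses Python's `random.Random` seeded with the mix. That generator is not cryptographically secure, and the Eth 2.0 specification uses its own "swap-or-not" shuffle instead; the VDF layer is omitted entirely. Function names are hypothetical.

```python
import hashlib
import random
from typing import Dict, List, Sequence


def randao_mix(reveals: Sequence[bytes], mix: bytes = bytes(32)) -> bytes:
    """Fold each validator's revealed secret into a running 32-byte mix with XOR."""
    for reveal in reveals:
        digest = hashlib.sha256(reveal).digest()
        mix = bytes(a ^ b for a, b in zip(mix, digest))
    return mix


def assign_committees(validators: Sequence[str], num_shards: int,
                      seed: bytes) -> Dict[int, List[str]]:
    """Shuffle the whole validator pool with the seed, then deal it out to shards."""
    pool = list(validators)
    # Simplification: Python's Mersenne Twister seeded from the mix. It is NOT
    # cryptographically secure; Eth 2.0 specifies its own "swap-or-not" shuffle.
    random.Random(seed).shuffle(pool)
    committees: Dict[int, List[str]] = {shard: [] for shard in range(num_shards)}
    for i, validator in enumerate(pool):
        committees[i % num_shards].append(validator)
    return committees


seed = randao_mix([b"alice-secret", b"bob-secret", b"carol-secret"])
print(assign_committees([f"v{i}" for i in range(12)], num_shards=4, seed=seed))
```

Because the seed depends on every reveal, nobody can compute it before the last reveal arrives, which keeps committee assignments unpredictable. The last revealer can still bias the result slightly by choosing whether to reveal at all; that is the weakness the VDF delay is meant to remove.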
We now have a solid understanding of how transactions, shards and the beacon chain relate to one another and how they can work together to achieve consensus and finality. However, for the Ethereum network to achieve maximum efficiency, shards must be able to transact with one another and to reference each other's data. This necessitates communication between the various shards on the network. At Devcon 2018 in Prague, Ethereum co-founder Vitalik Buterin explained sharding in the following way:
“Imagine that Ethereum has been split into thousands of islands. Each island can do its own thing. Each of the islands has its own unique features and everyone belonging on that island i.e. the accounts, can interact with each other AND they can freely indulge in all its features. If they want to contact with other islands, they will have to use some sort of protocol.”⁴
This is a useful metaphor, and it raises one of the most important questions Ethereum developers must address: how can many different shards communicate with one another to deliver the same seamlessly integrated system as today's Ethereum network, while still achieving the massive scalability that sharding promises? As Vitalik's island analogy suggests, to ensure efficiency and security, interactions between individual shards need to follow a special protocol. The shards also need to know when it is appropriate to communicate, and only do so when needed.⁵ The communication protocol chosen by the Ethereum community is called a “receipt paradigm.”¹ In addition to the state change and transaction package discussed above, every transaction will generate a receipt. These receipts will be stored on the beacon chain via “distributed shared memory,” which means the receipts can be seen by other shards but not modified by them. This matters because it allows shards to verify and benefit from each other's activity while preserving the finality of each individual shard.
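The "visible but not modifiable" property of the receipt paradigm can be illustrated in a single process. This sketch is an analogy, not the real mechanism: "distributed shared memory" is modeled as a Python dict that only the publishing side writes to, other shards receive a read-only `MappingProxyType` view, and each `Receipt` is a frozen dataclass. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass(frozen=True)
class Receipt:
    receipt_id: int
    from_shard: int
    to_shard: int
    recipient: str
    amount: int


class ReceiptLog:
    """Receipts are published by their origin shard and exposed read-only to others."""

    def __init__(self) -> None:
        self._receipts: Dict[int, Receipt] = {}

    def publish(self, receipt: Receipt) -> None:
        if receipt.receipt_id in self._receipts:
            raise ValueError("receipt ids must be unique")
        self._receipts[receipt.receipt_id] = receipt

    def view(self) -> Mapping[int, Receipt]:
        # Other shards get a read-only view: they can look receipts up but not edit them.
        return MappingProxyType(self._receipts)
```

Any attempt to assign into the view raises `TypeError`, and changing a field on a published receipt raises an `AttributeError`, which mirrors the idea that other shards can read and rely on a receipt without being able to alter the shard that produced it.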
Earlier ideas for sharding focused on how best to divide data and responsibilities among the shards themselves. Recently, however, Vitalik released two new proposals for Phase 2, the phase in which a fully sharded Ethereum will be established: Proposal 1 and Proposal 2. The ethos of these proposals “…is to have a relatively minimal consensus-layer framework, that still provides sufficient capabilities to develop complex frameworks that give us all of the smart contract capabilities that we need on top as a second layer.”⁷ To this end, the proposals call for delegating several tasks and responsibilities from the individual shards to the beacon chain.⁷ Previously, each shard would have functioned much like an autonomous Ethereum blockchain, with its own transactions, Ether and smart contracts. Under the new proposal, the base-level concepts of both smart contracts and Ether will exist only on the beacon chain, while shards will continue to have their own state and their own execution. This should reduce the complexity of each individual shard while maintaining the network's functionality. Vitalik believes the new system will provide enough functionality to allow for an execution environment “…that supports smart contracts in shards, cross shard communication and all of the other features that we expect to be built using a beacon chain contract.”⁷
This will be accomplished by adding three new transaction types and two new data structures to the beacon state. The transaction types are NewExecutionScript, NewValidator and Withdrawal.⁷ They respectively create an execution script that can hold ETH, add a new validator, and withdraw a validator from the beacon chain. The addition and withdrawal of validators are authorized using an execution script and receipt system. The two new data structures added to the beacon state are ExecutionScript and WithdrawalReceipt.
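The following sketch shows how the three transaction types and two data structures named above might fit together in a beacon state. The proposal's actual field layouts and authorization rules are not reproduced in this article, so every field, the balance check used as "authorization", and all function signatures below are hypothetical; only the names NewExecutionScript, NewValidator, Withdrawal, ExecutionScript and WithdrawalReceipt come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExecutionScript:
    code: bytes        # hypothetical field: the script itself
    balance: int = 0   # execution scripts can hold ETH


@dataclass
class WithdrawalReceipt:
    validator_index: int
    script_index: int  # hypothetical field: the script that authorized the exit
    amount: int


@dataclass
class BeaconState:
    execution_scripts: List[ExecutionScript] = field(default_factory=list)
    validator_stakes: List[int] = field(default_factory=list)
    withdrawal_receipts: List[WithdrawalReceipt] = field(default_factory=list)


def new_execution_script(state: BeaconState, code: bytes, initial_balance: int = 0) -> int:
    """NewExecutionScript: register a script that can hold ETH."""
    state.execution_scripts.append(ExecutionScript(code, initial_balance))
    return len(state.execution_scripts) - 1


def new_validator(state: BeaconState, script_index: int, deposit: int) -> int:
    """NewValidator: add a validator, authorized and funded by an execution script."""
    script = state.execution_scripts[script_index]
    if script.balance < deposit:
        raise ValueError("execution script cannot cover the deposit")
    script.balance -= deposit
    state.validator_stakes.append(deposit)
    return len(state.validator_stakes) - 1


def withdrawal(state: BeaconState, validator_index: int, script_index: int) -> WithdrawalReceipt:
    """Withdrawal: remove a validator's stake and record a WithdrawalReceipt for it."""
    amount = state.validator_stakes[validator_index]
    state.validator_stakes[validator_index] = 0
    receipt = WithdrawalReceipt(validator_index, script_index, amount)
    state.withdrawal_receipts.append(receipt)
    return receipt
```

Note that ETH only ever moves between beacon-state objects here; no shard holds a balance, which reflects the proposal's idea of keeping Ether and contracts as base-level concepts on the beacon chain.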
These new components will be used to facilitate cross-shard communication and to deliver a system in which the exchange of all Ether and the execution of all smart contracts can be achieved through layer 2 abstraction, without needing to include them in the shards themselves. Introducing this layer of abstraction using beacon chain contracts will help to keep each shard’s complexity to a minimum, which will simplify and improve communication between shards.
Beyond its inherent complexity, a further issue with cross-shard communication is latency. If we want to send a token from shard A to shard B, a transaction on shard A “destroys” the coins there but saves a record (a receipt) of the destination address, the value sent, and the destination shard. After a delay, every shard learns the state roots of the other shards, which allows it to verify receipts and confirm that a transfer has been made. At this point, shard B retrieves the receipt from shard A and verifies it, so that the coins destroyed on shard A can be recreated on shard B. This process causes a delay between when the transaction is first sent and when it is actually confirmed, which would detract from the user experience and counteract the very speed and scalability that Eth 2.0 is supposed to deliver.⁸ The title of the proposed solution, like many other names in blockchain, is short and sweet: “Fast Cross-Shard Transfers Via Optimistic Receipt Roots.” Despite the name, the concept is fairly simple: store conditional states, and be “optimistic” about the validity of a submitted transaction. Vitalik explains the proposed system with the following example:
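The burn-and-claim flow just described can be sketched as follows. Assumptions to flag: a shard's "receipt root" is stood in for by a frozen set of receipt digests (a real design would use a Merkle root plus an inclusion proof), the delay is simply the gap between `send` and shard B obtaining `receipt_root()`, and the class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class TransferReceipt:
    source_shard: int
    dest_shard: int
    recipient: str
    amount: int
    nonce: int  # makes two otherwise identical transfers distinguishable

    def digest(self) -> str:
        data = f"{self.source_shard}|{self.dest_shard}|{self.recipient}|{self.amount}|{self.nonce}"
        return hashlib.sha256(data.encode("utf-8")).hexdigest()


class Shard:
    def __init__(self, shard_id: int, balances: Dict[str, int]) -> None:
        self.shard_id = shard_id
        self.balances = dict(balances)
        self.receipts: List[TransferReceipt] = []
        self.claimed: Set[str] = set()  # digests of receipts already credited here
        self.nonce = 0

    def receipt_root(self) -> FrozenSet[str]:
        # Stand-in for a Merkle root: the set of this shard's receipt digests.
        return frozenset(r.digest() for r in self.receipts)

    def send(self, sender: str, recipient: str, amount: int, dest_shard: int) -> TransferReceipt:
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount  # the coins are "destroyed" on this shard
        receipt = TransferReceipt(self.shard_id, dest_shard, recipient, amount, self.nonce)
        self.nonce += 1
        self.receipts.append(receipt)
        return receipt

    def claim(self, receipt: TransferReceipt, source_root: FrozenSet[str]) -> None:
        """Credit a receipt once this shard has learned the source shard's root."""
        digest = receipt.digest()
        if receipt.dest_shard != self.shard_id:
            raise ValueError("receipt is for another shard")
        if digest not in source_root:
            raise ValueError("receipt not covered by the source shard's root")
        if digest in self.claimed:
            raise ValueError("receipt already claimed")
        self.claimed.add(digest)
        self.balances[receipt.recipient] = self.balances.get(receipt.recipient, 0) + receipt.amount
```

With Alice holding 50 coins on shard 0 and Bob 50 on shard 1, `send("alice", "bob", 20, dest_shard=1)` leaves Alice with 30, and once shard 1 claims the receipt against shard 0's root, Bob has 70. The `claimed` set is what stops the same receipt from being credited twice.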
“…if Bob has 50 coins on shard B, and Alice sends 20 coins to Bob from shard A, but shard B does not yet know the state of shard A and so cannot fully authenticate the transfer, Bob’s account state temporarily becomes ‘70 coins if the transfer from Alice is genuine, else 50 coins.’ Clients that have the ability to authenticate shard A and shard B can be sure of the “finality” of the transfer (ie. the fact that Bob’s account state will eventually resolve to 70 coins once the transfer can be verified inside the chain) almost immediately, and so their wallets can simply act like Bob already has the 70 coins.”
Once the transfer can be verified, the transaction becomes permanent if it was indeed valid, or is reverted if it was not. Communication between shards is one of the most complex issues that Eth 2.0 developers are still working on, and it must be implemented successfully so that Eth 2.0 can retain the benefits of the current network while vastly improving its scalability.
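Bob's conditional state from Vitalik's example ("70 coins if the transfer from Alice is genuine, else 50 coins") can be modeled directly. This is a minimal sketch that only handles incoming transfers; the class name, the string receipt IDs and the `resolve` method are hypothetical, and the real proposal works with optimistic receipt roots rather than a per-account dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConditionalBalance:
    confirmed: int
    pending: Dict[str, int] = field(default_factory=dict)  # receipt id -> incoming amount

    def optimistic(self) -> int:
        # "70 coins if the transfer is genuine": what a wallet that can
        # authenticate both shards may already show the user.
        return self.confirmed + sum(self.pending.values())

    def pessimistic(self) -> int:
        # "else 50 coins": the balance if no pending transfer turns out to be genuine.
        return self.confirmed

    def resolve(self, receipt_id: str, genuine: bool) -> None:
        # Once the chain can verify the receipt, the transfer becomes permanent or is dropped.
        amount = self.pending.pop(receipt_id)
        if genuine:
            self.confirmed += amount


bob = ConditionalBalance(confirmed=50, pending={"alice->bob#0": 20})
print(bob.optimistic(), bob.pessimistic())  # 70 50
bob.resolve("alice->bob#0", genuine=True)
print(bob.confirmed)  # 70
```

If the receipt had turned out to be invalid, `resolve(..., genuine=False)` would simply drop it and Bob's balance would settle back at 50, matching the "reverted if it was not" case above.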
Beyond inter-shard communication, sharding still faces several challenges. We have already examined the possibility of a single-shard takeover attack and seen that it can be countered by randomly sampling and reshuffling validators. However, this random sampling, while preferable to network insecurity, makes it harder for nodes to compute the shard and network state roots, because they cannot be given advance access to their assigned shard.¹ For the same reason, it will be difficult to give light clients accurate information about the entire network state. Fraud detection also still requires attention: if some node makes a claim about an invalid state or transaction group, how can the rest of the nodes be notified so that they can detect and reject the fraud?¹ Fraud also becomes an issue if data is missing from a transaction group, especially if that group is called upon in a cross-shard communication protocol. More challenges will likely arise as progress continues, but it is encouraging that the research team has already addressed most of the major issues discussed in the preceding paragraphs.
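Why does random sampling defeat a single-shard takeover? A quick calculation helps. This sketch uses a binomial approximation (each committee seat is an independent draw from a very large validator pool) and assumes an attacker needs at least two-thirds of a committee to control it; both the threshold and the committee sizes in the example are illustrative assumptions, not figures from the specification.

```python
import math
from fractions import Fraction


def takeover_probability(committee_size: int, attacker_fraction: float,
                         threshold: Fraction = Fraction(2, 3)) -> float:
    """Chance that a randomly sampled committee has at least `threshold` attacker seats.

    Binomial approximation: each seat is an independent draw from a very large
    validator pool in which the attacker controls `attacker_fraction` of validators.
    """
    n, p = committee_size, attacker_fraction
    k_min = math.ceil(threshold * n)  # exact integer arithmetic via Fraction
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for size in (10, 50, 150):
    print(size, takeover_probability(size, attacker_fraction=1 / 3))
```

Even an attacker holding a third of all validators almost never lands a two-thirds majority on a large, randomly drawn committee, and the probability shrinks rapidly as the committee grows. That is why the randomness has to be unpredictable: if the attacker could choose or foresee their shard, this calculation would no longer apply.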
Eth 2.0 is being built with five important design goals in mind: security, decentralization, resilience, longevity and simplicity.⁹ A rough launch date of January 3, 2020 has recently been announced for Phase 0 of Eth 2.0.⁹ To ensure that these goals are met, the phases that follow Phase 0 will be rolled out gradually, about a year apart.
Phase 0 consists of the beacon chain, the system chain for Eth 2.0, whose functions were discussed above. It is through cross-links to the beacon chain that each shard will be able to communicate with the network and with the other shards. (My next article will explain the beacon chain and the Casper protocol, so stay tuned!) Phase 1 will introduce basic shards but will essentially be a test run of how a fully sharded system will work; as such, it will not immediately deliver the full scalability potential of sharding. This phase will address consensus and finality on shard chains, and will allow the beacon chain to monitor the execution of shard chains. Phase 2 will see the emergence of a fully sharded and integrated Ethereum 2.0, with shards upgraded from “rudimentary data markers” to “fully-functional chains.” Phase 2 will also introduce Ewasm as Ethereum's new virtual machine (I'll be publishing a full article on this as well!). Although these phases are divided conceptually, large parts of them will be worked on simultaneously because of their interconnected nature.
Furthermore, efforts to improve the Ethereum 1.0 blockchain, often referred to as Eth 1.x, are still underway and will continue well into the Eth 2.0 roll-out. The stated goals of Eth 1.x are to boost transaction throughput via client optimization, to implement “state fees” that keep operating a full node sustainable, to stabilize transaction fees, and to develop a finality gadget that can link the Eth 1.x chain to the Eth 2.0 chain.
Given the current roll-out predictions, we can expect to see data sharding by late 2020 and a fully sharded Ethereum by 2021. Many brilliant and passionate people are working to make this dream a reality. I, for one, am hopeful that sharding can be implemented successfully and that it will help Ethereum and other blockchains achieve massive scalability and mass adoption. If you are as interested in the development of Ethereum and blockchain technology as I am, check out my previous articles, Ethereum 2.0: A Complete Guide and Scaling Ethereum Part 1, and follow me for my next two articles on Casper and Ewasm. Thanks for reading!
1. “Sharding FAQs.” https://github.com/ethereum/wiki/wiki/Sharding-FAQs
2. “Ether Nodes.” https://www.ethernodes.org/network/1
3. “Sharding introduction R&D compendium.” https://github.com/ethereum/wiki/wiki/Sharding-introduction-R&D-compendium
4. “What are Ethereum Nodes And Sharding?” https://blockgeeks.com/guides/what-are-ethereum-nodes-and-sharding
5. “What is Sharding? Guide to this Ethereum Scaling Concept Explained.” https://blockonomi.com/sharding/
6. “Ethereum 2.0 randomness.” https://www.youtube.com/watch?v=zqL_cMlPjOI
7. “Phase 2, Proposal 1.” https://notes.ethereum.org/s/HylpjAWsE#
8. “Fast Cross-Shard Transfers Via Optimistic Receipt Roots.” https://ethresear.ch/t/fast-cross-shard-transfers-via-optimistic-receipt-roots/5337
9. “The Roadmap to Serenity.” https://media.consensys.net/the-roadmap-to-serenity-bc25d5807268
Special thanks to Cayman Nava, Aidan Hyman and Greg Markou for reviewing this article and making many valuable suggestions. Furthermore, this article would not have been possible without the work done by many other great writers and researchers in the space. Thanks to all of those involved in the creation and publication of the sources cited below!
Blockgeeks. “What are Ethereum Nodes And Sharding?” Accessed September 17, 2018. https://blockgeeks.com/guides/what-are-ethereum-nodes-and-sharding
Buterin, Vitalik. “Fast Cross-Shard Transfers Via Optimistic Receipt Roots.” Ethresearch. https://ethresear.ch/t/fast-cross-shard-transfers-via-optimistic-receipt-roots/5337
Buterin, Vitalik. “Phase 2, Proposal 1.” https://notes.ethereum.org/s/HylpjAWsE#
Buterin, Vitalik. “Sharding FAQs.” Ethereum Wiki. Last modified December 15, 2018. https://github.com/ethereum/wiki/wiki/Sharding-FAQs
Consensys. “The Roadmap to Serenity.” Medium. May 16, 2019. https://media.consensys.net/the-roadmap-to-serenity-bc25d5807268
Curran, Brian. “What is Sharding? Guide to this Ethereum Scaling Concept Explained.” July 13, 2018. https://blockonomi.com/sharding/
Drake, Justin. “Ethereum 2.0 randomness.” YouTube. Filmed November 1, 2018, at Devcon in Prague. Video, 30:02. https://www.youtube.com/watch?v=zqL_cMlPjOI
“Ether Nodes.” The Ethereum nodes explorer. Accessed December 21, 2018. https://www.ethernodes.org/network/1
Ray, James. “Sharding introduction R&D compendium.” Ethereum Wiki. Last modified October 17, 2018. https://github.com/ethereum/wiki/wiki/Sharding-introduction-R&D-compendium