An Overview of Networking in Serenity
Special thanks to Hsiao-Wei Wang, Kevin Mai-Hsuan Chia and John Adler for the edits and valuable feedback.
Networking in sharded blockchains is a hard problem. How do we go about designing and building scalable, secure peer-to-peer networks for sharded blockchains? As of this writing, there are no sharded blockchain systems deployed in production, so we have no precedent for how to design such peer-to-peer networks. This is a design problem that Ethereum 2.0 currently faces. The goal of this post is to get you up to speed on the research being conducted on the p2p layer of ETH2.0 and, perhaps, to encourage you to start contributing to this part of Ethereum 2.0 research. Throughout this post, I will assume that you are familiar with the current ETH2.0 specifications.
Networking Requirements in ETH2.0
Before diving into the current ideas and protocols that are being considered, we first need to understand what the requirements for Serenity’s p2p network are.
First, nodes in the network need to be able to easily subscribe to multiple shards simultaneously. This includes validators that get randomly assigned to shards by the beacon chain, and non-validating nodes, i.e., nodes that don’t participate in the consensus protocol, that connect to different shards for their particular needs.
Second, nodes in the network need to be able to jump between shards with low latency. Since validators are randomly assigned to shards, it is important that they are able to switch between shards as needed as efficiently as possible. It follows that nodes need to be able to find and connect to other nodes as quickly as possible.
Third, we need to take into account that a lot of data is passed around the shard networks and the beacon network. These networks need low latency so that nodes receive blocks, attestations, etc. in time to participate effectively in the protocol. They also need sufficiently high bandwidth for this data, even though the size of blocks, attestations, etc. is relatively constant. The numbers for bandwidth and latency are hard to estimate ahead of time, but Jannik, from BrainBot, has estimates of what these numbers could look like. Note that these estimates are out of date, as the Phase 0 specification has changed significantly.
Given these design requirements, the following are the current ideas being considered for the network. These ideas are based on collaboration between the Ethereum Foundation and the Libp2p team at Protocol Labs. Libp2p is a modular framework for building scalable peer-to-peer networks. While it is currently very experimental, there has been great progress towards making it easier to use for developers of peer-to-peer networks.
An Efficient PubSub System: GossipSub
Nodes in the network communicate using a pattern known as Publish-Subscribe (PubSub). In this framework, nodes that send messages don’t send messages to specific nodes. Instead, they categorize these messages into topics and send messages to nodes subscribed to these topics. The nodes that send these messages are known as publishers and nodes that subscribe to topics are known as subscribers. Thus, publishers need not know who they are publishing to and subscribers need not know who the publishers of the topics are. This framework for sending and receiving messages enables greater network scalability and a more dynamic network topology.
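To make the pattern concrete, here is a minimal in-memory sketch of publish-subscribe. The `Broker` class and the topic names are hypothetical and only stand in for the overlay network; this is not a libp2p API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, bytes], None]


class Broker:
    """Toy stand-in for the overlay network (hypothetical, not a libp2p API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscriber only names the topic; it never learns who publishes.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: bytes) -> int:
        # A publisher only names the topic; it never addresses a specific node.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)  # number of subscribers reached


broker = Broker()
received = []
broker.subscribe("beacon", lambda topic, msg: received.append((topic, msg)))

delivered = broker.publish("beacon", b"block-1")  # one subscriber on this topic
ignored = broker.publish("shard-7", b"header")    # nobody subscribed to this topic
```

Neither side holds a reference to the other, which is why peers can join and leave a topic without publishers having to change anything.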
There are different variants of the publish-subscribe framework specified in libp2p. The one under current consideration for Serenity is called GossipSub, which uses gossip to efficiently route messages to subscribers. GossipSub combines ideas from RandomSub and MeshSub with gossiping in order to route messages efficiently in the overlay network. The main difference between GossipSub and other PubSub variants is how it publishes messages. Messages are published using MeshSub, but metadata is sent to non-mesh peers using RandomSub in a gossiping fashion. In other words, peers that are not in the topic mesh (the mesh created by MeshSub, whose peers have all subscribed to the same topic) are able to get metadata about messages in that mesh without directly subscribing to the topic. If needed, these peers can then query for the actual messages. Notice that this reduces the number of messages that each peer needs to send and thus reduces the degree of each peer in the overlay mesh network created by GossipSub.
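The split between full messages and metadata can be sketched as follows. This is a simplified model and not the libp2p implementation: the `Peer` class and its method names are hypothetical, gossip is sent immediately here whereas the real protocol emits it periodically, and the message ID is assumed to be a truncated SHA-256 hash of the payload. The `IHAVE`/`IWANT` comments refer to the control messages of the same names in the GossipSub specification.

```python
import hashlib
from typing import Dict, List


def message_id(payload: bytes) -> str:
    # Assumption: messages are identified by a hash of their payload.
    return hashlib.sha256(payload).hexdigest()[:16]


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mesh: List["Peer"] = []          # mesh peers: receive full messages
        self.gossip: List["Peer"] = []        # non-mesh peers: receive message IDs only
        self.cache: Dict[str, bytes] = {}     # full messages this peer holds
        self.advertised: Dict[str, "Peer"] = {}  # message ID -> peer that announced it

    def publish(self, payload: bytes) -> str:
        mid = message_id(payload)
        self.receive_full(mid, payload)
        return mid

    def receive_full(self, mid: str, payload: bytes) -> None:
        if mid in self.cache:
            return  # already seen: do not forward again
        self.cache[mid] = payload
        for peer in self.mesh:
            peer.receive_full(mid, payload)   # full message along the mesh
        for peer in self.gossip:
            peer.receive_ihave(self, mid)     # metadata only ("IHAVE")

    def receive_ihave(self, sender: "Peer", mid: str) -> None:
        if mid not in self.cache:
            self.advertised[mid] = sender

    def fetch(self, mid: str) -> bytes:
        # Ask the announcing peer for the full message ("IWANT").
        payload = self.advertised[mid].cache[mid]
        self.cache[mid] = payload
        return payload


a, b, c = Peer("a"), Peer("b"), Peer("c")
a.mesh, b.mesh = [b], [a]   # a and b form the topic mesh
a.gossip = [c]              # c is outside the mesh

mid = a.publish(b"attestation")
c_had_full_before = mid in c.cache   # c only knows the ID at this point
fetched = c.fetch(mid)               # c pulls the message on demand
```

Peer `c` pays for the full message only when it actually asks for it, which is the saving the paragraph above describes.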
In ETH2.0, tentatively, there will be two kinds of topics: global topics and local topics. Global topics will be for messages that all nodes in the network need to know about, such as beacon chain messages and shard IDs. Local topics will be for messages that pertain to a particular shard, such as shard headers and shard transactions.
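As a rough illustration of how a node's subscriptions could be derived from its shard assignment, consider the sketch below. The topic names and helper functions are hypothetical; the specification may name and structure topics differently.

```python
from typing import List

# Hypothetical topic names; not taken from the ETH2.0 specification.
GLOBAL_TOPICS = ["beacon"]


def local_topics(shard_id: int) -> List[str]:
    # Topics that only matter to nodes following this shard.
    return [f"shard-{shard_id}-headers", f"shard-{shard_id}-transactions"]


def subscriptions(shard_ids: List[int]) -> List[str]:
    # Every node follows the global topics plus the local topics of its shards.
    topics = list(GLOBAL_TOPICS)
    for shard_id in shard_ids:
        topics.extend(local_topics(shard_id))
    return topics


validator_topics = subscriptions([3, 7])
```

Under this model, jumping from one shard to another amounts to unsubscribing from one set of local topics and subscribing to another, while the global subscriptions stay in place.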
Peer Routing Using Kademlia DHT
Peer routing algorithms determine how peers in a network decide which peers to route messages through, in other words, how a peer ID gets mapped to a peer’s IP address and port number. In ETH2.0, Kademlia DHT is currently under consideration due to its robustness and ease of use: it is implemented in Libp2p and currently used in ETH1.0. For details of libp2p’s implementation of Kademlia, you can read libp2p’s Kademlia specification.
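Kademlia measures how "close" two peers are by taking the XOR of their IDs, and it sorts known peers into k-buckets by the position of the highest bit in which the IDs differ. The sketch below illustrates just that metric, using small integers as IDs for readability; real IDs are much longer, and the function names are illustrative.

```python
def distance(a: int, b: int) -> int:
    # Kademlia's distance between two IDs is their bitwise XOR.
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    # The k-bucket for a peer is given by the highest bit in which the IDs differ.
    d = distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in a k-bucket")
    return d.bit_length() - 1


own = 0b1010
d_near = distance(own, 0b1011)    # differs only in the lowest bit
d_far = distance(own, 0b0110)     # differs in the highest bit
bucket_near = bucket_index(own, 0b1011)
bucket_far = bucket_index(own, 0b0110)
```

Because XOR is symmetric, both peers agree on how far apart they are, which is part of what makes the routing tables consistent across the network.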
Peer Discovery Using Bootstrapping and Kademlia DHT
In order to communicate effectively in a peer-to-peer network, peers have to be able to find each other. In ETH2.0, a combination of bootstrapping and Kademlia DHT will be used to help peers find each other.
Bootstrapping is a peer discovery mechanism used in the current ETH1.0 network in which a new peer wishing to connect to the network connects to a set of predetermined peers in the network. These peers are typically hard-coded in the client and selected by the client’s developers. This process facilitates the onboarding of new nodes to the network. Kademlia provides this bootstrapping mechanism as part of its specification.
Once the peer has joined the network using Kademlia’s bootstrapping mechanism, it can discover new peers by querying the nodes in its own k-buckets that are closest to a desired key. Each queried node returns the nodes it knows of that are closest to the desired key, and the peer then queries those in turn. This process is repeated until no queried node returns a node closer to the desired key than the ones already found.
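A simplified version of this lookup can be sketched as follows. It assumes a toy network given as a dictionary from node ID to the set of IDs that node knows about, queries nodes one at a time rather than in parallel, and stops once every node in the current shortlist of closest nodes has been queried. The names `K`, `closest` and `lookup` are illustrative.

```python
from typing import Dict, Iterable, List, Set

K = 3  # size of the shortlist; kept small for the example


def closest(nodes: Iterable[int], target: int, k: int = K) -> List[int]:
    # The k nodes with the smallest XOR distance to the target, nearest first.
    return sorted(set(nodes), key=lambda n: n ^ target)[:k]


def lookup(network: Dict[int, Set[int]], start: int, target: int) -> List[int]:
    shortlist = closest(network[start], target)
    queried = {start}
    while True:
        pending = [n for n in shortlist if n not in queried]
        if not pending:
            return shortlist  # every node in the shortlist has been asked
        for node in pending:
            queried.add(node)
            # A queried node replies with the closest nodes it knows of.
            reply = closest(network[node], target)
            candidates = (set(shortlist) | set(reply)) - {start}
            shortlist = closest(candidates, target)


# Toy topology: node 1 initially knows only nodes 2 and 8.
network = {
    1: {2, 8},
    2: {1, 12},
    8: {1, 12},
    12: {2, 8, 14, 15},
    14: {12, 15},
    15: {12, 14},
}
result = lookup(network, start=1, target=15)
```

In this example node 1 reaches node 15 through node 12, even though it knew neither of them at the start.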
Libp2p vs Devp2p
You might have noticed that Devp2p wasn’t mentioned in the previous section on the current p2p networking stack. The main reason is that, due to Devp2p’s limited documentation and the lack of changes since its proposal in 2014, it was decided that Ethereum will switch from Devp2p to Libp2p. In addition, Libp2p’s modular design makes it easy to build the custom networking functionality needed for sharding, which we cannot easily do with Devp2p. For a short overview of the advantages of using Libp2p, you can read Parity Technologies’ article on why they are using Libp2p in their projects.
By now you should have a good overview of what networking might look like in ETH2.0 and, perhaps, you want to start contributing. The following is a list of open problems that you can use as a starting point. The list isn’t exhaustive, but it’s a start.
Validators are an important part of the Ethereum network. They keep the protocol running and ensure that the network is safe. Attackers might want to identify validators in order to attack the Ethereum network in some way. A DDoS attack against a particular validator is hard to prevent, but the goal is to make it hard to identify validators to begin with. Ideally, we have mechanisms in place so that an attacker can only attack a random subset of all available nodes. Jannik has an issue brainstorming about different aspects of validator privacy.
There have been several suggestions about how to improve validator privacy. However, all of these suggestions are in the brainstorming phase and there is nothing concrete. Some of them propose using existing p2p-level anonymity solutions such as Tor and/or I2P. These approaches have been analyzed by Nicolas Liochon, an engineer at PegaSys. You can read his analysis here. The gist is that it would be reasonable to integrate the ETH2.0 network into either the Tor or I2P networks. I would like to note that there is precedent for attempting to integrate these technologies into other cryptocurrencies, the most famous example being Monero. You can read all about Monero’s attempt at building Kovri, a C++ implementation of the I2P network, here.
Another approach that was discussed is based on Dandelion++. The basic idea is to create an anonymity graph as a subgraph of the underlying network topology, have messages travel along pseudo-randomly generated paths on the anonymity graph and then diffuse messages to surrounding peers. Dandelion++ is currently under consideration by the Bitcoin Core and Grin networks. The proposal is to use the first 2 steps of the Dandelion++ protocol and then in the 3rd step, instead of simply diffusing messages to nearby peers, peers publish messages as in regular GossipSub. You can read the details of the proposal here.
Signature aggregation is an important part of ETH2.0. Its main use is to aggregate the signatures from attesters into a single signature to be stored in attestations. This removes the need to store all the signatures from attesters. The signature scheme being used is the BLS signature scheme. Its main advantage is that, once the signatures have been generated, we no longer need the signers in order to aggregate them. You can read more about the details of the BLS signature scheme here. Now, how do we go about aggregating these signatures in the p2p network? Well, we don’t really know yet. There is a current proposal by the PegaSys team called Handel. The paper hasn’t been published yet, but the GitHub repository is public and a talk was given at the Stanford Blockchain Conference. Nothing has been decided yet, so if you have an idea on how to do this, please reach out.
The current peer discovery design is fine for the testnet phase of Phase 0 but is not optimal for a production, decentralized P2P network. The main contender for peer discovery is discv5. A version of discv5 is deployed in Geth 1.5 and above, but other Ethereum clients don’t use discv5. The specification for discv5 is still in progress and unofficial. Felix Lange has a draft that you can read here.
A wire protocol is needed if we want different clients to talk to each other and be interoperable. This is currently a work in progress, and Danny Ryan has posted a minimal wire API for Phase 0 that can be implemented once a particular wire protocol has been specified. A wire API for subsequent phases would have different requirements in order to take shards into account. During the working group session in Prague before Devcon4, a list of requirements for an ETH2.0 wire protocol was developed. You can read the details here. Taking a look at the current ETH1.0 wire protocol could also help in designing a wire protocol for ETH2.0.
The current Ethereum network uses a combination of UDP for node discovery and TCP for all other p2p communication. Specifically, node discovery uses a Kademlia-like system on top of UDP, while all other communication uses a custom transport protocol, RLPx, built on top of TCP. UDP is a common choice for node discovery in peer-to-peer systems. TCP is used for sending larger messages to other peers in the network. This combination is common in many Internet applications, such as games. However, TCP can be slow at times, since it needs to make sure that packets are ordered correctly and needs to wait for responses from peers. In exchange, TCP has error recovery built in so that packets are delivered reliably to peers in the network.
It seems like there is room for improvement in the transport protocols used in Ethereum. Maybe we can find a transport protocol that is as fast as UDP and as reliable as TCP. The main contender for such a protocol is the experimental transport called QUIC, proposed by Google in 2012. QUIC is a UDP-based transport protocol that aims to improve the performance of sending packets over the Internet. Moreover, QUIC has encryption built in, which means that it would be harder to block communication between peers. The main advantages of using QUIC over TCP are twofold. First, QUIC requires one packet in order to establish a connection instead of TCP’s 3-way handshake. Second, it uses multiplexing so that multiple streams of data can reach their endpoints independently, in contrast to TCP, where a connection can be blocked if an error is found in a data stream. The advantages of using QUIC over TCP in ETH2.0 have been discussed in this issue. The PegaSys team plans on doing more research in order to determine its viability in ETH2.0. It might be worthwhile to do an analysis to see what kinds of improvements need to be made in order to have a reliable and fast transport protocol for ETH2.0.
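To see why the signers are no longer needed at aggregation time, here is a short derivation for the case where all attesters sign the same message m. It assumes public keys live in one group with generator g_1, signatures and the message hash H(m) live in the other group, both groups are written additively, and e is the bilinear pairing; the exact group assignment is an assumption and not taken from this post.

```latex
\begin{align*}
pk_i &= sk_i \cdot g_1, \qquad \sigma_i = sk_i \cdot H(m) \\
\sigma_{agg} &= \sum_{i=1}^{n} \sigma_i \\
e(g_1, \sigma_{agg})
  &= e\Big(g_1, \sum_{i=1}^{n} sk_i \cdot H(m)\Big)
   = \prod_{i=1}^{n} e\big(g_1, H(m)\big)^{sk_i}
   = e\Big(\sum_{i=1}^{n} pk_i, H(m)\Big)
\end{align*}
```

Aggregation is just the sum in the second line, which anyone holding the individual signatures can compute, and verification needs only the aggregate signature and the sum of the public keys.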
Alternatives to Kademlia DHT
During the working group session after the Stanford Blockchain Conference, a common theme that came up was the use of Kademlia DHT for peer discovery and peer routing. Kademlia DHT and its variants are used in the current Ethereum network because Kademlia is so well known. However, this extensive usage has made the Ethereum network susceptible to a range of attacks, such as eclipse attacks. Moreover, the Ethereum network doesn’t make full use of Kademlia’s capabilities. In fact, Ethereum doesn’t need Kademlia’s content discovery capabilities. These facts suggest that we should explore non-Kademlia alternatives for peer routing and discovery.
Join The Conversation
You should now have a better idea of what networking might look like in Serenity. Hopefully, you will want to contribute to the research and implementation of a robust and scalable p2p network for ETH2.0. If you want to contribute to the ongoing discussions and brainstorming, check out the Ethresearch p2p repository on GitHub and share ideas in the sharding gitter channel, the newly created p2p gitter channel, the ETH2.0 spec issues tracker and the Ethereum research forum.