A new P2P layer is coming to the Internet Computer
Rollout of a novel P2P layer on the Internet Computer has begun. Its design increases robustness and speed while providing all the guarantees required to make the Internet Computer unstoppable.
Author: Yotam Harchol
The Internet Computer blockchain relies on a peer-to-peer (P2P) protocol that distributes messages (artifacts) among the nodes of each subnet. The Internet Computer protocol is a set of protocols that operate the Internet Computer blockchain, including the Internet Computer Consensus protocol, the DKG (distributed key generation) protocol, and the state-sync protocol. Each of these protocols generates artifacts and needs the P2P layer to distribute them to peers in the subnet. We refer to each of these protocols, which are implemented as higher-layer components above the P2P layer, as a P2P client.
After introducing a separate P2P layer for state sync, a new P2P layer for all other P2P clients within the Internet Computer protocol stack is being introduced. This new P2P layer provides improved network performance, ensures the guarantees required by the consensus protocol, and makes it easier to detect the misbehavior of nodes.
The newly proposed P2P layer uses the new QUIC-based transport layer recently introduced. Thus, with the migration to this new layer, the Internet Computer’s P2P layer will stop using TCP altogether. The shift to QUIC also means a shift to a fully asynchronous implementation of the P2P layer. Each request is sent as a new QUIC stream, and is handled independently from other requests. This prevents potential head-of-line blocking issues that could, at least theoretically, lead to liveness issues.
The new P2P layer introduces the use of a novel abstract data structure, called a slot table, which makes it easier to control the distribution of artifacts to peers, while maintaining adequate send rates to each peer based on the connection quality, and without affecting the other peers. It also enables easier detection of peer misbehavior.
With the acceptance of the proposed new P2P layer, each client will use a separate instance of the P2P protocol, where state sync will use the one designed for it specifically, and the rest of the clients will use separate instances of the new P2P layer that will be described in detail in this post. The rollout of the newly proposed P2P layer has started with the adoption of a set of NNS proposals, which shifted HTTPS-outcalls artifact distribution to use the new P2P layer. Later, all remaining clients will be shifted, including the consensus protocol, eventually rendering the old P2P layer obsolete.
Background: the P2P layer of the Internet Computer
At a very high level, the P2P protocol is responsible for distributing every artifact in the validated artifact pool of each client to the peers in the same subnet. Each client fills its validated artifact pool with the artifacts it needs to broadcast to peers.
Figure 2 shows the interface between P2P and the clients above it. The clients may arbitrarily change the artifact pools, on each call to on_state_change(). Each such call eventually returns a set of ChangeActions that correspond to the addition and deletion of artifacts from the validated artifact pool of that client. The P2P layer should use this information to propagate the content of the validated pool to peers.
The existing P2P protocol of the Internet Computer is based on streaming of artifacts from each node to its peers. Whenever an artifact is added to the validated pool, an advert is broadcast to all peers. An advert is a small message that describes the artifact. It is sent instead of the artifact itself to save bandwidth, so that the recipient can decide whether it wants to download the (potentially large) artifact. The sending node maintains a single TCP stream to each of its peers, on which adverts (and later also artifacts, as requested) are sent.
The P2P layer should guarantee the delivery of artifacts between honest and operational nodes, and should be resilient to misbehavior by malicious nodes.
There are two important properties of all consensus-related clients of the P2P layer:
P1: Bounded number of active artifacts: The validated artifact pool, at any point in time, is bounded in size. The consensus protocol uses checkpoints that allow it to purge artifacts periodically, and therefore the maximal size C of the pool can be calculated as a function of the checkpoint interval (which is measured in number of blocks), and the size of the subnet.
P2: Explicit Expiry of Artifacts: If an artifact is removed from a pool (being purged), it is no longer necessary to disseminate it to peers. Or, from the receiver side, if no peer has a certain artifact, the receiver is guaranteed to not need that artifact, even if it failed to receive it by the time it has been removed by all other peers.
These two properties are emphasized here as they enable important design decisions that will be explained soon.
Network backpressure
In traditional client-server applications, the notion of backpressure is widely used: if the receiver slows down consuming messages, the sender’s buffer fills up, and the sender’s networking layer must then take one of three paths:
- Propagate the backpressure to the application layer, so that the application slows down data production.
- Buffer messages (potentially indefinitely).
- Drop egress messages.
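The three paths can be illustrated with a toy send buffer. This is only a sketch for intuition: the names `Policy`, `SendBuffer`, and `offer` are hypothetical and do not come from the Internet Computer code base.

```python
from collections import deque
from enum import Enum


class Policy(Enum):
    PROPAGATE = 1  # tell the application to slow down
    BUFFER = 2     # keep queueing, potentially without bound
    DROP = 3       # discard messages that do not fit


class SendBuffer:
    """Toy egress buffer with a soft limit and one of three overflow policies."""

    def __init__(self, limit: int, policy: Policy) -> None:
        self.limit = limit
        self.policy = policy
        self.queue: deque = deque()
        self.dropped = 0

    def offer(self, message: str) -> bool:
        """Return False only when the application is asked to slow down."""
        if len(self.queue) < self.limit:
            self.queue.append(message)
            return True
        if self.policy is Policy.BUFFER:
            self.queue.append(message)  # memory use grows with the backlog
            return True
        if self.policy is Policy.DROP:
            self.dropped += 1           # message is silently lost
            return True
        return False                    # PROPAGATE: backpressure reaches the app
```

With a limit of 2 and five offered messages, the first policy rejects three of them, the second holds all five in memory, and the third keeps two and loses three.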
On a blockchain, this becomes more complex. Imagine that a sender experiences backpressure from one peer. This peer might be honest, or not. Taking any of the above-mentioned approaches leads to serious problems:
- Slowing down data production means slowing down the blockchain. Allowing this behavior would be a denial-of-service (DoS) attack vector.
- Buffering messages, potentially indefinitely, would also be an attack vector.
- Dropping egress messages might mean there is no delivery guarantee to honest but slow nodes.
Most blockchains chose the third option because of the security risks of the first two. However, dropping messages risks the liveness of the blockchain (i.e., it might get stuck if enough messages are dropped). The proposed new P2P layer for the Internet Computer overcomes these risks without dropping messages.
The new P2P layer
The new P2P layer works very differently from the existing one. First, it does not always use adverts. If artifacts are small enough, they will be sent immediately, without any advert. Second, it does not use a single stream, but instead uses multiple streams over the same QUIC connection. Third, because it does not use a single stream, it manages the sending of artifacts very differently. Fourth, it introduces a slightly different protocol for communication between peers in the same subnet.
Let’s take a step back and look at the purpose of the P2P layer when serving the consensus protocol and other clients with similar requirements (i.e., not state sync). The purpose is, for every honest node, to make sure that peers can receive whatever that node has in its validated artifact pool, while keeping everything secure, scalable, and performant.
The new P2P layer achieves this purpose by introducing a novel abstract data structure called a slot table, which is used to track the content of the validated artifact pool and the process of updating the peers about that content. The slot table data structure is simple, yet it provides exactly what is needed to satisfy the requirements.
The slot table data structure
The slot table is an abstract data structure that is maintained by every node on the send side, and is also inferred by every node on the receive side. The size of the slot table on the send side corresponds exactly to the number of active artifacts in the validated pool. By property P1 above, this means that the size of the slot table is bounded by some constant C.
Whenever an artifact is added to the validated pool, it is placed in an empty slot in the slot table on the send side. A slot update message is then sent out to all peers, telling them that the content of the slot has changed. Each peer, on the receive side, tracks the state of the slot table of each of its peers, based on the arrival of new slot update messages. However, note that network congestion and backpressure could delay these updates. Therefore, the receiver’s view is only eventually consistent with the send-side slot table.
Each slot has, in addition to the artifact information, a version number. This version number is incremented globally with each update to the validated pool, so that the receiver can tell, when receiving an update message, whether it is a new update or an old one: it only accepts updates with a higher version number than the one it already has for that slot.
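The following is a minimal sketch of a send-side slot table and the receiver's inferred view, under stated assumptions: the class and field names are hypothetical, slots are indexed from zero, freed slots are reused lowest index first, and the version counter is bumped on removals as well as additions. The actual implementation may differ on all of these points.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlotUpdate:
    slot: int         # index of the slot whose content changed
    version: int      # global version at the time of the change
    artifact_id: str  # stands in for the artifact or its advert


class SendSlotTable:
    """Send side: one slot per artifact currently in the validated pool."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity         # the bound C from property P1
        self.version = 0                 # incremented on every pool change
        self.slots: Dict[int, str] = {}  # slot index -> artifact id
        self.free: List[int] = []        # heap of freed slot indices
        self.next_index = 0              # first index never used so far

    def add(self, artifact_id: str) -> SlotUpdate:
        if self.free:
            slot = heapq.heappop(self.free)  # reuse a freed slot
        elif self.next_index < self.capacity:
            slot = self.next_index
            self.next_index += 1
        else:
            raise RuntimeError("validated pool exceeds the bound C")
        self.version += 1
        self.slots[slot] = artifact_id
        return SlotUpdate(slot, self.version, artifact_id)

    def remove(self, artifact_id: str) -> None:
        slot = next(s for s, a in self.slots.items() if a == artifact_id)
        del self.slots[slot]
        self.version += 1
        heapq.heappush(self.free, slot)  # no message is sent for a deletion


class ReceiverView:
    """Receive side: the inferred slot table of one peer."""

    def __init__(self) -> None:
        self.slots: Dict[int, SlotUpdate] = {}

    def apply(self, update: SlotUpdate) -> bool:
        current = self.slots.get(update.slot)
        if current is not None and update.version <= current.version:
            return False  # stale or duplicate update, ignore it
        self.slots[update.slot] = update
        return True
```

For example, if the sender adds A and B, removes A, and then adds C, the update for C reuses slot 0 with a higher version. A receiver that sees the update for C first and the delayed update for A afterwards rejects the latter and keeps C in slot 0.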
Figure 3 shows a sample of this process: the sender generates artifacts A through F. It also removes some of them during the process. Since deletions are not necessarily propagated to the peers, they may still have deleted artifacts in their view of their peers’ slot tables. This is harmless because the content of the slot will eventually be updated, and that update will be propagated.
For each slot on the send side, a set of asynchronous tasks (i.e., green threads) is spawned, one task per slot per peer. Since tasks are very lightweight, this also scales to bigger subnets. Each task is responsible for reliably pushing slot update messages for its slot to its peer: the task retries pushing the update until it receives an acknowledgement. Whenever the content of the slot changes, the task stops trying to push the old content and starts trying to push the new content instead. A slow peer might get its updates slowly, but it does not interfere with faster peers.
This approach, which is roughly a combination of the second and third approaches in the backpressure discussion above (buffering messages at the networking layer, and dropping messages), solves the backpressure problem without giving up resilience or liveness.
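The per-slot, per-peer push can be sketched with Python's `asyncio`. This is an illustration only: `push_until_acked`, `SlotPusher`, and the `send` callback are hypothetical names, the acknowledgement is modelled as a boolean return value, and replacing the content is modelled by cancelling the old task and starting a new one.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Tuple

# Hypothetical transport hook: returns True once the peer acknowledged.
Send = Callable[[str, int, str], Awaitable[bool]]


async def push_until_acked(send: Send, peer: str, slot: int, content: str,
                           retry_delay: float = 0.01) -> None:
    """Retry one slot update for one peer until it is acknowledged."""
    while not await send(peer, slot, content):
        await asyncio.sleep(retry_delay)


class SlotPusher:
    def __init__(self, peers: Iterable[str], send: Send) -> None:
        self.peers = list(peers)
        self.send = send
        self.tasks: Dict[Tuple[int, str], asyncio.Task] = {}

    def set_slot(self, slot: int, content: str) -> None:
        """Must be called from within a running event loop."""
        for peer in self.peers:
            old = self.tasks.get((slot, peer))
            if old is not None:
                old.cancel()  # stop pushing the outdated content
            self.tasks[(slot, peer)] = asyncio.create_task(
                push_until_acked(self.send, peer, slot, content))


async def demo() -> List[Tuple[str, int, str]]:
    delivered: List[Tuple[str, int, str]] = []
    slow_attempts = 0

    async def send(peer: str, slot: int, content: str) -> bool:
        nonlocal slow_attempts
        if peer == "slow":
            slow_attempts += 1
            if slow_attempts < 3:
                return False  # simulate a missing acknowledgement
        delivered.append((peer, slot, content))
        return True

    pusher = SlotPusher(["fast", "slow"], send)
    pusher.set_slot(0, "A")
    await asyncio.sleep(0)   # let both tasks make their first attempt
    pusher.set_slot(0, "B")  # the slot content changes before "slow" acked A
    await asyncio.gather(*pusher.tasks.values())
    return delivered


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this run the fast peer receives both A and B, while the slow peer never receives A and only gets B once its third attempt succeeds. The slow peer's retries run in their own task, so they do not delay delivery to the fast peer.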
The correctness of this approach stems from property P1 of the clients mentioned earlier, the bounded number of active artifacts. It guarantees that C slots are enough in all circumstances, and therefore slots can be reused. The protocol described above not only allows peers to synchronize the content of their validated artifact pools, but also allows nodes to make sure their peers do not advertise more than C artifacts at a time. If an update message has a slot number that is larger than C, the recipient can immediately infer misbehavior by the sender.
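The slot-number check on the receive side can be sketched as follows. The names are hypothetical, and the sketch assumes zero-based slot indices, so the valid range is 0 to C - 1.

```python
from typing import Set


class MisbehaviorDetector:
    """Flags peers whose slot updates fall outside the allowed C slots."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # the bound C
        self.flagged: Set[str] = set()

    def check(self, peer: str, slot: int) -> bool:
        """Return True if the slot index is acceptable for this peer."""
        if 0 <= slot < self.capacity:
            return True
        self.flagged.add(peer)    # sender claims more than C slots
        return False
```

What the node does with a flagged peer (for example, ignoring its further updates) is a policy decision that this sketch leaves open.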
When a node notices, on the receiving side, that an artifact no longer exists in the slot tables of any of its peers, the node can safely remove the artifact from the unvalidated artifact pool, if it is still there, or stop any attempt to retrieve this artifact if it has not yet been retrieved. Property P2 (explicit expiry of artifacts), mentioned earlier, guarantees that such artifacts are no longer needed by any peer. Thus, the slot table also imposes an implicit bound on the size of the unvalidated artifact pool: the unvalidated pool may only contain up to C artifacts from honest peers (since they should all have roughly the same content in their validated pools), and at most C more artifacts for each malicious peer, because it could spam the IC with C completely different artifacts. Fewer than one third of the nodes can be malicious, thus the total size of the unvalidated pool cannot exceed C*4*n/3 artifacts, where n is the number of nodes in the subnet.
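A small sketch of the expiry rule and of the arithmetic behind the bound. All names are hypothetical. `counted_bound` adds up the terms exactly as the paragraph lists them (C for the honest peers plus C per malicious peer, with fewer than n/3 malicious peers), while `stated_bound` is the figure C*4*n/3 quoted above.

```python
from typing import Dict, Set

# peer id -> (slot index -> artifact id), inferred from that peer's updates
PeerViews = Dict[str, Dict[int, str]]


def advertised_anywhere(artifact_id: str, views: PeerViews) -> bool:
    """True if at least one peer still holds the artifact in some slot."""
    return any(artifact_id in slots.values() for slots in views.values())


def purge_expired(unvalidated: Set[str], views: PeerViews) -> Set[str]:
    """Keep only the unvalidated artifacts that some peer still advertises."""
    return {a for a in unvalidated if advertised_anywhere(a, views)}


def counted_bound(c: int, n: int) -> int:
    """C shared by honest peers plus C for each possible malicious peer."""
    max_malicious = (n - 1) // 3  # largest integer strictly below n/3
    return c + c * max_malicious


def stated_bound(c: int, n: int) -> float:
    """The upper bound quoted in the text."""
    return c * 4 * n / 3
```

For a 13-node subnet with C = 100, counting term by term gives 100 + 4 * 100 = 500 artifacts, which is comfortably below the quoted bound of about 1733. In general, C(1 + n/3) is at most 4Cn/3 for any n of at least 1, so the quoted figure is a safe, if loose, upper bound.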
You may have noticed that the above design description only refers to artifacts and does not mention adverts. The reason is that adverts are merely an optimization for improved bandwidth utilization. The new P2P layer uses adverts only for large artifacts (the current threshold is set to 1KB). Artifacts smaller than the threshold are sent in the slot update messages, and therefore do not have to be explicitly requested later by the receiver. For artifacts larger than the threshold, an advert is generated and sent in the update message. The receiver client can then decide whether it wants to request it and from which peer.
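The size-based choice between inlining an artifact and sending an advert can be sketched as below. The message layout and names are hypothetical, the threshold is taken as 1024 bytes, and an artifact of exactly the threshold size is treated as large. The text does not specify either detail.

```python
from typing import Any, Dict

ADVERT_THRESHOLD_BYTES = 1024  # "1KB" in the text; exact value is an assumption


def build_slot_update(slot: int, version: int, artifact_id: str,
                      payload: bytes) -> Dict[str, Any]:
    """Inline small artifacts; send only an advert for large ones."""
    update: Dict[str, Any] = {
        "slot": slot,
        "version": version,
        "artifact_id": artifact_id,
    }
    if len(payload) < ADVERT_THRESHOLD_BYTES:
        update["kind"] = "artifact"
        update["payload"] = payload    # receiver needs no extra round trip
    else:
        update["kind"] = "advert"
        update["size"] = len(payload)  # receiver decides whether to fetch
    return update
```

A receiver that gets an update of kind "advert" would then request the artifact from a peer of its choice, whereas an update of kind "artifact" is complete on arrival.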
Improved performance
The async implementation using QUIC and the direct pushing of smaller artifacts yield improved networking performance, and therefore also improved consensus performance.
Figure 4 shows the result of one experiment we ran to compare the performance of the existing and the new P2P layers. In this experiment, we ran a 60-node subnet in one datacenter under a load of 200 requests per second of 100 KB each, and we measured the consensus block rate. The graphs show the block rate over time with a line for each node. We added artificial link latency of 80 ms round-trip to all network links in both experiments, to simulate a geographically spread subnet.
The upper graph shows the block rate with the existing P2P layer. While the subnet successfully makes progress, the block rate is very shaky when the load is high. It returns to normal once the load is over, but during the time of high load, this would have meant a subnet-wide lower block rate and therefore higher user-perceived latency. The lower graph shows the robustness of the block rate with the newly proposed P2P layer. The subnet continues to run at a very steady block rate even under heavy load.
Conclusion
The new P2P layer for consensus and similar clients improves the performance of the Internet Computer, allows for better scalability, reduces code complexity, and improves the behavior under imperfect network conditions. It has been enabled on certain subnets already, only for HTTPS-outcalls related artifacts, and proposals to enable it for other clients will soon be submitted by the DFINITY Foundation. If accepted and executed, the entire P2P layer of the Internet Computer will shift to using QUIC instead of TCP, making the Internet Computer’s networking layer more robust, scalable, and performant.