Understanding Ethereum’s P2P Network
This article aims to help you understand P2P networks and explains some of Ethereum’s implementation details. P2P technology has the potential to alleviate shortcomings of centralized systems by utilizing the rich resources of end devices, and since the 1990s it has been adopted by popular software such as eMule, BitTorrent, and Skype. It is also a key component of blockchain systems like Bitcoin and Ethereum, the system from which the Shyft Network is derived. Most people have heard of P2P but do not know exactly what it is. Let’s start by talking about P2P networks in general.
What is a P2P Network?
A Peer-to-Peer (P2P) network is an overlay network; that is, it’s built on top of the public Internet. Mathematically, it can be viewed as a directed graph G = (V, E), where V is the set of peers in the network and E is the set of links between peers. Each peer p has a unique identification number pid. A link (p, q) in E means that p has a direct path to send a message to q; that is, p can send a message to q over the network using q’s pid as the destination. Although similar IP addresses in the underlying TCP/IP network could in principle correspond to nearby physical locations, such a direct correlation rarely holds.
Ideally, all peers should be connected by a path. Since individual peers have only an incomplete view of the network topology and peer membership, the overlay depends on intermediate peers to forward messages to the correct regions of the overlay. The graph structure provides multiple paths between every pair of peers, and contributes to resilience by enabling connectedness despite peer node changes. At each peer’s level, the connectivity of the graph is reflected in terms of its adjacencies to other peers. When peers join or leave the network, adjacent peers may have incorrect adjacency information. Overlay maintenance mechanisms are used to keep the adjacency information updated, thus maintaining connectedness across all nodes.
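To make the graph view concrete, here is a minimal sketch in Python. The four-peer overlay, the integer pids, and the helper names find_path and remove_peer are all made up for illustration; a real overlay has no global view like this, and each peer only knows its own adjacencies.

```python
from collections import deque

# Hypothetical overlay: each key is a peer's pid, each value is the set of
# pids it has a direct link to (the edges (p, q) in E).
overlay = {
    1: {2, 3},
    2: {4},
    3: {4},
    4: {1},
}

def find_path(graph, src, dst):
    """Breadth-first search for a forwarding path from src to dst."""
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        # Sort neighbors so the result is deterministic.
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # dst is not reachable from src

def remove_peer(graph, pid):
    """Return a copy of the graph after pid leaves and its neighbors
    have dropped it from their adjacency information."""
    return {p: {q for q in links if q != pid}
            for p, links in graph.items() if p != pid}

path_before = find_path(overlay, 1, 4)                 # [1, 2, 4]
path_after = find_path(remove_peer(overlay, 2), 1, 4)  # [1, 3, 4]
```

Because peer 1 has two routes to peer 4, the overlay stays connected after peer 2 leaves. This is the resilience that multiple paths provide.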
Participants in the P2P network make a portion of their resources available to other network participants. Each peer contributes compute cycles (CPU), disk storage, and network bandwidth, without the need for a central coordination instance. Peers are both suppliers and consumers of network resources, in contrast to the traditional client-server model, where only servers supply and clients consume. Therefore, P2P networks have the potential to address the limitations of the client-server model, such as scalability and the single point of failure.
There is often a minimum resource contribution threshold for a peer to join the P2P overlay. Resource contribution should be fair. A fairness criterion can dictate that, for example, the average contribution of any peer should be within a statistical bound of the overall average of the P2P system. Resource contribution should also be mutually beneficial. Users are incentivized to participate in P2P applications if the benefit is comparable to the resources being contributed.
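As a rough illustration of such a fairness criterion, the sketch below flags peers whose average contribution lies more than k standard deviations from the overall average. The choice of standard deviations as the "statistical bound", the default k = 2, the function name unfair_peers, and the sample numbers are all assumptions made for this example.

```python
from statistics import mean, pstdev

def unfair_peers(contributions, k=2.0):
    """Return the pids whose contribution deviates from the overall
    average by more than k population standard deviations.

    contributions maps a pid to that peer's average contribution
    (in any unit, for example MB of bandwidth served per hour).
    """
    values = list(contributions.values())
    overall = mean(values)
    spread = pstdev(values)
    return sorted(pid for pid, c in contributions.items()
                  if abs(c - overall) > k * spread)

# Hypothetical measurements: peer "f" contributes far more than the rest.
sample = {"a": 10, "b": 11, "c": 9, "d": 10, "e": 10, "f": 40}
flagged = unfair_peers(sample)  # ["f"]
```

A real system would also need to decide what to do with flagged peers, which is outside the scope of this sketch.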
How Does Ethereum’s P2P Network Work?
Geth, the official Go implementation of the Ethereum client, bases its peer discovery protocol (the RLPx Node Discovery Protocol) on an overlay maintenance mechanism called the Kademlia distributed hash table (DHT). While Kademlia is designed for efficiently locating and storing content in a P2P network, Ethereum uses it only to discover new peers.
Kademlia
In the Ethereum network, each client node is associated with an enode ID, which is hashed with SHA3 (the Keccak-256 variant used throughout Ethereum) into a 256-bit value. Kademlia defines distance by the XOR metric, so the distance between two 256-bit values is their bitwise exclusive OR, interpreted as an integer. Each peer maintains a data structure consisting of 256 distinct buckets, where bucket i stores information about up to 16 peers whose distance from its own hashed enode ID lies between 2^(i-1) and 2^i. To discover new peers, an Ethereum node chooses itself as the target x, looks in its buckets for the 16 nodes closest to x, and asks each of them to return 16 nodes from their own buckets that are “closer” to x, resulting in up to 16 × 16 newly discovered nodes. From these, the 16 nodes closest to x are then asked to return 16 nodes even closer to x. The process continues iteratively until no new nodes are found.
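The XOR metric, the bucket rule, and the iterative lookup can be sketched in a few lines of Python. This is a simplified, in-memory simulation and not Geth’s code: the network dictionary stands in for real findnode round-trips, the function names are invented for the example, and hashlib.sha3_256 (standardized SHA-3) is used only as a stand-in, since it produces different digests from the Keccak-256 variant Ethereum uses.

```python
import hashlib

K = 16  # bucket size and number of nodes asked per round, per the text

def node_hash(enode_id: bytes) -> int:
    """Hash an enode ID to a 256-bit integer (stand-in hash, see above)."""
    return int.from_bytes(hashlib.sha3_256(enode_id).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR, read as an integer."""
    return a ^ b

def bucket_index(own: int, other: int) -> int:
    """Bucket i covers distances d with 2^(i-1) <= d < 2^i."""
    return xor_distance(own, other).bit_length()

def closest(nodes, target, k=K):
    """The k nodes nearest to target under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]

def lookup(network, start, target, k=K):
    """Iteratively ask the closest known nodes for nodes closer to target.

    network maps a node to the set of nodes it knows about; reading it
    simulates sending findnode and receiving a neighbors reply.
    """
    seen = set(network[start]) - {start}
    asked = {start}
    while True:
        to_ask = [n for n in closest(seen, target, k) if n not in asked]
        if not to_ask:
            # Every one of the k closest nodes has been asked: done.
            return closest(seen, target, k)
        for n in to_ask:
            asked.add(n)
            reply = closest(network.get(n, ()), target, k)
            seen.update(r for r in reply if r != start)

# Tiny hypothetical network with 4-bit IDs and k = 2 for readability.
network = {
    0b0001: {0b1000},
    0b1000: {0b0100, 0b1100},
    0b0100: {0b0011, 0b1000},
    0b0011: {0b0100},
    0b1100: set(),
}
found = lookup(network, start=0b0001, target=0b0001, k=2)  # [3, 4]
```

Starting from a single known peer (8), node 1 learns about 4 and 12, then about 3, and ends with 3 and 4 as its two closest peers. With the bucket rule above, peer 3 (distance 2) would land in bucket 2 and peer 8 (distance 9) in bucket 4.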
Peer Communications
Geth uses UDP to exchange information about the P2P network. There are four types of UDP messages. A ping message solicits a pong message in return; this pair of messages is used to determine whether a neighboring node is responsive. A findnode message solicits a neighbors message that contains a list of up to 16 nodes that the responding node has seen. After establishing peer connections, Geth nodes exchange blockchain information over encrypted and authenticated TCP connections.
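The request and reply pairing of the four message types can be sketched as follows. This is only an in-memory illustration: the Message class and the respond and is_responsive helpers are hypothetical, and the real wire format, signatures, and UDP transport are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_NEIGHBORS = 16  # a neighbors reply lists at most 16 nodes

@dataclass
class Message:
    kind: str  # "ping", "pong", "findnode", or "neighbors"
    nodes: List[str] = field(default_factory=list)

def respond(msg: Message, seen_nodes: List[str]) -> Optional[Message]:
    """Return the reply a node would send for msg, or None if msg is
    itself a reply (pong, neighbors) and needs no answer."""
    if msg.kind == "ping":
        return Message("pong")
    if msg.kind == "findnode":
        return Message("neighbors", seen_nodes[:MAX_NEIGHBORS])
    return None

def is_responsive(reply: Optional[Message]) -> bool:
    """A neighbor counts as responsive if it answered a ping with a pong."""
    return reply is not None and reply.kind == "pong"

# Hypothetical node that has seen 20 peers.
seen = [f"node-{i}" for i in range(20)]
pong = respond(Message("ping"), seen)
neighbors = respond(Message("findnode"), seen)
```

Here neighbors.nodes holds only the first 16 of the 20 seen peers, which mirrors the cap described above.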
Data Structures
The Geth client stores information about other nodes in two data structures. The first is a long-term database called db, which is stored on disk and persists across client reboots. The db contains information about every node that the client has seen. Each db entry consists of a node ID, IP address, TCP port, UDP port, time of the last ping sent to the node, time of the last pong received from the node, and the number of times the node failed to respond to a findnode message. If the last pong received from a node is more than one day old, that node is removed from the db.
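A db entry and the one-day expiry rule might be modeled like this. The field names, the DbEntry class, and the expire_stale helper are assumptions chosen for the example; Geth’s actual on-disk layout is not shown here.

```python
from dataclasses import dataclass
from typing import Dict

ONE_DAY = 24 * 60 * 60  # seconds

@dataclass
class DbEntry:
    node_id: str
    ip: str
    tcp_port: int
    udp_port: int
    last_ping_sent: float = 0.0      # Unix timestamp
    last_pong_received: float = 0.0  # Unix timestamp
    findnode_failures: int = 0

def expire_stale(db: Dict[str, DbEntry], now: float) -> Dict[str, DbEntry]:
    """Keep only nodes whose last pong is at most one day old."""
    return {node_id: entry for node_id, entry in db.items()
            if now - entry.last_pong_received <= ONE_DAY}

# Hypothetical entries: one answered an hour ago, one two days ago.
now = 1_000_000.0
db = {
    "fresh": DbEntry("fresh", "10.0.0.1", 30303, 30303,
                     last_pong_received=now - 3600),
    "stale": DbEntry("stale", "10.0.0.2", 30303, 30303,
                     last_pong_received=now - 2 * ONE_DAY),
}
db = expire_stale(db, now)  # only "fresh" remains
```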
The second data structure is a short-term database called table. The table is empty when the client reboots. It consists of 256 buckets, each of which can hold up to 16 entries. Each entry records information about another Ethereum node: its node ID, IP address, TCP port, and UDP port. If a node fails to respond to findnode more than 4 times in a row, it is removed from the table.
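The table’s shape and its removal rule can be sketched as below. The Table class and its method names are hypothetical, and two details are assumptions because the text does not specify them: buckets are indexed 0 to 255, and a node offered to a full bucket is simply rejected.

```python
BUCKET_COUNT = 256
BUCKET_SIZE = 16
MAX_FINDNODE_FAILURES = 4  # removed after MORE than 4 failures in a row

class Table:
    def __init__(self):
        # The table starts empty on every client start.
        self.buckets = [[] for _ in range(BUCKET_COUNT)]
        self.failures = {}  # node_id -> consecutive findnode failures

    def add(self, bucket: int, node_id: str) -> bool:
        """Add node_id to a bucket; return False if the bucket is full
        (assumed behavior, see lead-in)."""
        entries = self.buckets[bucket]
        if node_id in entries:
            return True
        if len(entries) >= BUCKET_SIZE:
            return False
        entries.append(node_id)
        self.failures[node_id] = 0
        return True

    def record_findnode(self, bucket: int, node_id: str, ok: bool) -> None:
        """Track findnode results and evict after too many failures."""
        if node_id not in self.buckets[bucket]:
            return
        if ok:
            self.failures[node_id] = 0  # a reply resets the streak
            return
        self.failures[node_id] += 1
        if self.failures[node_id] > MAX_FINDNODE_FAILURES:
            self.buckets[bucket].remove(node_id)
            del self.failures[node_id]

table = Table()
accepted = [table.add(3, f"node-{i}") for i in range(17)]
for _ in range(4):
    table.record_findnode(3, "node-0", ok=False)
still_present = "node-0" in table.buckets[3]   # True after 4 failures
table.record_findnode(3, "node-0", ok=False)
removed = "node-0" not in table.buckets[3]     # True after the 5th
```

The 17th node is rejected because bucket 3 already holds 16 entries, and node-0 survives four consecutive failures but is dropped on the fifth.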
When a client first starts, it has an empty db and knows only about six hardcoded bootstrap nodes. Then, as the client discovers peers, it adds them to both the db and the table according to the mechanisms described above.
If you want to find out more about Ethereum’s P2P network, below are some helpful documents contributed by Ethereum community members:
- “RLPx Node Discovery Protocol” by Felix Lange, Gustav Simonsson, and Roman Mandeleil
- “Peer to Peer” by Felix Lange
- “Kademlia Peer Selection” by James Ray