A Detailed Description of P2P Network Modules

Published in

TRON

14 min readMar 20, 2020

1. Background

Blockchain nowadays can be sorted into three categories according to its access mechanism: public chain, consortium blockchain, and private chain. Oftentimes, blockchains are compared to ledgers, and their access mechanism can be understood as the transparency of the ledgers. Among these, public chain has the most transparency, meaning it’s more advanced in terms of decentralization and more in line with TRON’s mission: Re-decentralize the web. Therefore, TRON has chosen public chain as its platform, for public chain means that any organizations or individuals are able to initiate or look up transactions as well as participate in the consensus mechanism. It is a blockchain of transparency.

Decentralization is one of the essential attributes of a public blockchain. In order to get rid of centralized control, a P2P network without a central server becomes the favorable solution here. In a P2P network, all nodes have the same status — together, they govern and maintain a public blockchain.

P2P network module is an underlying part of TRON’s architecture. This article will provide a detailed illustration on how P2P is implemented in TRON’s network including the rules of node discovery and of connection establishment.

2. Introduction

The public chain is governed by all nodes in a P2P network. Each node needs to seek out the other peers in the network and choose trusted nodes to build a local routing table by discovering unknown nodes in the network. Once the table is established, nodes will be able to use their own filtering rules as a defense mechanism and select the reliable nodes to build connections.

3. Kademlia algorithm

To construct a blockchain system using P2P network, there are a couple of questions to consider:

• How to build a topographic architecture for the network?

• How to look up specific nodes in a network?

• How to handle the dynamic changes of the nodes?

• How can we evaluate the credibility of the nodes?

Based on the above considerations, TRON chose Kademlia algorithm, which specifies the logical structure of the routing table and conducts information exchange through node discovery. The article will elucidate how TRON implement Kademlia, and the above-mentioned questions will be answered as the article goes on.

3.1 Introduction

Kademlia is one of the many versions of DHT (Distributed Hash Table). Compared with previous versions, Kademlia has many advantages such as:

• It reduces the number of messages required for nodes to communicate

• Reduce latency caused by parallel nodes and asynchronous queries

In summation, it makes communication more efficient.

The Kademlia network is characterized by three constants: alpha, 𝐵, and 𝑘. alpha represents parallel numbers, which is usually 3; 𝐵 indicates the length of node ID in binary bits; 𝑘 represents the capacity of the bucket. A bucket is a unique structure of Kademlia, which will be described in the following paragraphs.

3.2 Nodes

A distributed P2P network consists of many nodes. In a Kademlia network, each node has a unique ID generated by random encryption algorithm, replacing IP as the node’s identifier. In addition, the distance between two nodes can be calculated by performing XOR operation on two nodes, and that is how other nodes are located.

In the implementation, B = 512 , the length of the node ID is 512 bits (64 bytes).

Each node possesses the following properties:

Here some questions may rise: Why would there be nodes that do not really exist? What is the meaning of this field?

A Kademlia network can be indicated with a binary tree. The height of the tree equals the length of the node ID, and each leaf represents a Kademlia node. Thus, when B = 512, the tree can host a maximum of 2⁵¹¹ Kademlia nodes. In reality, the number of nodes will not be tremendous. In order to look for other nodes, the algorithm will generate an ID that complies with all rules (the said ID will be known as target). This ID may not have a real node under it, but whether real or not, it serves as an important aid in terms of expanding the routing table.

Details on expanding the routing table will be further explained below.

3.3 Distance

Kademlia calculates the distance between two nodes using the XOR operation. “Distance” here refers to the logical distance computed using a mathematical method rather than the physical distance (e.g. two nodes in China and the US are close in distance in a Kademlia network if assigned similar IDs).

Now, there are two node IDs, A and B. For ease of illustration, let’s assume the length of them is 4 binary bits (0001 and 0010). The steps to compute the distance between the two are as follows:

Perform A ⊕ B, and get the result: dis (dis = 0011)
Subtract the number of consecutive 0s following the most significant bit from 4 (e.g. two consecutive 0s will make 4 -2 = 2 )

We can draw the following conclusion from the above calculation: the more bits match at the beginning of the two IDs, the smaller is the distance between them (a node is closest in distance to itself because the according output from XOR operation equals 0).

Computing the logical distance is meaningful. To give a real-world example: assume in an office, A wants to have colleague M’s contact information and asks colleague B for it, who he thinks is most likely to know M. Unfortunately, B doesn’t have M’s contact info either, so he refers colleague C to A, who he thinks is most probable to know M. In the end, A obtains the information he wants from colleague C.

We can draw an analogy between logical distance and interpersonal distance from the above illustration. One thing to note is that A also obtains B and C’s information alongside M’s, making three friends, B, C, and M.

In actual node discovery, the letters aforementioned each represent a node. M may be non-existent, but it doesn’t matter as B and C must be real nodes. Upon completing a node discovery, A obtains B and C’s information, and through which it expands its routing table.

Kademlia adopts the XOR operation to calculate distance based on the following three considerations:

the distance between a node and itself is zero
it is symmetric: the “distances” calculated from A to B and from B to A are the same
it follows the triangle inequality: given A, B and C are vertices (points) of a triangle, then the distance from A to B is shorter than (or equal to) the sum of the distance from A to C plus the distance from C to B. dist(A, C) ≤ dist(A, B) + dist(B, C)

The above three features qualify the XOR operation as an easy and convenient method to calculate the distance between nodes.

In the implementation, distance is computed as: 256 - number of consecutive 0s following the most significant bit in two IDs’ output from XOR operation.

3.4 Routing Table

In the previous section, we exemplified the rule of node inquiry using an example of interpersonal relationships. In the example, A ends up building his own network of relationships, which, for a node, is its own routing table.

Essentially, building a routing table is creating a map of the overall network. According to the aforementioned example, neighbor discovery is a process where nodes constantly converge towards the target. Given a random target node X, the node N will gradually move towards X by constantly obtaining information from nodes that are closer to X. In this process, N’s routing table keeps on expanding even if X does not exist.

The specific implementation of the Kademlia routing table can be adjusted based on the actual situation. In our implementation, each node maintains a fixed-sized routing table that consists of n lists. According to the Kademlia paper, these lists are named “k-buckets”, with k referring to the maximum entries that can be stored in each list (e.g. k=16).

In the implementation, for list m( 0 < m < 256), the distance between the local node and all nodes stored in this list equals m; for nodes stored in list 0, the distance between them and the local node is , and we have -256 ≤ dis ≤ 0

Based on the discussion above, bucket 0 seems more likely to be full as it needs to store nodes covering a wider range, but the actual case is different. The reason for this will be explained below.

In the implementation, information of nodes is stored in k bucket in the form of NodeEntry.

For ease of illustration, we assume each node ID has three bits and there is a maximum of 2³ nodes, as shown in the figure below:

local node with ID=110 as an example. There are four nodes with 0 matched bit at the beginning (001, which should be next to 010, is not shown in the figure), two with 1 matched bit and one with 2 matched bits.

This shows that with a randomly-generated ID, there is a much bigger chance that the distance between the local node to any node on the network is greater rather than smaller. If we denote the number of matched bits at the beginning as x, then the probability is 2⁻ˣ⁻¹.

It can be inferred from the above binary tree that the greater the bucket’s number is, the bigger the chance of a node falling into it and hence it’s more likely to be filled. Considering the actual situation, each node maintains 256 buckets, among them, buckets No. 1 to No. 255 store nodes with a distance of 1 to 255 respectively; bucket No. 0 stores nodes whose distance is no greater than zero other than itself.

Methods for addressing full buckets will be explained in “Node State” below.

The parameters of Kademlia can be customized based on actual needs. Current parameters of Kademlia are as follows:

The neighbor discovery process is as follows:

• Randomly generate a target ID (this ID may not be assigned to a real node)

• Search its own routing table and obtain information from the node that is close to the target. It will refresh its routing table if a node closer to the target is identified.

• Regenerate a target ID and repeat the above steps after entering the maximum number of rounds.

Each node stores information from a fraction of nodes on the whole network. While approaching the target, the local node obtains more information from other nodes through constant neighbor discovery, thus expanding its routing table. To ensure the availability of its neighboring network and the stable operation of the whole network in a broader sense, each node chooses to store information from nodes with a longer uptime in its routing table.

In the implementation, trusted seed nodes will be first included in the routing table when they start as the guide to node discovery.

3.5 Protocol Messages

Protocol messages are used for node communication. If we see communication as dialogues, then protocol messages can be compared to Q&A pairs in implementation.

There are four types of messages in node discovery:

For more information, please see Update Routing Table.

3.6 Node state

Protocol messages and node states are used to address the issue where node state changes dynamically. In node discovery, a node entails the following states:

For more information, please see “Update Routing Table”.

3.7 Update Routing Table

Two ways to update the routing table: add a new node or replace the existing node with a new one¹. Such a replacement can help maintain node activity in the routing table. Each node communicates with its counterparts through protocol messages and modifies node state based on the feedback it receives.

How is the routing table updated:

Read the preset seed node and ping it. If the seed node is alive (the node replies with a pong message), add it to the routing table and set the state as active.
Solicit neighbors from nodes in the routing table.
Send a ping message to the newly-discovered node. If the node returns a pong message, it is alive. Then set its status as active. Otherwise, set it as dead.
Try to add nodes that are alive into the routing table based on their distance (into the corresponding k-bucket).

If the k-bucket is not full, add the node directly and set it as active.
If the bucket is full, find the node with the smallest modified value (modified refers to the last time the node pinged successfully. The smaller the modified value, the less likely the node is active) and challenge it. Then change the state of the challenged node from active to evictcandidate.

Send ping messages to evictcandidate nodes.

If the evictcandidate node returns a pong message, it is alive. Then reset its state as active and change the modified value to the current time; make no changes to nodes that are alive.
If the evictcandidate node does not return a pong message, it is offline. Then set its state as nonactive and remove it from the routing table; add nodes that are alive into the routing table and change their state to active.

According to the above steps, new nodes will join only when there are offline nodes in the routing table.

That is the whole procedure for node discovery.

4. Build Connection

Since each node only has a limited number of slots for connection, a scoring system is introduced to rate and rank nodes in the routing table in order to help connect high-quality nodes. Whether a node is categorized as “High-quality” is determined based on many metrics including connection stability and value of the information stored by other nodes.

4.1 Node Score

Node score is intended to help locate relatively stable nodes and build connections with them. A higher score suggests greater stability. Here are the score metrics:

Penalty will be first considered when scoring a node. A node can be penalized for the following reasons:

It has not been 60 seconds since the last time the node was disconnected.
P2P version, port or other information of the nodes do not match.
Block information of the nodes do not match (such as genesis block or solidity blocks)

Here handshake is not the 3-way TCP handshake. It refers to the process during which the node verifies whether the node it has built connection with is on the same chain.

4.2 Node Filtering

Node filtering examines how reliable the node to be connected is, such as whether the connection is stable and whether it is a malicious node.

When selecting nodes for connection, nodes pre-declared in configuration files will be first considered.

These are two types of trusted nodes. It should be noted that scoring is only one of the filtering rules. While trusted notes can skip scoring and be first considered to build connection with, they still need to meet other filtering criteria to qualify for connection.

The node filtering process is as follows:

Determine if the node to be connected is itself
Determine if the local connection limit is reached (can be specified in configuration files)
Determine if the connection is already established.
Determine if connections from the same IP have reached the limit, including connections sent to the same IP. The purpose is to fend off attacks.
Choose the highest-graded node according to the scoring rule.

When passively accepting connections from other nodes, there is no need to determine if the connection comes from the node itself. If an identical request is received, the newly-built connection will be discarded to retain the original connection.

4.2.2 Filtering rules and eclipse attack

Eclipse attack is an attack targeted at P2P networks. Since a node can only keep a limited number of TCP connections, if the attacker hijacks all available connections, the victim will lose access to true information as it only receives information (usually malicious) from the attacker rather than normal nodes.

A blockchain node no longer functions upon receiving a large amount of false information. An excess of victim nodes will compromise the entire blockchain network.

After taking such risk into consideration, we have taken some counter-measures in implementation:

Among the fixed peak number of connections, some of them must be initiated by the node itself (instead of passively accepting all connections).
A node can build a maximum of two connections with the same IP.

Starting multiple nodes on one device will be deemed an attack. That’s why there is a limit to the number of connections with the same IP. Meanwhile, as a node is started, it will first connect the preset trusted nodes (proactively build connections with active nodes and give priority to connection requests from passive nodes). Node filtering makes it harder for attackers to get their way.

4.3 Handshake

Handshake happens after TCP connection with other nodes is built and before sync starts. The purpose is to determine if the nodes share the same block information. There is no use interacting with a node storing different block information.

Both parties in the handshake will send HelloMessage to each other, which includes the following information:

Basic node information
Current time
Genesis block ID
Solidity node ID
Block header ID

The connection will be cut off when the two parties verify their HelloMessage in the following cases:

Different P2P versions
Different genesis block
Local solidity nodes have a larger block height than the other party and the solidity nodes of the other party are not recorded on the local blockchain.

A successful handshake marks the completion of nodes connection.

5 Concluding Remarks

This article introduces TRON’s P2P network. Co-governance of a distributed P2P network lays the foundation for decentralization, an important feature of blockchain. A thorough understanding of the P2P network, the cornerstone of blockchain, takes us to the world of blockchain faster.

6 References

https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf

7 For more information

Github: https://github.com/tronprotocol

Telegram: https://t.me/troncoredevscommunity

[1] The extent to which a node is new or old is determined by its modified value. When a ping is sent to an existing node in the routing table and a pong is returned successfully, the modified value of the node will be changed to the time when pong is received (in the form of a timestamp). The larger the modified value, the newer the node.