Technological breakthrough | Adaptive dynamic networking method to improve the efficiency of blockchain consensus
On April 15th, Ultrain successfully completed the release of the “Adaptive Dynamic Networking” function according to the planning of the 2020 technology road map.
https://github.com/ultrain-os/ultrain-core-production/tree/4.0.0
One of our technical goals this year is to hope that through blockchain technology, we will focus on empowering fully competitive industries and helping them to establish low-cost mutual trust franchise business models.
We envisage such a business scenario, a provincial catering alliance with 1,500 franchised stores with more than 20 catering brands in the province. They hope to achieve the exchange of catering consumption points within the alliance so that consumers can redeem within the brand points can be exchanged for use in more than twenty brands. In the traditional business environment, this kind of assumption is difficult to establish, because the points platform is generally built and controlled by the leader. Under traditional technical means, even if the leader shows how open and transparent, it is difficult for the affiliated companies to trust The platform does not violate the rules and secretly issues additional points, because this is an act that can directly bring benefits to the leader, and it is difficult for the leader to prove himself innocent here.
We can solve this problem with Ultrain blockchain technology. We can deploy 1500 nodes in 1500 franchise stores. Each franchise store provides its own computer and installs Ultrain blockchain software on the computer. The computers are organized into an Ultrain blockchain network, and the behavior data such as the issuance, circulation, exchange, and consumption of points will be directly recorded in the Ultrain blockchain network, and the data will be synchronized to these 1500 computers at the same time. Modify or delete the relevant data of these points, in this way, all franchise stores can trust this set of points system and join the operation.
If blockchain technology can support this business scenario, we have multiple technical points to break through. For example, the computer is controlled by each store, some computers may run for a long time, some computers may only be turned on when the store is at work, and will be turned off after work, and the work hours of each store are different, so the computer is turned on and off The time will also be different; at the same time, because it is running on the public network, the network status will be good and bad, and there may be situations where some computers suddenly deteriorate the network status and cannot connect to the network. Therefore, we are faced with the situation of a dynamic computer network. There are computers joining and exiting at any time in this network, and the number of online computers is not fixed.
In the traditional blockchain BFT consensus protocol, if more than 1/3 of the computers in the network are offline and not in the network, the entire blockchain network will be paralyzed. However, in the above business scenario, this situation is likely to occur. Ultrain’s “adaptive dynamic networking” technology effectively solves this problem. In the process of the gradual decrease of the number of nodes in the Ultrain blockchain network due to various reasons, the Ultrain network can dynamically adapt to this process. Even in the case where more than 1/3 of the computers in the network are offline, the Ultrain network can be kept Long-term stable operation. After the disconnected node comes back online, the Ultrain network will also automatically connect these “resurrected” nodes to the network and let it continue to provide services.
Now let’s introduce how the “Adaptive Dynamic Networking” function of Ultrain is designed technically.
Blockchain is a special distributed database. The system generates accounting nodes in committee nodes according to a certain consensus algorithm, records and modifies information on the chain, and synchronizes to all committees. The consensus algorithm based on BFT has the following two problems:
1. With a certain probability, the members of the committee that are not online are selected as the accounting nodes, and effective blocks cannot be produced. This will lead to empty blocks in the current accounting process.
2. When a large number of committee members gradually go offline, the consensus algorithm of BFT cannot be completed, and the system will be stuck in the current round and cannot continue. The entire blockchain network is paralyzed.
Ultrain proposes a solution based on discrete heartbeat signals for two issues of consensus. Each monitored node is set with a separate monitoring time to prevent startling effects.
Adaptive dynamic networking includes the following steps:
1) The number of all effective committee members in the blockchain is m, and the BFT consensus algorithm will generate a data block on the blockchain within each block generation period t;
2) To deploy a heartbeat monitoring smart contract on the blockchain, all valid committee members in the blockchain must send the sign-in transaction in the heartbeat monitoring smart contract within the specified time interval T, complete the sign-in on the chain, and then extract nodes for inspection Remove valid committee members who have not checked in.
Setting different checkpoints for each group (one) node can effectively prevent the instantaneous system paralysis caused by the shock group effect.
Extract nodes for inspection, and remove valid committee members who have not checked in, including:
A) Divide the interval time T into n time nodes (slots, hereinafter referred to as checkpoints), and the interval of each checkpoint is T / n;
B) Based on the association of the nodes, divide the nodes into several sets;
C) A node is randomly selected from each valid set for inspection to form a sampling set for this checkpoint;
D) No sign-in will be removed from the effective committee;
E) For the set with abnormal nodes, the sampling weight α will be increased at the next checkpoint;
In step A), n is determined by the correlation degree δ, and the range of the correlation degree δ is 0 ~ 1.
In step B), based on the association of the nodes, the nodes are divided into several sets, including:
a) Divide all nodes into 24 sets Ci, where 0≤i≤23;
For blockchain nodes, whether they can provide computing power normally is closely related to time. Therefore, we use the time zone division method to first divide all nodes into 24 sets, set Ci (0≤i≤23);
b) The set with no more than Min nodes is unified to the left, where Min is set to 3% of the total number of nodes;
For example, if the number of Ci nodes in the set is less than Min, then they are grouped into the Ci-1 set;
c) Delete the empty collection generated after the collection to get a valid collection;
d) A two-dimensional feature attribute is used to describe the node (X, Y), where X is the number of the time zone to which it belongs, and Y is the number of historical blocks of the node;
e) Select
centroids for each effective set,
where N (Ci) is the number of nodes in the set Ci;
f) Nodes (X, Y) and
centroids use K-Means algorithm to find K node clusters, among which,
complete the node division into several sets;
Using the K-Means algorithm, (X, Y) can be set according to specific network characteristics. For example, for a time-sensitive network, the X weight can be set to 99%, and the Y weight can be set to 1%.
In step E), the set of nodes where the problem is checked at this checkpoint, according to their relevance, the probability that there are still abnormal nodes becomes higher, so in the next sampling, the weight α is increased.
In this scheme of Ultrain, the h abnormal committee nodes are screened in time by the heartbeat method (contract) and the committee is removed, then the number of effective committees becomes m ’= m-h.
Each round of consensus selects the accounting node only from the effective committee, which can effectively avoid the offline committee.
At the same time, the scheme will temporarily remove the abnormal node from the committee. Even if this part of the node participates in the consensus, it will be treated as an invalid node, which will prevent the occurrence of evil.
When the committee comes back online and sends the heartbeat method, the system adds it back to the list of alternative committees. Can continue to participate in consensus.
This solution of Ultrain divides the online status of the system committee into 5 categories:
1. Low risk, less than 5% of offline nodes;
2. Medium and low risk state, 5% -15% of offline nodes;
3. Medium risk state, offline nodes 15% -30%;
4. High-risk state, offline nodes 30% -33%;
5. When the downtime state exceeds 1/3, the node is not online.
Compared with the existing technology, this solution of Ultrain has the following advantages:
The common heartbeat monitoring in the prior art generally has a uniform monitoring period T. When the time is up, it detects which nodes have not sent heartbeat messages in time. This will bring a potential risk, that is, the nodes that exceed the threshold at a certain period are abnormal at the same time. This phenomenon will bypass the heartbeat monitoring system and bring a catastrophic blow to the system. The probability of this situation is proportional to the monitoring period T. In this scheme of Ultrain, setting different checkpoints for each group (one) node can effectively prevent the instant system paralysis caused by the shock group effect.
Next, we quantitatively prove that this scheme is significantly better than existing heartbeat monitoring.
Specifically, assuming that the total node set M = 96, the nodes are independent of each other, and the probability of abnormal occurrence of each node p = 10%, and the probability of simultaneous abnormality of nodes exceeding f = 1/3 P = p ^ (m * f) Is very low.
But complete independence is theoretical, and in reality, many nodes have relevance. On the network, some habits of the operator. For example, some nodes will be turned off at night. This greatly increases the instability of the blockchain.
We further simplify the model, forming M = 96 according to the step k-means algorithm to form a K group, the nodes in the group are related, the correlation coefficient δ = 1 (that is, the degree of association δ = 1), and the groups are independent of each other.
We assume that at a certain time, multiple nodes of the system are down, the system enters a high-risk state, and will be down at any time, and the probability of empty blocks is close to 1/3, and the system is in a low availability state. Risk state time to prove the advantages of this program.
According to our definition of the node group model above, suppose that the nodes of the two groups are down, the set B of the downtime, any bi∈B is not online, | B | = 32, B∈M we need to detect more than y = 5 nodes to get the system out of high-risk areas.
We divide the period T into M / K checkpoints. At each checkpoint, a node is randomly selected from the K-node cluster to check the punching of the heartbeat method.
At the first checkpoint, a node is randomly selected from the K node clusters obtained by the k-means algorithm in step 2 to form a set H, where H ∈ m if | H∩B | ≥5, it is set H and B The number of intersections exceeds 5, which means that at the first checkpoint, more than 5 outage nodes are detected, making the system out of the high-risk area. The probability of this event:
Among them, P (| H∩B | ≥5) represents the probability that the number of intersections of sets H and B exceeds 5.
represents the combined type of k nodes extracted from the total set M,
means that from the set B, that is, the set of faulty nodes, the type of the i elements is taken from the elements (0≤i≤4),
k-i node combination types are extracted from the M-B set, and the probability that more than 5 outage nodes can be detected before the nth (1≤n≤6) checkpoint is determined by the recursive method, as shown in Table 1:
From the data analysis, it can be seen that the system recovers from the high-risk state with a probability of 67.9% at T / 6, while the probability of the system recovering from the high-risk state at T / 3 exceeds 99.9%, which is much better than the fixed time monitoring algorithm.
Using time-shared random monitoring, the expected time to recover from a high-risk state changes from T to 0.22T.
Obviously, if we can divide T into more time periods, the time can be expected to move more, as shown in Figure 1.
It can be seen that the expected recovery time does decrease with the increase of detection density, and different parameters can be set according to different system requirements.
After using this scheme, the blockchain consensus algorithm will be more robust.
When some nodes begin to go offline gradually, they will stop sending heartbeat transactions, and the blockchain system will sense this change in time and strip the abnormal committee members from the committee.
BFT’s consensus algorithm is fault-tolerant for a small number of committee members offline. But when the offline node exceeds 1/3 of the committee, the system cannot reach the consensus of BFT, and then becomes paralyzed. This solution will promptly remove the invalid committee members from the committee to repair the BFT system, so that the entire consensus system can continue to operate effectively.