Synchronization between nodes in a distributed system forming a blockchain

GAME · Jul 26, 2018

In "Distributed Systems: Principles and Paradigms", Tanenbaum defines a distributed system as "a set of independent computers that appears to its users as a single, coherent system".

Andrew S. Tanenbaum and Maarten van Steen: Distributed Systems: Principles and Paradigms, Second Edition, Pearson Prentice Hall (2007)

Blockchains, by constructing a worldwide distributed system, attempt to realize a new, decentralized data store and organizational structure.

The main reasons for turning to distributed systems in the first place are scalability, locality, and availability, and blockchain is no exception: geographical scalability to form a global value-store network, locality for information protection (including tamper resistance) under a non-centralized structure, and availability with zero downtime. All of these features are realized in the blockchain by means of a distributed system.

MOLD is developing its own chain, and in building it as a distributed system it is indispensable to understand what a distributed system is in the first place and what is required for correct, high-performance operation. In this article I would like to describe what a distributed system fundamentally is, as a step toward realizing MOLD's non-centralized virtual space.

0. Table of Contents

X. Blockchain and distributed systems
1. Introduction (Overview of synchronization and overall flow)
2. Clock synchronization
2–1. Physical clock (clock and clock skew)
2–2. Clock synchronization algorithm (Network Time Protocol (NTP) / Berkeley algorithm)
3. Logical Clock
3–1. Lamport's logical clock (totally-ordered multicast)
3–2. Vector clock (causally-ordered multicast)
4. Exclusive control
4–1. Centralized algorithm
4–2. Decentralized algorithm
4–3. Distributed algorithm
5. Leader Election Algorithm
5–1. Bully algorithm
5–2. Ring algorithm
6. Blockchains and synchronization as distributed systems
6–1. Blockchain and clock synchronization (blockchain and the physical / logical clock)
6–2. Blockchain and exclusive control algorithms (exclusive control in PoW · PoS · BFT)
6–3. Blockchain and leader election algorithms (leader election in PoW · PoS · BFT)

1. Introduction (Overview of synchronization and overall flow)

Unlike in a centralized system, it is not easy to reach agreement about time in a distributed system.
In the former, an absolute order relation can be determined based on a globally shared clock, but in the latter it is difficult to share absolute time because of clock errors and communication delays.

However, ordering in absolute time is not always necessary; it is often enough if the relative order is fixed. How can we synchronize all the clocks in a distributed system? And what algorithms are needed for consistent synchronization in a distributed system?

In this article, synchronization between nodes is explained in the following order.

  • How clock synchronization is performed
  • Relative ordering using the logical clock and the vector clock
  • Exclusive control (mutual exclusion) algorithms for consistency in distributed systems
  • Leader election algorithms in distributed systems

2. Clock synchronization

2–1. Physical clock

Clock and clock skew

Most computers have a circuit to keep track of time; this device is called a "clock". It is based on a precisely machined quartz crystal that, when kept under tension, oscillates at a frequency determined by the kind of crystal, how it is cut, and the amount of tension applied.
Although this frequency is reasonably stable, there is no guarantee that the crystals of different computers all run at exactly the same frequency. The resulting difference between the times they report is called clock skew.

In this situation, the problem, especially in real-time systems, is how to synchronize the clocks with the real-world clock and how to synchronize them with one another.

Real-world time was originally based on the mean solar second, but one second is now defined as the duration of 9,192,631,770 transitions of cesium-133, from which International Atomic Time (TAI) and Coordinated Universal Time (UTC) are defined. To provide UTC to those who need accurate time, shortwave radio stations such as WWV broadcast it with an accuracy of about ±10 msec.

2–2. Clock Synchronization Algorithm

However, most machines do not have WWV receivers. Each machine therefore needs an algorithm for tracking and managing time so that all machines stay synchronized as closely as possible.

Incidentally, the error used to decide whether resynchronization is necessary, that is, the clock skew, is bounded as follows.

Define H as the number of interrupts per second (the tick rate) generated by each machine's crystal, and let C be the value of the clock. Let Cp(t) denote the clock value of machine p when the UTC time is t.
If ρ is defined as the maximum drift rate, that is, how far a clock is allowed to drift from the ideal rate, the clock is assumed to operate within the following range:

1 − ρ ≤ dC/dt ≤ 1 + ρ

That is, Δt seconds after the previous synchronization, two clocks may have drifted apart by at most 2ρΔt.

To guarantee that two clocks never differ by more than δ during operation, the software must resynchronize at least every δ/(2ρ) seconds.
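For intuition, here is a minimal numerical sketch of this bound; the values of ρ and δ below are illustrative assumptions, not measurements.

```python
# Minimal sketch: how often must we resynchronize for a given drift rate?
# Illustrative values only; rho and delta are assumptions, not measurements.

rho = 1e-5      # maximum drift rate (10 ppm, a typical quartz spec)
delta = 0.01    # maximum tolerated skew between two clocks: 10 ms

# Two clocks drifting in opposite directions separate by at most 2*rho per second,
# so they must be resynchronized at least every delta / (2 * rho) seconds.
resync_interval = delta / (2 * rho)
print(f"resynchronize at least every {resync_interval:.0f} s "
      f"(~{resync_interval / 60:.1f} min)")
```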

Network Time Protocol (NTP)

Common to many protocols is the method first proposed by [Cristian, 1989], in which clients communicate with a time server. Since the time server has a WWV receiver or an otherwise accurate clock, it can provide the current time. When communicating with the server, the message propagation delay distorts the reported time, but by estimating that delay the error can be kept small. NTP is currently known to achieve accuracy in the range of 1 to 50 milliseconds.
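The core idea can be sketched in a few lines: using the four timestamps of one request/response exchange, the client estimates the server's clock offset under the assumption that the network delay is roughly symmetric. This is only a sketch of the NTP-style estimate, not a full NTP implementation.

```python
def estimate_offset(t1, t2, t3, t4):
    """Estimate the offset of the server clock relative to the client.

    t1: client time when the request was sent
    t2: server time when the request was received
    t3: server time when the reply was sent
    t4: client time when the reply was received
    Assumes the one-way delays are roughly symmetric (the NTP-style estimate).
    """
    round_trip = (t4 - t1) - (t3 - t2)    # time actually spent on the network
    offset = ((t2 - t1) + (t3 - t4)) / 2  # positive => server clock is ahead
    return offset, round_trip

# Toy example: the server clock is a few milliseconds ahead of the client clock.
offset, rtt = estimate_offset(t1=100.000, t2=100.015, t3=100.016, t4=100.020)
print(f"estimated offset = {offset*1000:.1f} ms, round trip = {rtt*1000:.1f} ms")
```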

Berkeley algorithm

In many algorithms such as NTP, the time server is passive and only answers inquiries. In the Berkeley algorithm, by contrast, the time server polls the time held by each participating node and adjusts every clock, including its own, based on the average. When the time values do not need to correspond to real-world time, it is enough for the nodes to agree on a common current time, and this algorithm is effective.
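A minimal sketch of the averaging step, assuming the coordinator has already polled every node's clock and ignoring network delay:

```python
def berkeley_adjustments(clock_values):
    """Given the clock values reported by all nodes (coordinator included),
    return the adjustment each node should apply so that everyone agrees
    on the average time. Network delay is ignored in this sketch."""
    average = sum(clock_values.values()) / len(clock_values)
    return {node: average - value for node, value in clock_values.items()}

# Toy example: three nodes whose clocks disagree by a few seconds.
reported = {"coordinator": 3000.0, "node_a": 3005.0, "node_b": 2992.0}
print(berkeley_adjustments(reported))
# {'coordinator': -1.0, 'node_a': -6.0, 'node_b': 7.0}
```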

3. Logical Clock

Up to this point we have described methods that synchronize clocks against absolute, real-world time, but relative synchronization is often sufficient. Here the concept of a logical clock is used to determine relative order.

3–1. Lamport’s Logical clock

In order to synchronize logical clocks, Lamport defined a relation called happens-before. The expression a → b is read "a happens before b" and means that all processes agree that event a occurs first and event b occurs afterwards. The happens-before relation can be observed directly in the following two situations.

  1. If a and b are events in the same process and a occurs before b, then a → b is true.
  2. If a is the event of a message being sent by one process and b is the event of that message being received by another process, then a → b is also true. A message cannot be received before it is sent; even sending and receiving "at the same time" takes a finite, nonzero amount of time.

Because happens-before is a transitive relation, a → b and b → c together imply a → c. If events x and y occur in different processes that do not exchange messages, neither x → y nor y → x is true, and these events are said to be concurrent (their happens-before relation is undefined).

With the logical clock, relative time is measured by assigning to each event a a time value C(a) that all processes agree on. These time values are corrected, by adding a positive amount where necessary, so that if a → b then C(a) < C(b). By assigning time values in this way, as in the figure linked below, the happens-before relations can be grasped.
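The update rules of Lamport's logical clock can be sketched in a few lines; processes and message passing are reduced to plain method calls in this illustration.

```python
class LamportClock:
    """Minimal sketch of Lamport's logical clock rules."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1          # rule: tick before every local event
        return self.time

    def send(self):
        self.time += 1          # sending is also an event
        return self.time        # timestamp carried by the message

    def receive(self, msg_time):
        # On receipt, jump past the sender's timestamp if it is ahead of us.
        self.time = max(self.time, msg_time) + 1
        return self.time

# Toy example: p1 sends to p2; p2's clock jumps past the message timestamp.
p1, p2 = LamportClock(), LamportClock()
ts = p1.send()                  # p1.time == 1
print(p2.receive(ts))           # 2, so C(send) < C(receive) holds
```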

http://sergeiturukin.com/2017/06/26/hybrid-logical-clocks.html

With Lamport's logical clock, if a → b then C(a) < C(b) can be proved, but C(a) < C(b) does not necessarily imply a → b. In other words, C(a) < C(b) is a necessary but not a sufficient condition for a → b. The vector clock improves on Lamport's logical clock so that the corresponding condition becomes both necessary and sufficient.

Totally-ordered multicast

Details are given in the article on "Distributed system consistency", but in many cases totally-ordered multicast between replicas is required; in other words, all messages must be delivered to every recipient in the same order. Lamport's logical clock can be used to implement totally-ordered multicast in a fully distributed system.

When a process receives a message, it places it in a local queue ordered by timestamp, and then multicasts an acknowledgment to the other processes. If Lamport's algorithm is followed when adjusting the local clocks, all processes will in fact end up with identical copies of the local queue. A process may hand a queued message to the running application only when that message is at the head of the queue and has been acknowledged by all other processes; as a result, all messages are delivered in the same order everywhere. In other words, totally-ordered multicast is achieved.
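The following minimal sketch shows only the delivery condition, assuming the timestamped messages and acknowledgments are already being exchanged by some transport layer (which is omitted here).

```python
def deliverable(queue, acks, all_processes):
    """Return the next message that may be delivered to the application,
    or None. `queue` is a list of (timestamp, sender, payload) tuples;
    `acks` maps a message id (timestamp, sender) to the set of processes
    that have acknowledged it. Sketch only: message exchange is omitted."""
    if not queue:
        return None
    queue.sort()                                  # order by (timestamp, sender)
    head = queue[0]
    head_id = (head[0], head[1])
    if acks.get(head_id, set()) >= set(all_processes):
        return queue.pop(0)                       # safe to deliver everywhere
    return None

# Toy example: the head message has been acknowledged by every process.
q = [(3, "p2", "y"), (1, "p1", "x")]
acks = {(1, "p1"): {"p1", "p2", "p3"}}
print(deliverable(q, acks, ["p1", "p2", "p3"]))   # (1, 'p1', 'x')
```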

3–2. Vector clock

With a vector clock, it is possible to capture the causality that Lamport's logical clock could not. Writing VC(a) for the vector clock of event a, the following steps are executed so that VC(a) < VC(b) holds if and only if a → b (a short sketch follows the steps below).

  1. Process Pi increments its vector clock entry VCi[i] by 1 before sending a message over the network or executing an internal event.
  2. If process Pi sends message m to Pj, Pi sets the vector timestamp ts(m) of m equal to VCi after executing the preceding step.
  3. When message m is received, process Pj first updates every entry k of its own vector clock as VCj[k] ← max{ VCj[k], ts(m)[k] }, then executes step 1, and then delivers the message to the application.
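A minimal sketch of these three rules for a process in a system of n processes (message passing is again reduced to plain values):

```python
class VectorClock:
    """Minimal sketch of the vector clock rules for process `pid`
    in a system of `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n

    def tick(self):                      # step 1: before a send or internal event
        self.vc[self.pid] += 1

    def send(self):                      # step 2: timestamp the outgoing message
        self.tick()
        return list(self.vc)

    def receive(self, ts):               # step 3: merge, then tick, then deliver
        self.vc = [max(own, incoming) for own, incoming in zip(self.vc, ts)]
        self.tick()

# Toy example with two processes.
p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()            # p0: [1, 0]
p1.receive(m)            # p1: [1, 1]
print(p0.vc, p1.vc)      # [1, 0] [1, 1]
```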

http://sergeiturukin.com/2017/06/26/hybrid-logical-clocks.html

Causally-ordered multicast

By using the vector clock, it is possible to realize causally-ordered multicast, which is somewhat weaker than the totally-ordered multicast described above.

By comparing vector clock values and grasping the happens-before relations, the other events can be classified, with respect to a given event x, into past events, concurrent events, and future events. For example, in the figure above, when event d is used as the reference point, the past events are a, b, c, and i, the concurrent events are j, l, and m, and the future events are f, g, and h.

In causally-ordered multicast, all messages that are causally related must be delivered in an order consistent with that causality at every process, while the delivery order of concurrent messages is left unconstrained. The ability to capture causality in this way is what distinguishes the vector clock from Lamport's logical clock.
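A minimal sketch of how an event can be classified relative to a reference event by componentwise comparison of vector timestamps:

```python
def classify(vc_ref, vc_other):
    """Classify `vc_other` relative to the reference event `vc_ref`
    by comparing their vector timestamps componentwise."""
    le = all(a <= b for a, b in zip(vc_other, vc_ref))   # other -> ref ?
    ge = all(a >= b for a, b in zip(vc_other, vc_ref))   # ref -> other ?
    if le and not ge:
        return "past"          # other happened before the reference event
    if ge and not le:
        return "future"        # the reference event happened before other
    if le and ge:
        return "same event"
    return "concurrent"        # no causal relation either way

print(classify([2, 1, 0], [1, 0, 0]))   # past
print(classify([2, 1, 0], [0, 0, 3]))   # concurrent
```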

4. Exclusive control

Concurrent and cooperative operation among multiple processes is fundamental in a distributed system, but to guarantee exclusive access to resources, so that simultaneous access to the same resource by several processes does not produce an inconsistent state, a distributed mutual exclusion algorithm is required.

Distributed exclusive control algorithms can be categorized into the following two types.

  • Token-based solutions
  • Permission-based solutions

In the token-based scheme it is easy to avoid starvation (a process never being granted access to a resource for a very long time) and deadlock (multiple processes waiting for each other's progress indefinitely). A representative example is the token-ring algorithm. However, when the process holding the token crashes and the token is lost, exactly one new token must be generated, and this complication is a serious disadvantage.

Most other distributed exclusive control algorithms adopt the permission-based approach, and since there are many different ways of obtaining permission, we explain them concretely below.

4–1. Centralized algorithm

By simulating what is done on a single-processor system, exclusive access in a distributed system can be achieved easily. In the centralized algorithm, one process is appointed coordinator, and when a process wants to access a shared resource it sends a request message to the coordinator asking for permission. If no other process is currently accessing the shared resource, the coordinator returns a grant, and after receiving this reply the requesting process proceeds.
It is easy to see that this algorithm guarantees mutually exclusive access to the resource, but it has the serious disadvantage of being a single point of failure. The coordinator can also become a performance bottleneck in large systems, although the advantages that come from its simplicity often still outweigh these drawbacks.
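A minimal, single-process sketch of the coordinator's bookkeeping; real message passing is replaced by method calls, and requests are granted or queued exactly as described above.

```python
from collections import deque

class Coordinator:
    """Minimal sketch of the centralized mutual exclusion coordinator.
    Message passing is omitted; requests and releases are method calls."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, process):
        if self.holder is None:
            self.holder = process
            return "OK"                    # grant access immediately
        self.waiting.append(process)       # otherwise queue it; no reply yet
        return "QUEUED"

    def release(self, process):
        assert process == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder                 # next process granted, if any

coord = Coordinator()
print(coord.request("p1"))    # OK
print(coord.request("p2"))    # QUEUED
print(coord.release("p1"))    # p2 (now holds the resource)
```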

4–2. Decentralized algorithm

Assume that each resource is replicated n times, with each replica having its own coordinator. In the decentralized algorithm, when a process wants to access a resource it needs the approval of a majority of the coordinators, m > n/2. If a majority of approvals is obtained, the process gains permission and can proceed.
Although this scheme removes the single point of failure of the centralized algorithm, another problem appears when too many nodes try to access the resource at once: no node may be able to collect enough votes, and performance suffers.

4–3. Distributed Algorithm

In this algorithm it is assumed that a total ordering can be defined over all events in the system; Lamport's logical clock, described in the previous chapter, is used as the basis for this. It is also assumed that no messages are lost.

When a process wants to access a shared resource, it builds a message containing the name of the resource, its own process number, and its current logical clock value, and sends it to all other processes. When such a request message is received, the receiver acts according to its own state, as follows (a sketch of this rule is given after the list).

  1. If the recipient is not accessing the resource and is not trying to access it, it returns an OK message to the sender.
  2. If the recipient is already accessing the resource, it does not reply and queues the request.
  3. If the recipient also wants to access the resource but has not yet done so, it compares the timestamp of the incoming message with the timestamp of the message it sent to everyone else; the lower one wins. If the incoming message has the lower timestamp, the recipient returns an OK message. If its own message has the lower timestamp, the recipient queues the incoming request and sends nothing.
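A minimal sketch of the receiver-side rule above; process state and timestamps are passed in as plain values, and the actual message exchange is omitted.

```python
def on_request(my_state, my_request_ts, incoming_ts):
    """Decide how a process reacts to an incoming resource request.

    my_state:      'idle', 'using', or 'wanting' the shared resource
    my_request_ts: (lamport_time, process_id) of our own pending request,
                   only meaningful when my_state == 'wanting'
    incoming_ts:   (lamport_time, process_id) of the incoming request
    Returns 'OK' (reply immediately) or 'DEFER' (queue the request).
    """
    if my_state == "idle":
        return "OK"                       # rule 1: not interested, let them in
    if my_state == "using":
        return "DEFER"                    # rule 2: already inside, queue it
    # rule 3: both want the resource; the lower timestamp wins
    return "OK" if incoming_ts < my_request_ts else "DEFER"

print(on_request("wanting", (8, 2), (5, 1)))   # OK    (their request is older)
print(on_request("wanting", (8, 2), (9, 3)))   # DEFER (our request is older)
```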

It is easy to see that the algorithm works when there is no conflict. Even when two processes request access at the same time, the condition that exactly one process gains access still holds.

As with the centralized algorithm, this algorithm guarantees mutual exclusion without deadlock or starvation, and there is no single point of failure. Nonetheless, the single point of failure has been replaced by n points of failure: if any one of the processes crashes, it stops answering requests. This can be addressed by having receivers always reply, either granting or denying permission, and introducing timeouts, but other problems then arise, such as the need for multicast communication primitives. Unfortunately, no distributed algorithm that clearly surpasses the centralized one has been devised so far, and research is still ongoing.


5. Leader Election Algorithm

Many distributed algorithms require a special process that plays the role of leader, coordinator, or initiator. Which process becomes the leader, and how to guarantee that exactly one process does, is an important issue that researchers have worked on over the past few decades.

5–1. Bully algorithm

When the coordinator fails and some process P notices this, P initiates an election according to the following procedure (a sketch follows the procedure).

  1. P sends an ELECTION message to all processes with a higher number than its own.
  2. If no one responds, P wins the election and becomes coordinator.
  3. If a process with a higher number than P answers, that process takes over and P's work is done.
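A minimal sketch of one election round from the initiator's point of view, assuming failure detection has already told us which processes are alive.

```python
def bully_election(initiator, all_ids, alive):
    """Minimal sketch of the bully algorithm from the initiator's viewpoint.

    initiator: id of the process that noticed the coordinator failure
    all_ids:   ids of every process in the system
    alive:     set of ids that currently respond to messages (assumption:
               failure detection has already happened)
    Returns the id of the new coordinator.
    """
    higher = [p for p in all_ids if p > initiator]   # step 1: ELECTION to these
    responders = [p for p in higher if p in alive]
    if not responders:
        return initiator                             # step 2: nobody answered, I win
    # step 3: a higher process takes over and runs the same election itself,
    # so ultimately the highest-numbered live process becomes coordinator.
    return bully_election(max(responders), all_ids, alive)

print(bully_election(2, all_ids=[1, 2, 3, 4, 5], alive={1, 2, 3, 4}))   # 4
```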

With this algorithm the coordinator is uniquely determined. However, it requires a large number of messages and a great deal of traffic, and can be considered wasteful. An alternative is the ring algorithm.

5–2. Ring algorithm

Unlike typical ring protocols, this algorithm does not use a token. Any process that notices that the coordinator is not working constructs an ELECTION message containing its own process number and sends it to its successor (the next node in the ring network); if the successor is down, it is skipped. If no node has a higher number than yours, the message eventually returns to you still carrying your process number, and you are appointed coordinator.

This algorithm performs leader election with a reduced number of messages, and an algorithm with even less traffic can be realized by sending the message to both neighboring nodes.
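A minimal sketch of one pass around the ring for the variant described above, where the ELECTION message effectively carries the highest process number seen so far; the ring order and the set of live nodes are passed in as assumptions.

```python
def ring_election(ring, initiator, alive):
    """Minimal sketch of the ring election variant described above.

    ring:      list of process ids in ring order
    initiator: id of the process that noticed the coordinator failure
    alive:     set of ids that are currently up (dead successors are skipped)
    The ELECTION message keeps only the highest id seen; when it returns
    to the initiator, that id is the new coordinator.
    """
    n = len(ring)
    start = ring.index(initiator)
    highest = initiator
    i = (start + 1) % n
    while ring[i] != initiator:                 # forward around the ring
        if ring[i] in alive:                    # skip crashed successors
            highest = max(highest, ring[i])
        i = (i + 1) % n
    return highest                              # message is back at the initiator

print(ring_election(ring=[3, 6, 0, 8, 2], initiator=6, alive={3, 6, 0, 2}))   # 6
```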

6. Blockchains and synchronization as distributed systems

So, in a blockchain, which is one kind of distributed system, how does synchronization between processes take place?

6–1. Blockchain and clock synchronization

Blockchain and physical clock
First, consider whether an absolute time relationship can be established using physical clocks in a blockchain. As mentioned in Chapter 2, the nodes participating in the network do not always hold the correct physical time, so clock skew inevitably exists. Since the average block generation time of the Bitcoin blockchain is 10 minutes, even a fairly large clock skew is considered acceptable. However, it is difficult to keep the physical clocks of nodes scattered around the world synchronized, and some nodes may even falsify their clocks. Resynchronizing to the correct time between such nodes simply by introducing the Network Time Protocol (NTP) is therefore difficult.

Blockchain and logical clock

It is therefore more realistic to use a logical clock rather than a physical clock. In fact, by incorporating a timestamp into each block, a mechanism very similar to Lamport's logical clock is put in place.

As described in [Bitcoin: A Peer-to-Peer Electronic Cash System, Satoshi Nakamoto], each node that writes a block as a miner itself plays the role of a timestamp server. Each timestamp forms a chain by including the immediately preceding timestamp in its hash. However, there is no guarantee that these nodes hold the correct physical time, so the numerical value of a timestamp, and hence the exact order and time of each transaction, remains somewhat ambiguous.
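To illustrate the chaining of timestamps, here is a toy sketch (not Bitcoin's actual block format) in which each entry includes the hash of the previous entry in its own hash, so that later timestamps reinforce earlier ones.

```python
import hashlib
import json
import time

def timestamp_entry(prev_hash, data, ts=None):
    """Toy illustration of a hash-chained timestamp (not Bitcoin's real format).
    Including prev_hash means later entries reinforce all earlier ones."""
    entry = {"prev_hash": prev_hash, "timestamp": ts or int(time.time()), "data": data}
    entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    return entry

genesis = timestamp_entry("0" * 64, "genesis", ts=1)
second = timestamp_entry(genesis["hash"], "tx: A -> B", ts=2)
print(second["prev_hash"] == genesis["hash"])   # True: the chain links the two
```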

Because of this clock ambiguity, double spending is possible in principle. However, in the Bitcoin blockchain only the longest chain is regarded as legitimate, and incorrect transactions are discarded after verification by miners, so the order of blocks becomes uniquely determined as time passes. Each additional timestamp reinforces the ones before it.

In summary, under the blockchain's ambiguous timestamps, the ordering of transactions is at first imprecise. However, through the simple mechanism of chaining blocks together, the happens-before relation between transactions becomes established as time passes. In addition, because the incentive structure encourages miners to behave honestly, transactions with inconsistent ordering do not occur.

In the sense that the relative order between transactions, that is, the happens-before relation, becomes clearer over time, a clock synchronization method similar to Lamport's logical clock is realized.

Most transactions have no causal relationship with one another, so introducing a vector clock and adopting causal ordering could greatly relax the ordering constraints. In a blockchain, however, the structure itself shares the order of all blocks by default, so total ordering is maintained (for blocks after a certain amount of time has passed).

6–2. Blockchain and exclusive control algorithm

Exclusive control is necessary even in the blockchain as a distributed system. In a blockchain network, each node operates asynchronously and in parallel, and the shared information of the blockchain itself must not become inconsistent.

Exclusive control algorithm in PoW · PoS

As described in Chapter 4, the distributed exclusive control algorithm can be classified into the following two types.

  1. Token-based solutions
  2. Permission-based solutions

PoW and PoS are permission-based, and their mechanism can be said to resemble the distributed algorithm described above. So when does a node obtain permission to access the resource? Precisely when it finds a nonce.

In PoW, a node may perform a valid write of a new block only when it finds a nonce such that the block's hash value begins with n leading zeros. The miner that performs the write broadcasts the block to all other miners and shares it.
Normally, when another node finds a nonce and creates a block first, a miner synchronizes to that information and moves on to searching for the next nonce value. This is because, under the rule that the longest chain is considered legitimate, it is more profitable to start searching for the next nonce right away. PoS gives priority of access to nodes with large coin holdings, but its basic exclusive control structure is likewise similar to the distributed algorithm.
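A toy sketch of the nonce search; real Bitcoin hashes the block header with double SHA-256 against a full 256-bit target, whereas here the difficulty is simply the number of leading zero hex digits.

```python
import hashlib

def find_nonce(block_data, difficulty):
    """Toy proof-of-work: find a nonce such that SHA-256(block_data + nonce)
    starts with `difficulty` zero hex digits. Real Bitcoin hashes the block
    header with double SHA-256 against a 256-bit target; this is a sketch."""
    nonce = 0
    prefix = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = find_nonce("prev_hash|merkle_root|timestamp", difficulty=4)
print(nonce, digest[:12], "...")   # a nonce whose hash begins with four zeros
```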

Strictly speaking, however, exclusive control is not actually enforced; instead, consensus is formed over the roughly 10 minutes of shared time until the next block is produced. When two or more nodes find a nonce at the same time, the write operation happens in a non-exclusive state. Even then, since only the longest chain is considered legitimate, the information in the blockchain network regains consistency as time passes. The fact that forks happen because strict exclusive control is not enforced, and that finality is therefore never absolutely confirmed, remains a problem.

Exclusive control algorithm in BFT-type consensus

BFT-type schemes, on the other hand, do perform exclusive control, using a permission-based decentralized algorithm. This solves the fork and finality problems that were issues in PoW, which resembled the distributed algorithm.

In a BFT-type scheme, only a single node, called the Proposer, Orderer, and so on, has the right to generate a new block at any one time. When creating a block, it collects votes from all participating nodes and obtains permission to create the new block only when more than 2/3 of them consent. The reason more than 2/3 agreement is required, rather than a simple majority, is to cope with Byzantine faults; details are described in the article "Fault tolerance in distributed systems".
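A minimal sketch of the quorum check: with n validators, a proposal is committed only when strictly more than 2/3 of them have voted for it.

```python
def committed(votes_for, n_validators):
    """Return True if a proposal has gathered more than 2/3 of all validators.
    With n >= 3f + 1, a >2/3 quorum always contains a majority of honest nodes,
    which is why a simple majority is not enough under Byzantine faults."""
    return 3 * votes_for > 2 * n_validators

print(committed(votes_for=2, n_validators=4))   # False: 2/4 is not enough
print(committed(votes_for=3, n_validators=4))   # True:  3/4 > 2/3
```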

In a BFT-type algorithm, unlike PoW and its relatives, only one node can obtain exclusive write access to the blockchain at a time, so no fork occurs and finality is determined immediately. However, the property that anyone can participate in the network as a miner tends to be lost.

6–3. Blockchain and leader election algorithm

PoW, PoS and the leader election algorithm

Leader election on a blockchain resembles the mechanism of its exclusive control algorithm. In Bitcoin, the algorithm for electing the leader, that is, the node that creates the next block, is PoW.

PoW grants the right to add a block, as a leader contributing to the Bitcoin network, to the node that expends computational effort and finds the nonce. Each miner that hopes to become leader also ends up contributing to the network, because it is more profitable to synchronize quickly with the node that found the nonce first and begin searching for the nonce of the next block. Although there is the problem that a hard fork can split the chain entirely, synchronization is realized in the blockchain network as a distributed system through this very simple, game-theoretic incentive structure.

In Ethereum, since the block generation time is short, forks tend to occur more often. To address this, the concept of uncle blocks is adopted, so that a certain amount of reward is given even to blocks that end up on a chain that is not the legitimate one.

PoS, which is planned to be introduced in the future, preferentially grants the right to generate a block, as leader, to nodes with large coin holdings. It is an algorithm that solves or mitigates problems of PoW such as enormous electricity consumption and vulnerability to 51% attacks. It is an election algorithm based on the game-theoretic assumption that a node holding a large amount of coins will not take malicious actions such as destroying the network.

BFT and the leader election algorithm

The issue for BFT-type algorithms is how to elect the leader, the Proposer or Orderer, whose proposed block the other nodes vote on.

In Hyperledger, which adopts PBFT, a trusted institution is registered in advance as the Orderer. This, however, is leader election by a centralized method, which sets it apart from a fully distributed system.

In the Tendermint protocol, leaders are elected in a round-robin fashion so that proposals are made in turn by different validators. The set of leader candidates is based on PoS, and this can be said to be one algorithm that realizes leader election in a distributed system.
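As a simplified illustration (not Tendermint's exact proposer-selection algorithm), a round-robin over validators can be weighted by stake so that validators with more bonded coins propose proportionally more often; the validator names and stakes below are hypothetical.

```python
def proposer_for_height(validators, height):
    """Simplified stake-weighted round-robin (illustrative only, not
    Tendermint's actual algorithm): each validator appears in the rotation
    a number of times proportional to its stake."""
    rotation = []
    for name, stake in sorted(validators.items()):
        rotation.extend([name] * stake)
    return rotation[height % len(rotation)]

validators = {"alice": 3, "bob": 1}          # hypothetical stakes
for h in range(5):
    print(h, proposer_for_height(validators, h))
# alice proposes 3 out of every 4 heights, bob 1 out of every 4
```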

Reference: Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, Second Edition, Pearson Prentice Hall (2007)

— — — — — — — — — — — — — — — -
Cosmos Gaming Hub Project (former MOLD project)
CEO & Co-Founder

Takumi Asano

For all game enthusiasts


Cosmos Gaming Hub is a fair and secure distributed gaming platform which supports the development of new games and simplifies the trading of digital assets.