Consistency and Replication in a Distributed System (What Protocol Does MOLD Need?)


In “DISTRIBUTED SYSTEMS — Principles and Paradigms”, Tanenbaum defines a distributed system as “a collection of independent computers that appears to its users as a single coherent system.”

Andrew S. Tanenbaum and Maarten van Steen: DISTRIBUTED SYSTEMS — Principles and Paradigms, Second Edition, Pearson Prentice Hall (2007)

Blockchain, by constructing a worldwide distributed system, attempts to realize a new, decentralized data store and organizational structure.

The main reasons for turning to distributed systems in the first place are scalability, locality, and availability, and blockchain is no exception: geographical scalability to form a global value-store network, locality for information protection (including tamper resistance) under a non-centralized structure, and availability with zero downtime. All of these features are realized in blockchain by building on a distributed system.

MOLD is developing its own chain, and in constructing a distributed system it is indispensable to understand what a distributed system actually is and what it needs for correct, high-performance operation. In this article, I describe what a distributed system is in order to work toward realizing MOLD’s non-centralized virtual space.

0. Table of Contents

1. Introduction (Overview of Consistency and Reasons for Consistency)
2. Data-centric consistency model
3. Client-centric consistency model
4. Replication management
5. Consistency Protocol
6. Consistency in BlockChain

1. Introduction (Overview of Consistency and Reasons for Consistency)

In a distributed system, data is replicated mainly for “reliability” and “performance”. Replication is required especially when the system needs to grow in size and geographically; it is one of the basic scaling techniques. It also helps cope with data corruption and replica crashes.

At the same time, the state of the data and its replicas must be kept consistent, and this leads directly to scalability problems. The ideal is to keep all replicas in exactly the same state through the same operations, i.e. to make every update atomic across replicas, but this is quite a challenge.

In reality, we cannot solve the scalability problem unless we relax some consistency constraints. It is therefore necessary to understand how much consistency a given system really requires and how to realize that degree of consistency.

In this article, consistency between replicas is explained in the following order.

  • What kinds of consistency models are there?
  • Replication management
    — Where and when replicas are placed
    — Who places the replicas (client or server?)
    — How consistency between replicas is ensured
  • Concrete consistency protocols and their implementations

2. Data-centric consistency model

Traditionally, consistency has been discussed mainly from the point of view of the data (the data store).

In a distributed system, each process holds its own local copy. The collection of stored data, including each local copy, is called a distributed data store.

When a process reads from a data store, it expects the result of the last write operation to be returned. Since, without a global clock, it is not obvious which write is “the last” one, a consistency model defines the ordering rules that determine which results a read may return.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

As mentioned in Section 1, it is nearly impossible to maintain complete consistency. Allowing some degree of inconsistency buys performance, but what you can tolerate, and how much, depends on your application. You should choose a level of consistency that matches your application and system, and for that you first need to understand what kinds of consistency models exist.

Sequential consistency is the most popular and important consistency model; causal consistency is a weaker variant of it.

2–1. Sequential Consistency

A data store is sequentially consistent when the following condition is satisfied.

The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.

The point is that every process observes one and the same interleaving of operations.
In the following figures, (a) is sequentially consistent, but (b) is not, because Process 3 and Process 4 observe the reads in different orders. Specifically, Process 3 first reads the value b written by Process 2 and then the value a written by Process 1, whereas Process 4 observes them in the reverse order.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

※ W(x)a denotes writing the value “a” to data item “x”, and R(x)b denotes reading the value “b” from data item “x”.
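To make this concrete, here is a tiny Python sketch (the `observed_order` helper is an illustrative assumption, not part of any real system) that compares the order in which two processes first observe the written values; sequential consistency requires that all processes agree on one such order.

```python
# A toy check of the figure's scenario (illustrative only).
def observed_order(reads):
    """Return the distinct values in the order a process first observes them."""
    seen = []
    for value in reads:
        if value not in seen:
            seen.append(value)
    return seen

# Figure (a): P3 and P4 both observe b before a -> one common order exists.
p3, p4 = ["b", "a"], ["b", "a"]
print(observed_order(p3) == observed_order(p4))   # True: sequentially consistent

# Figure (b): P3 observes b before a, P4 observes a before b -> no common order.
p3, p4 = ["b", "a"], ["a", "b"]
print(observed_order(p3) == observed_order(p4))   # False: not sequentially consistent
```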

2–2. Causal Consistency

A data store is causally consistent when the following condition is satisfied.

Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.

In other words, it relaxes the constraints of sequential consistency: operations that have no causal relationship may be observed in different orders.

What matters here is which operations each operation depends on (i.e. whether two operations are causally related or not), and “vector timestamps” are an effective way to track this.

In the upper part of the following figure, the writes of the values “b” and “c” are concurrent, so the execution satisfies causal consistency.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen
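To show how causal dependencies can be tracked, the following is a minimal vector-clock sketch in Python (class names and API are assumptions for illustration, not a specific library): each process increments its own entry on a local write, merges timestamps when it receives one, and two writes are concurrent exactly when neither timestamp dominates the other.

```python
# Minimal vector-clock sketch for tracking causality (illustrative).
class VectorClock:
    def __init__(self, n_processes, pid):
        self.pid = pid
        self.clock = [0] * n_processes

    def tick(self):
        """Stamp a local write: increment this process's own entry."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def merge(self, received):
        """On receiving a message, take the element-wise maximum, then tick."""
        self.clock = [max(a, b) for a, b in zip(self.clock, received)]
        self.clock[self.pid] += 1


def happened_before(a, b):
    """True if the write stamped `a` causally precedes the write stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    # Neither precedes the other: replicas may apply these writes in any order.
    return not happened_before(a, b) and not happened_before(b, a)
```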

2–3. Entry Consistency

Another approach is to prepare synchronization variables and maintain consistency by allowing only the process that currently owns a synchronization variable to update the data associated with it. This is called “entry consistency”.

For entry consistency to be preserved, a synchronization variable must not have two owners in exclusive mode (for example, during a write). In non-exclusive mode, where the data can be read but not written, multiple processes may own the synchronization variable at the same time.

In the following figure, Process 2 does not hold the access right (i.e. the synchronization variable) for data item “y”, so its read returns NIL.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

The question then becomes how to properly associate data items with synchronization variables. In an object-oriented setting, this can be done by implicitly associating a synchronization variable with each object.
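As a rough sketch of that idea (an illustrative assumption, not any particular system's API), the following Python class pairs each object with its own lock as its synchronization variable, so only the current owner may update the value; a real implementation would also ship the latest value to whoever acquires the lock, and would use a reader-writer lock for the non-exclusive read mode.

```python
import threading

# Sketch of "one synchronization variable per object" (illustrative class).
class GuardedObject:
    def __init__(self, value=None):
        self._sync = threading.Lock()   # the object's synchronization variable
        self._value = value

    def update(self, new_value):
        # Exclusive mode: only the process holding the lock may write.
        with self._sync:
            self._value = new_value

    def read(self):
        # For simplicity reads also take the lock; a reader-writer lock would
        # allow several owners in non-exclusive (read-only) mode.
        with self._sync:
            return self._value
```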

3. Client-centric consistency model

Eventual consistency and client-centric consistency model

In practice, some degree of inconsistency between processes is acceptable. For example, a web cache may return pages to a client that are older than the version currently stored on the web server, and for many users such staleness is tolerable.

Even if such inconsistencies exist, all replicas gradually converge to the same state as time goes by. This form of consistency is called eventual consistency.

Eventually consistent data stores work well as long as a client always accesses the same replica. But what about mobile users who access different replicas within a short time? For example, a client on an express train moves to another location within a short period and connects to a different replica; unless the client’s earlier updates have already been propagated there, the data will appear inconsistent.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

One solution to this problem is to introduce client-centric consistency. Unlike the data-centric consistency models in the previous section, these models aim to guarantee consistency only from the point of view of a single client.

Client-centric consistency comes in several patterns:

  • Monotonic reads
  • Monotonic writes
  • Read your writes
  • Writes follow reads

Monotonic reads

A data store provides monotonic-read consistency when the following condition holds.

If a process reads data item x, any subsequent reads on x by that process will either reply with the same value or reply with a newer value.

In the following figures, in (b) there is no guarantee that the write set WS(x1) is included in the state read at L2, so monotonic-read consistency does not hold.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

Example: Distributed mailbox

  • Whichever mail server you connect to, you are guaranteed that every mail you could already read on your previous accesses can also be read at the new server (a minimal sketch of this guarantee follows below).
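A minimal sketch of how such a session guarantee might be enforced, assuming per-item version numbers and hypothetical Replica/Session classes, could look like this: the client remembers the highest version it has read and refuses to read from a replica that lags behind it.

```python
# Sketch of a monotonic-read session guarantee using per-item versions
# (Replica/Session are illustrative assumptions, not a real mail server API).
class Replica:
    def __init__(self):
        self.data = {}                      # item -> (version, value)

    def read(self, item):
        return self.data.get(item, (0, None))


class Session:
    def __init__(self):
        self.last_seen = {}                 # highest version this client has read

    def monotonic_read(self, replica, item):
        version, value = replica.read(item)
        if version < self.last_seen.get(item, 0):
            # This replica is staler than what the client already saw:
            # wait for propagation or switch to another replica.
            raise RuntimeError("replica lags behind the session's last read")
        self.last_seen[item] = version
        return value
```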

Monotonic writes

A data store provides monotonic-write consistency when the following condition holds.

A write operation by a process on a data item x is completed before any successive write operation on x by the same process.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

Monotonic-write consistency resembles data-centric FIFO consistency. The essence of FIFO consistency is that write operations by the same process are performed in the correct order everywhere. Monotonic-write consistency imposes the same ordering constraint, but only for a single process rather than for a collection of concurrent processes.

Read Your Writes

A data store provides read-your-writes consistency when the following condition holds.

The result of a write operation by a process to data item x is always observed by subsequent read operations by the same process.

In other words, a write operation is always completed before any subsequent read operation by the same process, no matter where that read takes place.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

Example 1: after you update a web page, your browser always displays the new content rather than a stale cached copy.

Example 2: after you update your password, the new password is guaranteed to work even at the location you have moved to.
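A minimal sketch of the same idea for read-your-writes, again with hypothetical classes and a simple version counter, has the client remember the version of its own last write and reject reads from replicas that have not yet seen it.

```python
# Sketch of a read-your-writes guarantee (illustrative classes only).
class Replica:
    def __init__(self):
        self.version, self.value = 0, None

    def write(self, version, value):
        if version > self.version:
            self.version, self.value = version, value

    def read(self):
        return self.version, self.value


class Client:
    def __init__(self):
        self.last_write = 0                 # version of this client's latest write

    def update_password(self, replica, new_password):
        self.last_write += 1
        replica.write(self.last_write, new_password)

    def read_password(self, replica):
        version, value = replica.read()
        if version < self.last_write:
            raise RuntimeError("replica has not yet seen my own write")
        return value
```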

Writes Follow Reads

A data store provides writes-follow-reads consistency when the following condition holds.

A write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

4. Replication management

So far we have looked at the types of consistency in detail. In this section, we explain replication management: where, when, and by whom replicas are placed, and how consistency between them is achieved.

Replica placement involves two distinct problems:

  1. The replica server placement problem
  2. The content placement problem

4–1. Replica Server Placement Problem

One approach to the server placement problem is to place servers based on the distance between clients and servers, where distance can be measured in terms of transmission delay or bandwidth. Servers are then selected one at a time so that the average distance between clients and their closest server is minimized.

An alternative is to consider the network topology and place servers at the routers with the largest number of network interfaces (i.e. links).

The problem with both approaches is their computational cost: the placement cannot be recomputed quickly enough during a “flash crowd” (a sudden, explosive burst of requests for a particular site). More efficient heuristics exist, such as dividing the network into regions and identifying the regions that generate the most requests.
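As an illustration of the “minimize the average distance” idea, here is a rough greedy sketch in Python; the `distance` function (for example, measured latency) and the data structures are assumptions, and real placement algorithms are considerably more sophisticated.

```python
# Greedy sketch of replica-server placement: repeatedly choose the candidate
# site that most reduces the average client-to-nearest-server distance.
def place_servers(clients, candidate_sites, k, distance):
    chosen = []
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for site in candidate_sites:
            if site in chosen:
                continue
            trial = chosen + [site]
            # Average distance from each client to its closest chosen server.
            avg = sum(min(distance(c, s) for s in trial) for c in clients) / len(clients)
            if avg < best_cost:
                best_site, best_cost = site, avg
        chosen.append(best_site)
    return chosen
```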

4–2. Content replication and placement

Regarding content, three different types of replicas can be distinguished, as organized in the figure below.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

Permanent replica

This is the initial set of replicas that constitutes the distributed data store. In many cases, the number of permanent replicas is small.

Server-initiated replicas

These are copies created at the initiative of the data store’s owner in order to increase performance. In a web hosting service, for example, this means dynamically copying files whose performance needs improving to servers closer to where the requests come from.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

For example, if a web server in New York suddenly receives an explosive number of requests over several days from clients in an unexpected location far from the server, it is effective to temporarily place replicas in that region.

Client-initiated replicas

These are commonly referred to as client caches and are used only to improve data access time. When the same data needs to be read repeatedly, a cache that keeps a replica on the client side for a limited time is effective.

4–3. Content distribution

Replication management also covers how updated content is propagated to the replica servers (i.e. how updates are delivered).

Type of information to be propagated

There are three possibilities for what is actually propagated:

  1. Propagate only update notifications
  2. Transmit the updated data from one copy to another
  3. Propagate the update operation to other replicas

Option 1, propagating notifications, is done by an invalidation protocol: replicas are only told that their copy is no longer valid. Its advantage is that it uses very little network bandwidth, and it works well when replicas do not need to be refreshed frequently.

Option 2 transfers the updated data itself between replicas. This tends to be effective and practical when the data is read often relative to how often it is updated.

Option 3 propagates only the update operation that should be performed. It assumes that each replica is able to apply the operation itself and so keep its data up to date; compared with option 1, the replicas therefore remain consistent instead of merely knowing that they are stale.

Pull vs. Push

There are two methods of propagating updates: pull and push.

When the read-to-write ratio is high, a push-based approach is more efficient: the server propagates every update, so the replicas stay consistent and most reads can be served locally.

When the read-to-write ratio is low, pushing every update wastes bandwidth on data that may never be read, so a pull-based approach is more suitable.

The two approaches also differ in the state the server must keep, the messages that are exchanged, and the response time seen by the client.

*Poll: one of the control mechanisms for coordinating multiple devices or programs, in which the main system asks the other systems at regular intervals whether they have anything new (for example, pending updates).

Unicast vs. Multicast

The push-based method can be implemented efficiently by using multicast. In contrast, unicast is the most efficient solution for pull-based methods.
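The following toy Python sketch (in-process stand-ins; class names are assumptions) contrasts the two modes: the origin pushes each update to registered replicas immediately, while an unregistered replica only pulls newer state when it needs it.

```python
# Minimal push vs. pull sketch (illustrative only).
class Origin:
    def __init__(self):
        self.version, self.value = 0, None
        self.subscribers = []               # replicas registered for push

    def write(self, value):
        self.version += 1
        self.value = value
        for replica in self.subscribers:    # push: propagate immediately
            replica.apply(self.version, value)


class Replica:
    def __init__(self):
        self.version, self.value = 0, None

    def apply(self, version, value):
        if version > self.version:
            self.version, self.value = version, value

    def pull(self, origin):
        # Pull (poll): ask the origin for newer state only when needed.
        if origin.version > self.version:
            self.apply(origin.version, origin.value)
```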

5. Consistency Protocol

In this section, we look at what the concrete consistency protocols actually are, i.e. how the consistency models above are implemented.

5–1. Primary-based protocols

Primary-based protocols are often used to realize sequential consistency. In these protocols, the primary server associated with a data item x is responsible for carrying out the write operations on x.

Primary-based protocols come in two flavors:

  • Fixing the primary at one specific server
  • Moving the primary to the server where the write operation originates, and executing the write there

5–1–1. Remote-write protocol

In the remote-write protocol (also called the primary-backup protocol), all write operations are forwarded to a single fixed primary server and executed there. This is a straightforward implementation of sequential consistency, since the primary can place all writes into one globally unique order.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

(However, this protocol has a performance problem: the writer may be kept waiting for a relatively long time while the update reaches the backups. This can be improved by a non-blocking variant, but that is a trade-off against the reliability of the update.)
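A minimal sketch of the blocking primary-backup flow, with hypothetical Primary/Backup classes, looks like this: any write is forwarded to the primary, which applies it locally and waits for every backup to acknowledge before the write returns.

```python
# Sketch of the blocking remote-write (primary-backup) idea (illustrative).
class Backup:
    def __init__(self):
        self.store = {}

    def apply(self, item, value):
        self.store[item] = value
        return True                      # acknowledgement


class Primary:
    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def write(self, item, value):
        self.store[item] = value
        # Blocking variant: wait for every backup to acknowledge before the
        # client's write returns, which is where the long wait comes from.
        return all(b.apply(item, value) for b in self.backups)


def client_write(primary, item, value):
    # Any replica receiving a write simply forwards it to the fixed primary.
    return primary.write(item, value)
```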

5–1–2. Local-write protocol

In this protocol, the primary role migrates: the server that wants to perform a write first becomes the primary replica for the item and then executes the write locally. This approach has been applied in many distributed shared-memory systems.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

5–2. Replicated-write protocol

In primary-based replication, a write operation is carried out at a single copy, whereas in replicated-write protocols, write operations are carried out at multiple copies.

Replicated-write protocols come in two forms:

  • Active replication, in which the operation is forwarded to all replicas
  • Quorum-based protocols, in which consistency is maintained by majority voting

5–2–1. Active replication protocol

In active replication, each replica has an associated process that carries out the update operations. Crucially, all replicas must execute the operations in the same order.

This ordering can be achieved with totally ordered multicast based on Lamport clocks, but that approach is hard to scale to a large distributed system, so in practice total ordering is usually realized with a central coordinator called a sequencer.
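Here is a small sketch of the sequencer idea (names are assumptions): a central sequencer stamps every operation with an increasing number, and each replica applies operations strictly in stamp order, buffering anything that arrives early.

```python
import itertools

# Sketch of sequencer-based total ordering (illustrative only).
class Sequencer:
    def __init__(self):
        self._counter = itertools.count(1)

    def assign(self, operation):
        return next(self._counter), operation


class Replica:
    def __init__(self):
        self.next_expected = 1
        self.pending = {}
        self.log = []                       # operations applied, in total order

    def deliver(self, seq, operation):
        self.pending[seq] = operation
        # Apply gap-free, so every replica executes the same sequence.
        while self.next_expected in self.pending:
            self.log.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
```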

5–2–2. Quorum-based protocol

Here, replicated writes are realized by voting. Before a read or write on a data item is executed, permission is requested from several replica servers, and the operation proceeds only when enough of them agree.

Let N be the number of servers, NR the read quorum, and NW the write quorum. The following constraints must hold:

1. NR + NW > N
2. NW > N/2

When both constraints are satisfied, read-write conflicts and write-write conflicts are prevented. The two quorums can then be tuned according to how frequently reads and writes occur.
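A small helper (illustrative only) makes the two constraints easy to check for a given configuration:

```python
# Sanity check for the two quorum constraints.
def valid_quorums(n, n_read, n_write):
    # NR + NW > N: every read quorum overlaps every write quorum, so a read
    # always contacts at least one replica holding the newest version.
    overlap_read_write = n_read + n_write > n
    # NW > N/2: any two write quorums overlap, so two conflicting writes
    # cannot both be accepted on disjoint sets of replicas.
    overlap_write_write = n_write > n / 2
    return overlap_read_write and overlap_write_write

print(valid_quorums(12, 4, 9))    # True: a read-optimized configuration
print(valid_quorums(12, 7, 6))    # False: NW is not a majority
```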

5–3. Cache Consistency Protocol

Unlike the previous two families, here consistency is controlled by the client rather than by the server. Cache consistency protocols can be classified along several design dimensions.

First, they can be classified by when inconsistency is detected:

  • Consistency is verified when a cached data item is accessed during a transaction
  • The transaction proceeds while the data item is being validated
  • Consistency is verified only when the transaction commits

Second, they can be classified by how consistency between the copies is maintained:

  • The server invalidates all caches whenever an update occurs
  • The server simply propagates the update to the caches

In many systems, updates are performed only at the server and are then pulled by the clients.

There is also an approach, similar to the primary-based local-write protocol, in which the client itself is allowed to modify the cached data. It is used in distributed file systems and is called a write-through cache; to guarantee sequential consistency, the client must hold exclusive write permission. If the propagation of updates is delayed so that multiple writes can be performed before the server is notified, performance improves further; this is called a write-back cache.
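As a sketch of the difference, the following hypothetical write-back cache buffers writes locally and flushes them in a batch, whereas a write-through cache would forward each write to the server immediately (see the comment in `write`).

```python
# Sketch of a client-side write-back cache (illustrative; the server is
# assumed to expose a write(item, value) call).
class WriteBackCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, item, value):
        # Write-back: buffer locally. A write-through cache would instead call
        # self.server.write(item, value) right here on every update.
        self.cache[item] = value
        self.dirty.add(item)

    def read(self, item):
        return self.cache.get(item)

    def flush(self):
        # Propagate all buffered writes to the server in one batch.
        for item in self.dirty:
            self.server.write(item, self.cache[item])
        self.dirty.clear()
```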

6. Consistency in BlockChain

As we have seen, maintaining consistency among the processes and replicated data of a distributed system requires a great deal of ingenuity.

Maintaining consistency in large distributed systems is normally a very painful task. A blockchain is exactly a mechanism for realizing a distributed system, and it is a new technology that makes it possible to realize sequential consistency through its basic structure of connecting blocks into a chain.

6–1. Sequential-Consistency in Blockchain

In a blockchain, the order between any two transactions is always consistent. In Bitcoin, for example, the order of transactions involving a particular coin, such as A’s UTXO moving to B and then B paying C with that UTXO, is the same at every node. This follows from the fact that the blocks containing the transactions are linked into a chain, and from the fact that a new block is only created once consensus is reached: transactions are not accepted unless they are consistent with past transactions.

These features show that sequential consistency of the transactions on the chain is maintained by default at every node. In this sense, the blockchain realizes a distributed system and its consistency through a very simple mechanism.

However, if we look at the consistency of blockchains more precisely, it turns out to be rather incomplete. To go further, let us first recall the CAP theorem.

6–2. CAP Theorem and BlockChain

The CAP theorem describes a property of distributed systems that was conjectured by Brewer in 2000 and later formalized and proved by Gilbert and Lynch. It states that a distributed system can satisfy any two of consistency, availability, and partition tolerance, but cannot satisfy all three at the same time.

https://www.researchgate.net/figure/CAP-Theorem_fig4_221462089

Permissionless public blockchains that adopt PoW as their consensus algorithm, like Bitcoin (see the article “Block Chain Comparison Public Private Consortium Type” for details), hold availability and partition tolerance to a high standard: even if some nodes fail, the blockchain network keeps running (availability), and even if the network is partitioned, the nodes can still communicate and make progress (partition tolerance). Consistency, on the other hand, is imperfect. As mentioned in the previous section, the sequential consistency of transactions is indeed established as time passes, but under a consensus algorithm like PoW there is no finality and forks occur. Each miner broadcasts at its own timing when it finds a nonce and updates to the latest state, so forks happen frequently. In terms of the CAP theorem, such blockchains choose availability and partition tolerance and sacrifice consistency to a certain extent.

However, as time passes, this consistency becomes more and more certain. The lack of finality is still discussed as a problem, but even so, a blockchain is an excellent system that converges to complete sequential consistency over time while also offering tamper resistance. This is one of the reasons blockchain technology is attracting more and more attention around the world.

Seth Gilbert and Nancy Lynch. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition Tolerant Web Services.

https://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf

6–3. Blockchain and replication management

As discussed earlier in this article, replication management is an important problem when running a distributed system: where, when, and by whom replicas are managed, and what mechanism ensures consistency between them. So how is replication managed in a blockchain?

First, consider the replica server placement problem. Normally we would look for the most efficient placement based on where communication is concentrated, among other factors. However, blockchains such as Bitcoin operate without centralized management, and it is up to each participant whether or not to join the network as a full node. In other words, the system cannot flexibly arrange its replicas from a global point of view.

So how is the distribution of updates between replicas handled? As mentioned earlier, there are three possibilities for what is propagated when an update occurs:

  1. Propagate only update notifications
  2. Transmit the updated data from one copy to another
  3. Propagate the update operation to other replicas

In a blockchain, to prevent forks and tampering, it is desirable that all nodes hold information that is as up to date as possible. In addition, since only the longest chain is considered legitimate, the incentive structure pushes every miner to obtain the latest block as quickly as possible in order to start searching for the next nonce as soon as possible. For these reasons, in a blockchain the updated data itself (transactions and blocks) is transmitted between the replicas, and the exchange of updates happens as quickly as possible. In other words, blockchain chooses option 2: transmit the updated data from one copy to another.

Also, since a blockchain is a non-centralized system, communication is fundamentally peer-to-peer, and multicast is the main communication pattern in the blockchain network.
First, when a miner finds a nonce, the new block is immediately push-multicast to all network participants. By announcing the block it has found as quickly as possible and gathering confirmations from other validators, the miner tries to lock in the reward for creating the new block.
A node that newly joins the network, or one that was disconnected and rejoins, performs pull-based synchronization to catch up with the latest state of the network. To avoid being deceived by double spending, tampering, and the like, such a client tries to obtain the most recent data possible while communicating with as many nodes as possible.

There are also clients that simply unicast pull requests to trusted full nodes and fetch the latest data for their applications.

6–4. Blockchain and the Primary-Based Protocol

In a blockchain, nodes are scattered all over the world and replicas are dispersed across every region. Under PoW consensus, each new set of transactions is written by the node that first finds the next nonce. In that sense, each write has a primary server associated with it.

From this, we can classify PoW, as a means of maintaining blockchain consistency, as the local-write variant of the primary-based protocols described in 5–1–2. In the Bitcoin blockchain, the right to write as the primary server is obtained by solving the PoW puzzle, which acts as the exclusive-control / leader-election mechanism: whoever first finds a nonce whose block hash has N zero digits at the head becomes the primary, and that right moves from block to block. However, when two nodes obtain the right to become the primary at almost the same time, the blockchain forks.
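As a toy illustration of this leader election (a simplification: real Bitcoin compares the hash against a numeric target rather than counting hex digits), the following Python snippet searches for a nonce whose SHA-256 block hash starts with a given number of zero digits; whichever node finds it first effectively becomes the primary for the next block.

```python
import hashlib

# Toy proof-of-work sketch: the node that first finds a nonce whose block hash
# starts with `difficulty` zero hex digits wins the right to write the block.
def find_nonce(block_data: bytes, difficulty: int = 4) -> int:
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

print(find_nonce(b"ordered transactions of the next block"))
```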

6–5. MOLD Blockchain and NEW Replicated-Write protocol

On the other hand, PBFT-style consensus algorithms, including Tendermint, do not have a single primary server responsible for executing each update first. The right to write is limited to the nodes participating in the consensus algorithm, but all participating nodes can carry out the write operation in the same round. In other words, PBFT-style consistency protocols are close to the active replication form of replicated-write protocols.

One of the major challenges of active replication is getting all replicas to execute operations in the same order. This can be solved with totally ordered multicast based on Lamport’s logical clocks, but since that is hard to scale, a central coordinator is often introduced instead. The PBFT-based approach in Hyperledger takes exactly this form: trusted organizations act as leaders (orderers) that manage the update order of blocks. However, this makes a fully non-centralized system impossible, while MOLD’s goal is a distributed blockchain with decentralized operation.

Tendermint, by contrast, uses its own consensus algorithm, which introduces a three-phase-commit-style procedure to maintain a total order (for details, see the article “Fault tolerance in a distributed system that forms blockchains”). It is a new and remarkable active-replication protocol that realizes complete sequential consistency and non-centralization at the same time, and it has the potential to make MOLD’s vision come true.

DISTRIBUTED SYSTEMS “Principles and Paradigms” Chapter 7 CONSISTENCY AND REPLICATION / Andrew S. Tanenbaum, Maarten van Steen

— — — — — — — — — — — — — — — -
Cosmos Gaming Hub Project (former MOLD project)
CEO & Co-Founder

Takumi Asano

For all game enthusiasts
