Probabilistic Sampling in Flyclient (1)

MAP Protocol

Published in

MAP Protocol

5 min readAug 1, 2020

Content
1. Introduction
2. Why We Need to Sample
3. How to Sample
4. Specific Sampling Distribution

Introduction

As we mentioned in the last article. MAP is a universal, secure, and low-cost chain-to-chain interoperation protocol. It will become a robust and complete infrastructure for various DAPP. In order to do this, we will need to establish a unified on-chain information verification specification, which must meet all requirements of ultra-light local chain verification protocol.

The first problem we have to solve is how to trace the source. MMR technology can help us solve this problem, it provides us with lightweight verification. In the last article, we introduced MMR technology. The advantage of MMR is that when new data arrives, the value of the intermediate node does not need to be changed, and the data is always appended. It can also provide fewer nodes when proving and verifying, which greatly reduces the amount of data and the burden of storage, and improves the verification efficiency. Another question is how to let the client check invalid blocks? In this article, we will get to know the probabilistic sampling technique in Flyclient.

Why We Need to Sample

If the client can find an invalid blockhead sent by a node, then the client can conclude that this node is malicious. Now suppose there are two nodes, at least one of which is honest. A client has received the certification information from these two nodes.

In the Bitcoin blockchain, although there may be multiple forked chains at a certain moment, the longest one is recognized as valid. Because the mining power of the malicious node is less than the sum of the computing power of all other honest nodes, the forked chain of the malicious node is definitely shorter than the effective chain.

However, in order to prevent the client from discovering this, the malicious node adds an invalid block to its own chain, making the length the same as the effective chain length, or even longer. Now the client cannot directly determine which one is valid by the length of the blockchain, so he adopts the method of sampling blockheads to determine malicious intact nodes by verifying the invalidity of the sampled blockheads. But how to sample?

How to Sample?

One method is for the verifier to request a uniform random set of multiple blocks from each prover. Because the malicious prover has only limited computing power, it can only correctly mine a small part of all blocks. Therefore, the validator needs to sample enough blocks to ensure that at least one of them is invalid.

For example, assuming that the length of the blockchain of the honest node is L, if the malicious node wants to deceive the light node, it must forge a blockchain of length M (M > L). Since the light node will randomly select some blocks for verification, if the malicious node forges all M blocks, it will be found. So, the malicious node will only forge some blocks. It is assumed that the percentage of the computing power of the malicious node lagging behind that of the honest node of the whole network is a%, that is to say, the honest node creates 100 blocks, and the malicious node can also dig out blocks. We need to know that, once the malicious certifier forks from the honesty chain, it cannot include any future honesty blocks in its chain, because the MMR root in these blocks does not match the chain. With this setting, if the verifier makes enough queries, it will eventually ask the malicious verifier about the unmined blocks.

Assuming that the malicious node starts to bifurcate from the F block of the blockchain, then, the honest node creates L-F blocks, at the same time, malicious nodes can dig up (L-F) * a% block at most. The remaining blocks must be forged by malicious nodes, which does not meet the requirements of mining, so if these forged blocks are spot-checked by the light node, the light node can be easily identified. Therefore, the core of the algorithm is to determine an appropriate sampling method.

As shown in the figure above, suppose a=40, starting from the bifurcation point, assume that the honest node can dig out 5 blocks, then the malicious node can only dig out 2 blocks, so it needs to forge 4 blocks. In this way, its forged blockchain can have 6 blocks, more than 5 blocks of honest nodes, such as the yellow block shown in the figure is a legal block, and the green block is a forged block. If these green blocks are drawn when sampling, then the light node can know that the malicious node is deceiving it, thus ignoring any information provided by these nodes.

Specific Sampling Distribution

Let’s take a look at the specific sampling distribution in this section. The new sampling protocol first samples k random blocks from the entire chain. Then, it successively splits the chain (or the current interval) in half and queries another random k blocks from the last half, i.e., the interval always ends with the tip of the chain.

According to this sampling method, the probability density function of an invalid block can be found as shown in the s(x) part of the figure below. The x coordinate represents the relative distance from the genesis block; the y coordinate axis represents the probability of finding an invalid block. This graph shows an exponentially increasing distribution because from the sampling method above, the sampling density is getting larger and larger.

On this basis, we can calculate a better probability density function and prove that it is the best:

Given that the last δ = c^k, c∈(0,1], k∈N fraction of the chain contains only valid blocks and the adversary can at most create a c fraction of valid blocks after the fork point a, the sampling distribution defined by the PDF g(x) =1/(x−1) ln(δ) maximizes the probability of catching any adversary that optimizes the placement of invalid blocks.

· MarcoPolo Protocol Twitter and Telegram Channel (For the latest news)

· MarcoPolo Protocol Telegram Community: English, Россия, Türkiye, Việt Nam, Brazil, and Indonesia; Korean Kakao Community: 코리아

· MarcoPolo Protocol Medium (For the latest articles)

· MarcoPolo Protocol WhitePaper, and Roadmap

· MarcoPolo Protocol GitHub (For the complete codes)

For more information, visit marcopolo.link

Probabilistic Sampling in Flyclient (1)

Introduction

Why We Need to Sample

How to Sample?

Specific Sampling Distribution

Written by MAP Protocol