A 3-minute walkthrough of the CVT PDP data integrity verification protocol

CyberVein · Published in CyberVein · 6 min read · Sep 28, 2020

To keep the PoC (Proof of Contribution) mechanism running smoothly and to secure the increasingly demanding “cloud storage” function, the CyberVein technical team has adopted a basic method for verifying the availability and integrity of files: the PDP (Provable Data Possession) data integrity verification protocol. The idea is to insert multiple encrypted “sentinel” bytes into each data block as guards and combine them with strict PDP checks, so that data blocks are semi-encrypted, unreadable to the server, and non-intrusive to work with. Every step of the process is written to the blockchain, making it tamper-proof and traceable. Below I walk through the protocol based on the official material; three minutes of careful reading should give you a working understanding of PDP.

Is PDP complicated? Only if you haven’t read it the right way

Many people assume the technical principles are complicated. Take CVT’s own description of its PDP mechanism: “PDP can check whether a remote cloud server retains a file, which consists of a collection of n blocks. The data owner processes the data file to generate some metadata and stores it locally. The file is then sent to the server, and the owner deletes the local copy. The owner verifies possession of the file in a challenge-response protocol.” In plain terms: when you store a file on the chain, the cloud server keeps it as n separate blocks, and you can delete the original from your local disk to free up space. There is no need to worry about losing the file, because the nodes record and supervise it together.

Now, suppose you want to operate this PDP data integrity verification protocol yourself. What would that look like?

PDP has been around for over a decade and is currently the most widely used and most mature storage-integrity proof mechanism in the distributed storage field. It can quickly determine whether the data on a remote node is complete, and it is mainly used to check the integrity of large data files. A typical application scenario of CVT’s PDP mechanism looks like this:

1. Arthur asks Bobo to store a set of information;

2. Arthur does not store the information himself, for reasons such as insufficient disk space or concerns about keeping it safely on his own computer;

3. After a while, Arthur asks Chloe to confirm whether Bobo still stores the information;

4. Chloe does not need to understand the content of the information; she is simply a witness.

Note: to make this easier to follow, we map the roles as follows: Arthur is the user, Bobo is the storage miner, and Chloe is the third-party auditor (hereinafter TPA).

This scenario plays out in two phases:

A. The setup phase (first you need the tools to do the job):

1. Initialization phase

Arthur plugs his hard disk into the computer; the computer and the disk load the PDP software, variables are assigned their default values, and controls are set to their default state. After successful registration, Arthur has his own key.

2. The user runs the key generation algorithm to generate a key pair (pk, sk)

Arthur follows the instructions and runs the key generation algorithm, producing the key pair (pk, sk).
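The key-generation step can be sketched as follows. This is a toy stand-in, not CVT’s actual implementation: real PDP schemes generate an asymmetric pair (pk, sk), typically RSA-based, whereas this sketch uses one random symmetric key for both roles, which only works when the data owner verifies his own challenges.

```python
import secrets

def keygen(key_bytes: int = 32) -> tuple[bytes, bytes]:
    # Toy stand-in for the PDP KeyGen algorithm: a single random
    # symmetric key plays the role of both pk and sk. A real scheme
    # would return a distinct public/private pair.
    sk = secrets.token_bytes(key_bytes)
    pk = sk
    return pk, sk

pk, sk = keygen()
print(len(sk))  # 32
```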

3. Divide the stored files into blocks F=(m1, m2,…, mn)

Blocking means partitioning, which also makes future lookups easier. File F is stored block by block in the PoC system. These blocks have a fixed storage capacity (depending on the space contributed by miners), and they can also be linked to other blocks. A small file fits in a single block. A large file is split across several blocks, and an extra “empty” block is created that links to all the other parts of the file; this block acts like a large envelope covering the entire file.
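The blocking step F = (m1, m2, …, mn) can be sketched as a simple fixed-size split (the block size here is an illustrative choice, not a CVT parameter):

```python
def split_into_blocks(data: bytes, block_size: int = 4096) -> list[bytes]:
    # F = (m1, m2, ..., mn): slice the file into fixed-size blocks;
    # the final block may be shorter than block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"\x00" * 10_000)
print(len(blocks), len(blocks[-1]))  # 3 1808
```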

4. Run the data block label generation algorithm to generate a homomorphic label set Φ for each data block in the file

That is, after Arthur has partitioned the stored file, he labels each of his stored blocks, and this label set is broadcast to all data blocks as a safeguard.
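A rough sketch of per-block tagging, with a large caveat: real PDP tags are homomorphic, so the tags of many blocks can be combined into one short proof; the HMAC used here is a non-homomorphic stand-in chosen only to illustrate the idea of binding each tag to a block and its index.

```python
import hmac
import hashlib
import secrets

def tag_blocks(sk: bytes, blocks: list[bytes]) -> list[bytes]:
    # Toy TagBlock: one tag per block, bound to the block's index so
    # the server cannot answer a challenge for block i with block j.
    return [
        hmac.new(sk, i.to_bytes(8, "big") + m, hashlib.sha256).digest()
        for i, m in enumerate(blocks)
    ]

sk = secrets.token_bytes(32)
tags = tag_blocks(sk, [b"block-0", b"block-1"])
print(len(tags))  # 2
```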

5. Arthur stores the data file F and the tag set Φ in the cloud, then deletes the local copy {F, Φ}

After authentication, Arthur’s file F is uploaded to the “cloud database”: the system searches for a node with sufficient storage space, finds Bobo, and stores the file in Bobo’s “cloud database”.

Arthur’s local disk space is now free, because Bobo already stores file F. Based on the size of the stored file F (178 MB in this example), Bobo receives a corresponding token reward.

B. The challenge phase (when you want to check on the file):

1. As the verifier, the user or TPA periodically initiates integrity verification requests.

Arthur can initiate a verification request whenever he wants to check that file F is still intact in the cloud, or when he needs to download it again. He can also entrust Chloe (the TPA) to verify that file F is stored in full by Bobo.

2. Randomly pick c block indexes {S1, S2, …, Sc} from the file’s block index set [1, n], choose a random number Vi for each index Si, combine the two into a challenge request, and send it to the server.

At this point Arthur follows the PDP procedure (don’t worry, there are step-by-step instructions and little thought is required); the system automatically generates a “search warrant” and sends it to the “cloud database”.
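The challenge step described above can be sketched like this (using 0-based indexes; the nonce Vi makes each challenge fresh so the server cannot replay an old proof):

```python
import secrets

def make_challenge(n: int, c: int) -> list[tuple[int, int]]:
    # Pick c distinct block indexes S1..Sc from the index set [0, n)
    # and pair each index Si with a fresh random number Vi.
    rng = secrets.SystemRandom()
    indexes = rng.sample(range(n), c)
    return [(s, rng.getrandbits(64)) for s in indexes]

challenge = make_challenge(n=1000, c=10)
print(len(challenge))  # 10
```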

3. The server, as the prover, uses the data file {F, Φ} stored on it to run the proof generation algorithm, producing the integrity proof P, which it returns to the verifier.

The server quickly indexes file F. If the file is intact, it returns the “proof P in the cloud” to Arthur in real time, confirming that Bobo still stores file F in full and that it has not been lost or deleted.

4. After receiving the proof, the verifier runs the proof checking algorithm to verify that the proof is correct.

Even if Chloe carries out the audit and collects the “proof P in the cloud”, Arthur may still be uneasy; in that case he can run the proof checking algorithm himself to verify the authenticity and validity of the “proof P in the cloud”.
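Putting the prover and verifier steps together, here is a minimal end-to-end sketch of the challenge-response round. To stay self-contained it uses the same HMAC stand-in for homomorphic tags as earlier; a real PDP prover would compress the challenged blocks and tags into one compact proof P rather than echoing them back.

```python
import hmac
import hashlib
import secrets

def tag(sk: bytes, i: int, block: bytes) -> bytes:
    # Toy per-block tag, bound to the block index (HMAC stand-in).
    return hmac.new(sk, i.to_bytes(8, "big") + block, hashlib.sha256).digest()

def prove(blocks, tags, challenge):
    # Toy GenProof: return each challenged (index, nonce, block, tag).
    return [(s, v, blocks[s], tags[s]) for s, v in challenge]

def verify(sk: bytes, proof) -> bool:
    # Toy CheckProof: recompute each tag and compare in constant time.
    return all(hmac.compare_digest(tag(sk, s, block), t)
               for s, v, block, t in proof)

sk = secrets.token_bytes(32)
blocks = [b"block-%d" % i for i in range(8)]
tags = [tag(sk, i, m) for i, m in enumerate(blocks)]

challenge = [(2, 111), (5, 222)]
print(verify(sk, prove(blocks, tags, challenge)))   # True

blocks[5] = b"corrupted"                            # server loses a block
print(verify(sk, prove(blocks, tags, challenge)))   # False
```

The nonce Vi is carried but unused in this toy verifier; in a real scheme the random coefficients are folded into the aggregated proof so that old responses cannot be replayed.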

The minimalism of PDP data integrity verification protocol

Traditional ways of uploading and downloading files or browsing web pages all rely on location-based addressing, fetching information from a single server each time. If that server fails, is restricted, or is attacked, files are lost or pages stop loading (the familiar 404 error); if the IP is removed or the server is shut down, users lose access entirely. And if many people need the same file, each of them has to download a copy before using it, wasting enormous amounts of storage. The PoC contribution mechanism not only builds a bridge for information exchange but also rewards participants. What makes PDP different is that it is both decentralized and minimal: to upload or download files you no longer depend on a central server, information is neither lost nor tampered with undetected, and files can be shared, saving storage space.

PDP effectively reconstructs the way we transmit, obtain, and store information, changing how we view information and becoming part of our daily lives. The CyberVein team has also established a PoC incentive system to keep the network running. Whether it is the financial freedom brought by the blockchain itself or the freedom of information that PDP brings us, it will undoubtedly be an important milestone.


CyberVein reinvents decentralized databases and the way we secure and monetize information.