HSN Technology Analysis Part 1: What is “Encryption De-duplication”?
Question from an HSN community member:
In the white paper, we noticed that HSN applies a key technology called “encryption de-duplication”. How does it work, and what role does it play in the HSN ecosystem?
The HSN team’s answer:
In the data storage industry, de-duplication is an important file-system technology because it can greatly improve storage-space efficiency.
For example, suppose a system holds two files with identical contents. Here, “identical” means not only that the two files have the same semantic content, but also that their underlying binary data is exactly the same from a computational point of view. Without de-duplication, both files must be stored, occupying twice the space. This wastes storage, and the waste grows as more files and copies are added.
With de-duplication, no matter how many copies of a file are needed, the file system stores its content only once, occupying the space a single time and saving actual storage.
This technology is now widely used in cloud storage, CDNs, and other systems. For example, when a Baidu Cloud Disk user saves a high-definition movie from another user’s Cloud Disk to their own space, the transfer completes in seconds, however many files are involved. This is because the Cloud Disk file system uses de-duplication to avoid physically copying the files: by adding a small amount of file-management information, the same stored file can be shared across many users’ Cloud Disks, saving both space and time.
In a file system that stores content in plaintext, de-duplication is easy to implement. It does not need to understand the semantic content of a document; it only needs to compute the MD5 fingerprint of the binary file to determine whether two documents have the same content.
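A minimal Python sketch of fingerprint-based de-duplication, assuming a simple in-memory store. The class `DedupStore` and its methods are hypothetical names for illustration only. MD5 is used because the text mentions it; a collision-resistant hash such as SHA-256 is generally the safer choice in practice.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): identical bytes are kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # fingerprint -> stored content
        self.files: dict[str, str] = {}    # file name -> fingerprint

    def put(self, name: str, data: bytes) -> str:
        # The MD5 digest of the raw bytes serves as the file's fingerprint.
        fingerprint = hashlib.md5(data).hexdigest()
        # Store the content only if this fingerprint has not been seen before.
        if fingerprint not in self.blobs:
            self.blobs[fingerprint] = data
        # Every name, including duplicates, only records a reference.
        self.files[name] = fingerprint
        return fingerprint

    def get(self, name: str) -> bytes:
        return self.blobs[self.files[name]]


store = DedupStore()
movie = b"high-definition movie bytes" * 1000
store.put("alice/movie.mkv", movie)
store.put("bob/movie.mkv", movie)
print(len(store.files), len(store.blobs))  # 2 1
```

Saving the same movie under a second name adds only a table entry, not a second copy of the bytes, which mirrors the Cloud Disk behavior described above.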
On the HSN blockchain network, to protect users’ privacy and security, each user’s confidential data must be encrypted with that user’s private key before it is stored on the HSN network. Although this significantly improves data security, it also poses a new challenge for de-duplication. When identical files owned by different users are encrypted with their respective keys, the resulting binary files are completely different, so computing MD5 fingerprints can no longer tell whether files owned by different users are identical. As a result, traditional de-duplication no longer works in the blockchain setting.
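The problem can be illustrated with a short Python sketch. Python's standard library has no block cipher, so this assumes a toy XOR keystream built from SHA-256 in counter mode as a stand-in for a real cipher such as AES; it is NOT secure and only demonstrates that the same plaintext under two different (hypothetical) keys yields unrelated ciphertexts, and therefore unrelated MD5 fingerprints.

```python
import hashlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream.

    Toy stand-in for a real cipher such as AES; NOT secure for real use.
    Because it is a plain XOR, applying it twice with the same key decrypts.
    """
    out = bytearray()
    for i in range(0, len(data), 32):
        # Derive a 32-byte keystream block from the key and the block offset.
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        chunk = data[i:i + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


document = b"the same confidential file" * 100
alice_ct = toy_encrypt(b"alice-private-key", document)  # hypothetical keys
bob_ct = toy_encrypt(b"bob-private-key", document)

# Same plaintext, different keys: the ciphertext fingerprints no longer match.
print(hashlib.md5(alice_ct).hexdigest() == hashlib.md5(bob_ct).hexdigest())  # False
```

A fingerprint-based de-duplicator looking only at the stored ciphertexts would therefore keep both copies.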
For this reason, HSN’s development team has designed and implemented a unique “de-duplication storage technology for multiple identical files after encryption” (encryption de-duplication for short). It allows multiple identical files encrypted with different keys to be stored as a single copy, improving the storage-space efficiency of the whole HSN network.
The principle behind encryption de-duplication involves extensive calculations and formulas. Because this is an innovative solution developed by the HSN team, a patent application has been filed with the Chinese patent office.
Simply put, the process of encryption de-duplication can be described as follows:
1) Before encrypting a large file for storage, we split it into several fragments, compute the hash value of each fragment, and check the existing “fragment-hash value-node” mapping table to see whether that hash value already exists.
2) If it already exists, the new fragment is replaced by the existing fragment, and its node entry is set to the node that already holds the original.
3) Next, the data fragments are encrypted with a key generated from a specific seed string, distributed to the edge nodes, and recorded in the mapping table.
4) When a user retrieves a file, the fragments that make up the file are collected from the edge nodes, decrypted, and reassembled into the complete file.
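The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HSN's patented algorithm, whose details the text does not give. Assumed here: fixed 4-byte fragments, SHA-256 fragment hashes, an in-memory dict as the “fragment-hash value-node” mapping table, round-robin placement on edge nodes, and a per-fragment key derived with HMAC-SHA256 from a seed string and the fragment hash. All names (`EdgeNetwork`, `fragment_key`, `toy_cipher`, `SEED`) are hypothetical, and the XOR keystream cipher is an insecure toy standing in for a real cipher.

```python
import hashlib
import hmac

FRAGMENT_SIZE = 4                    # tiny fragments for readability (assumption)
SEED = b"hypothetical-seed-string"   # stands in for the "specific seed string"


def fragment_key(fragment_hash: str) -> bytes:
    # Assumption: key = HMAC(seed, fragment hash), so identical fragments
    # always receive identical keys regardless of which user stores them.
    return hmac.new(SEED, fragment_hash.encode(), hashlib.sha256).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Toy XOR keystream cipher (NOT secure); applying it twice decrypts.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


class EdgeNetwork:
    def __init__(self, node_names: list[str]) -> None:
        # Each edge node holds encrypted fragments keyed by fragment hash.
        self.nodes: dict[str, dict[str, bytes]] = {n: {} for n in node_names}
        self.mapping: dict[str, str] = {}  # fragment hash -> node name
        self._next = 0                     # round-robin placement counter

    def store_file(self, data: bytes) -> list[str]:
        manifest = []  # ordered fragment hashes needed to rebuild the file
        # Step 1: split into fragments and hash each one before encrypting.
        for i in range(0, len(data), FRAGMENT_SIZE):
            fragment = data[i:i + FRAGMENT_SIZE]
            h = hashlib.sha256(fragment).hexdigest()
            manifest.append(h)
            # Step 2: hash already mapped, so reuse the existing fragment/node.
            if h in self.mapping:
                continue
            # Step 3: encrypt the new fragment, place it on a node, record it.
            node = list(self.nodes)[self._next % len(self.nodes)]
            self._next += 1
            self.nodes[node][h] = toy_cipher(fragment_key(h), fragment)
            self.mapping[h] = node
        return manifest

    def retrieve_file(self, manifest: list[str]) -> bytes:
        # Step 4: collect each fragment from its node, decrypt, reassemble.
        parts = []
        for h in manifest:
            ciphertext = self.nodes[self.mapping[h]][h]
            parts.append(toy_cipher(fragment_key(h), ciphertext))
        return b"".join(parts)


net = EdgeNetwork(["edge-a", "edge-b", "edge-c"])
alice = net.store_file(b"HSN!HSN!DATA")  # fragments: HSN!, HSN!, DATA
bob = net.store_file(b"HSN!DATA")        # both fragments already stored
stored = sum(len(frags) for frags in net.nodes.values())
print(stored, net.retrieve_file(bob) == b"HSN!DATA")  # 2 True
```

Deriving the key from the fragment hash is the design choice that makes this work: identical fragments encrypt to identical ciphertexts, so one stored copy can serve every owner. This resembles the published idea of convergent encryption; whether HSN's method works this way is not stated in the source.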
In this way, only one copy of each file is stored, and all information remains fully encrypted throughout the process. For details, please refer to Section 4.4.1 of the “HSN White Paper”: Data Encryption Reduplication Algorithms.
We will introduce more of our innovative technologies in the future. You are welcome to contact us with any questions you have or topics you are interested in.