DxBox: A Blockchain-Based Web Application for Secure Data Storage

This article briefly discusses the issues that exist in modern cloud storage and how a blockchain-based storage application can be a potential solution. More specifically, it discusses DxBox, the application developed by DxChain Network, and introduces its architectural design and economic model.

Introduction

With the rapid development of science and technology, blockchain has been proposed as a way to solve problems in existing systems such as cloud storage. Cloud storage is a data storage system built on distributed data centers that take advantage of virtualization technology. In recent years, it has attracted considerable attention from individuals and businesses because of its convenience and efficiency: users can access the data they have stored in the cloud from anywhere, at any time, as long as they have internet access. However, current cloud storage still has weaknesses, particularly in data security and data privacy.

When users upload data to the cloud, it first arrives at the master control data server, which serves as the brain of the system. The data is then transmitted to various data centers, and multiple copies are made in case one copy is destroyed. Once the data is uploaded, its safety depends entirely on the safety of the data centers. If the data centers are damaged by a natural disaster or human error, the data may be erased completely. Google has encountered this issue: in 2015, its data center in Belgium was struck by lightning four times, which caused permanent data loss.

Besides data safety, centralization and the lack of data encryption are further issues in modern cloud storage systems. Most cloud storage services encrypt data only during transmission, using SSL/TLS, which can itself be vulnerable to attack. In addition, because of centralization, server administrators have direct access to the data stored on the server. Even though most companies have strict policies for protecting users’ privacy, as long as human beings are involved, there is a risk of a data breach.

To address the issues mentioned above, we have designed and implemented a blockchain-based storage application: DxBox.

Architecture Design

DxBox is a demo application built on top of DxChain Testnet v0.3.6. It serves as an interface that simplifies file uploading and downloading for users. The application uses a client-provider model for data transfer. The client is a node that pays DX Token, the DxChain cryptocurrency, for file storage, uploading, and downloading. The provider is a node that offers disk space as a storage service and earns payments from the client in return. To become a provider, a node must make an announcement, which is added to the blockchain.

Here is what happens behind the scenes. Before the user can upload files, the client must find qualified providers and sign contracts with them. The client finds providers during its consensus process, in which blocks are synchronized sequentially. When the client observes a transaction in a block that contains a provider’s information, it records that information in its database. At the same time, the client continuously loops through all records in the database and contacts each provider to obtain its settings, such as the contract price, storage price, and uploading and downloading prices. For the DxBox application, all providers’ settings are pre-configured. The client node then automatically selects a number of top-rated candidates and starts forming a contract with each of them. For instance, DxBox chooses 10 candidates out of 128 available providers. Each contract keeps storage records, including the data size and the expiration block height. Once the contracts are successfully formed, the client is able to upload and download files.
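The selection step described above can be sketched in a few lines of Python. This is only an illustration: the article does not say how providers are rated, so the sketch assumes that a lower total price means a better rating. The `ProviderSettings` fields, the `score` rule, and the `select_providers` helper are hypothetical names, not part of the DxChain API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSettings:
    """Settings a client collects from one provider (field names are illustrative)."""
    provider_id: str
    contract_price: int
    storage_price: int
    upload_price: int
    download_price: int


def score(p: ProviderSettings) -> int:
    # Assumed rating rule: the cheaper the provider overall, the better its rank.
    return p.contract_price + p.storage_price + p.upload_price + p.download_price


def select_providers(candidates, count=10):
    """Return the `count` best-rated candidates, lowest total price first."""
    return sorted(candidates, key=score)[:count]


# 128 known providers with made-up prices; the client keeps the best 10.
providers = [
    ProviderSettings(f"provider-{i}", 100, 50 + i, 20, 20) for i in range(128)
]
chosen = select_providers(providers)
print([p.provider_id for p in chosen])
```

A real client would likely weigh more than price, for example uptime or past contract results, but the shape of the step is the same: collect settings, rank, and keep a fixed number of candidates.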

Figure 1: DxBox File Upload Process

As illustrated in Figure 1, once the user uploads a file, the file is divided into chunks. Each chunk is encrypted with a secret key generated by the client and then divided into shards using erasure coding. Erasure coding is a data protection technique that expands and encodes the original data with redundant shards to ensure the reliability of the storage system. If a provider goes offline or some shards become corrupted, only a subset of the shards is needed to recover the entire file. The numbers of data shards and parity shards are pre-configured: the former is the number of shards needed to recover a chunk, and the latter is the number of redundant shards. For the DxBox application, both are set to 5, meaning each chunk is split into 10 shards and only half of them are needed to recover the chunk. The number of chunks a file is divided into depends on the file size, the number of data shards, and the shard size, which is 4 MB by default. The relationship among them can be represented by the equation below:

For instance, to upload a 10 MB file using DxBox, the file is divided into three chunks. Once each shard of a chunk has been uploaded to a distinct provider, the file upload is finished. Chunk encryption and erasure coding together enhance the reliability and security of the storage system. Since each provider holds only pieces of the file, it is impossible for a single provider to reveal the entire content of the original file. Even in the worst case, where an attacker manages to obtain all the needed data shards and reassemble them into chunks, those chunks are still encrypted with the client’s secret key.
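To make the 5 data shards plus 5 parity shards configuration concrete, here is a toy erasure code in Python. It is a sketch in the spirit of Reed-Solomon coding, using polynomial interpolation over the prime field GF(257), and it is not the code DxBox itself uses. The `encode` and `decode` helpers are hypothetical, and shards are plain lists of integers because a parity value can be 256.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial (mod P) passing through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # needs Python 3.8+
    return total


def encode(chunk, data_shards=5, parity_shards=5):
    """Expand a chunk into data_shards + parity_shards shards (lists of ints)."""
    k, n = data_shards, data_shards + parity_shards
    padded = chunk + bytes(-len(chunk) % k)  # zero-pad to a multiple of k
    shards = [[] for _ in range(n)]
    for off in range(0, len(padded), k):
        points = list(enumerate(padded[off:off + k]))  # (x, byte) for x < k
        for x in range(n):
            # Shards 0..k-1 hold the data itself; the rest are parity values.
            shards[x].append(points[x][1] if x < k else _interpolate(points, x))
    return shards


def decode(available, length, data_shards=5):
    """Rebuild the chunk from any `data_shards` shards given as {index: shard}."""
    k = data_shards
    chosen = sorted(available.items())[:k]
    if len(chosen) < k:
        raise ValueError("not enough shards to recover the chunk")
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(x, shard[pos]) for x, shard in chosen]
        out.extend(_interpolate(points, x) for x in range(k))
    return bytes(out[:length])


chunk = b"an encrypted chunk would go here"
shards = encode(chunk)
# Pretend five of the ten providers went offline; five shards remain.
survivors = {i: shards[i] for i in (1, 3, 6, 8, 9)}
assert decode(survivors, len(chunk)) == chunk
```

Any five of the ten shards are enough to rebuild the chunk, which is the property described above: half of the providers holding a chunk can fail without data loss, at the cost of storing twice the original size.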

Figure 2: DxBox File Download Process

File downloading is the reverse of file uploading. As illustrated in Figure 2, once the user sends a download request, the client server randomly selects 7 providers from which to fetch the needed file shards. The number of providers needed is calculated from the equation below:

Once all the needed data shards have been downloaded from the providers, they are converted back into encrypted chunks. Once the chunks have been decrypted and reassembled into the original file, the user can download it from the server to the local machine.

Economic Model

When a new storage contract is formed, both the client and the provider must deposit funds into the contract. All fees for file storage, contract creation, file uploading, and file downloading are deducted from the client’s funds. The provider’s funds serve as collateral. When forming a new contract, the client pays the provider a contract fee, and each upload and download operation is charged according to the provider’s settings. Changes in spending are recorded in the storage contract as contract revisions. However, to ensure non-repudiation of the storage contract and to avoid storing excessive data on the chain, only the result of the last revision is submitted, which the provider does automatically. After the contract expires, the provider automatically submits a storage proof. The provider then gets its collateral back along with the profits earned from the contract, and the client takes back any unspent funds. If the provider fails to prove that the file is stored in its disk space at the time of contract expiration, money is deducted from the provider’s funds as a penalty.
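The flow of funds described above can be modeled with a small Python sketch. It is a simplified illustration, not DxChain’s contract logic: the `StorageContract` class and its methods are hypothetical, the penalty amount is a free parameter because the article does not specify it, and the sketch assumes the provider keeps its earned fees even when the storage proof fails.

```python
from dataclasses import dataclass


@dataclass
class StorageContract:
    client_fund: int          # tokens deposited by the client
    provider_collateral: int  # tokens deposited by the provider
    spent: int = 0            # running total recorded by the latest revision

    def revise(self, fee: int) -> None:
        """Record a contract, storage, upload, or download fee as a new revision."""
        if fee < 0 or self.spent + fee > self.client_fund:
            raise ValueError("fee must be non-negative and covered by the client's fund")
        self.spent += fee

    def settle(self, proof_ok: bool, penalty: int = 0):
        """Return (client_refund, provider_payout) once the contract has expired."""
        client_refund = self.client_fund - self.spent
        provider_payout = self.provider_collateral + self.spent
        if not proof_ok:
            # Assumed rule: the penalty comes out of the collateral and is capped by it.
            provider_payout -= min(penalty, self.provider_collateral)
        return client_refund, provider_payout


contract = StorageContract(client_fund=100, provider_collateral=50)
contract.revise(10)  # contract fee
contract.revise(5)   # one upload
print(contract.settle(proof_ok=True))               # (85, 65)
print(contract.settle(proof_ok=False, penalty=20))  # (85, 45)
```

Only the final value of `spent` matters at settlement, which mirrors the point above that just the last revision needs to be submitted to the chain.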

In DxBox, before a contract ends, the client automatically forms a new contract with each provider. Once the new contract is formed, all spending on file storage, contract creation, file uploading, and file downloading is recorded in the new contract, even if the old contract has not yet expired.

Conclusion

In conclusion, cloud storage is becoming more and more popular because of its ease of use and efficiency. However, people still have concerns about the technology because of potential data privacy and data security issues. A blockchain-based storage application such as DxBox can address those issues: files are split into chunks, encrypted with the user’s secret key, and then encoded and expanded into shards with erasure coding. These techniques improve the reliability and security of the system. In addition, the economic model used by DxBox gives people an incentive to provide their disk space for profit. DxBox is not perfect, and it still leaves much room for improvement, such as allowing experienced users to modify client settings and optimizing the registration system. We would like to hear your thoughts on DxBox, and everyone is welcome to try out the very first demo DApp powered by DxChain!

References

DxChain Testnet v0.3.6 Documentation. (2019, January). Retrieved March 15, 2019, from https://dxchainapidoc.readthedocs.io/en/latest/index.html

Li, J., Wu, J., & Chen, L. (2018). Block-secure: Blockchain based scheme for secure P2P cloud storage. Information Sciences, 465, 219–231. doi:10.1016/j.ins.2018.06.071

Permanent data loss at Google as lightning strikes four times. (2015, August 19). Retrieved March 15, 2019, from http://www.digitaljournal.com/technology/permanent-data-loss-at-google-as-lightning-strikes-four-times/article/441496