A Decentralized Database for the Decentralized Internet

In our previous post we discussed how data leaks expose our personal data to potential wrongdoers. However, “evil hackers” are not the only ones hunting for that data: government agencies and regulators would also be happy to get their hands on it. We think the best way to minimize the negative consequences of such breaches is to act as if the data were already exposed to everyone on the Internet. In that case, the data simply has to be encrypted before it is uploaded anywhere.

By decentralizing data storage and giving data owners the private keys to their data, we can reduce leaks to practically zero. Removing centralized servers also imposes significant demands, though: total encryption and flexible access management.

Let’s define a database that meets all the requirements.

Disintermediation

What does real decentralization mean for a database? It means that the database stores data on nodes that anyone can control. A node run by a private owner can join or leave the network at any moment, and this should not affect data accessibility or integrity. At the same time, because nodes can be located anywhere geographically, a decentralized design provides fault tolerance and censorship resistance by default.

Fluence network architecture. Nodes are organized into Clusters, with a number of arbiters responsible for each Cluster.

Confidentiality

Since the database relies on a trustless environment, there should be no way for nodes or any third party to read the data uploaded by the owner. This can be achieved with encryption, so that only the owner of the private key associated with a piece of data can decrypt it. Nodes store the data itself and its metadata, but cannot extract anything valuable from either. In such an architecture, data cannot leak unless the owner discloses the private key.

Structured data

Everything mentioned above is achievable with decentralized file storage systems such as Storj, Sia, and Swarm. They provide accessibility, replication, and the ability to store private files on trustless nodes, which are rewarded with crypto tokens. The problem is that these projects are designed to store files, not structured data. However, we can build a database index layer on top of decentralized storage. We could create encrypted B-Tree indexes that represent the data structure without deciphering the data. This approach makes it possible to upload structured data, run queries, and perform full-text search: all the properties of traditional databases, but without disclosing the nature of the data.
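To make the idea of an index over hidden values concrete, here is a minimal sketch. It is not the encrypted B-Tree design itself: it swaps in a simpler technique, a "blind index" of keyed HMAC tokens kept in a sorted list, which supports equality lookups only. The names `make_token`, `BlindIndex`, and the sample records are hypothetical and used purely for illustration.

```python
import bisect
import hashlib
import hmac


def make_token(index_key: bytes, value: str) -> bytes:
    """Owner side: derive a deterministic keyed token for an indexed value."""
    return hmac.new(index_key, value.encode("utf-8"), hashlib.sha256).digest()


class BlindIndex:
    """Node side: stores only (token, record_id) pairs, never plaintext values."""

    def __init__(self):
        self._entries = []  # kept sorted so lookups can use binary search

    def insert(self, token: bytes, record_id: str) -> None:
        bisect.insort(self._entries, (token, record_id))

    def lookup(self, token: bytes) -> list:
        # (token,) sorts before every (token, record_id), so this finds
        # the first entry carrying the requested token, if there is one.
        start = bisect.bisect_left(self._entries, (token,))
        matches = []
        for entry_token, record_id in self._entries[start:]:
            if entry_token != token:
                break
            matches.append(record_id)
        return matches


# Hypothetical usage: the owner indexes a "city" column.
owner_key = b"owner-secret-index-key"  # assumption: known to the owner only
index = BlindIndex()
for rid, city in [("r1", "Berlin"), ("r2", "Oslo"), ("r3", "Berlin")]:
    index.insert(make_token(owner_key, city), rid)

berlin_records = index.lookup(make_token(owner_key, "Berlin"))
```

The node can answer the query without learning which city was requested. Note the trade-offs of this stand-in: deterministic tokens reveal which records share a value, and because HMAC output does not preserve ordering, range queries would need a different construction.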

Consensus

In a trustless environment it is essential to protect data from attacks and forgery. Each data update or select query should be confirmed by the network to guarantee that the result is correct. Using a blockchain for this task would reduce speed, so we aim to use multi-signature responses and a proof-of-retrievability algorithm. Our solution doesn’t require all nodes in the network to participate in consensus for each request: it’s enough to wait for confirmation from the subset of nodes responsible for a particular database.

Query consensus. The Client works with any node from the cluster to run queries. The node provides the response with verification from other nodes.
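The client-side check for a multi-signature response can be sketched as follows. This is an illustration under loud assumptions: HMAC with per-node shared secrets stands in for real digital signatures (a real client would verify with each node's public key), and the function names, node identifiers, and quorum size are hypothetical.

```python
import hashlib
import hmac
import json


def sign(node_key: bytes, query: str, result) -> str:
    """Stand-in for a node's digital signature over (query, result)."""
    payload = json.dumps({"query": query, "result": result}, sort_keys=True).encode()
    return hmac.new(node_key, payload, hashlib.sha256).hexdigest()


def is_confirmed(query, result, signatures, node_keys, quorum) -> bool:
    """Accept the result only if at least `quorum` cluster nodes signed it."""
    valid = 0
    for node_id, signature in signatures.items():
        key = node_keys.get(node_id)  # signatures from unknown nodes are ignored
        if key is not None and hmac.compare_digest(signature, sign(key, query, result)):
            valid += 1
    return valid >= quorum


# Hypothetical three-node cluster; node-c did not respond.
node_keys = {"node-a": b"key-a", "node-b": b"key-b", "node-c": b"key-c"}
query, result = "SELECT name FROM users WHERE id = 7", ["alice"]
signatures = {n: sign(k, query, result) for n, k in node_keys.items() if n != "node-c"}

accepted = is_confirmed(query, result, signatures, node_keys, quorum=2)
```

Only the nodes in the cluster responsible for the database are consulted, so the cost of a check depends on the cluster size rather than on the size of the whole network.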

Access management

Modern applications are usually designed so that data uploaded by different users can be shared, merged, analyzed, and obtained in a variety of ways. Enterprises need a flexible permissions mechanism to manage granular data access and to grant or revoke permissions without excessive friction. With smart contracts and proxy re-encryption we can design key management that complies with HIPAA, GDPR, HITECH, PCI, and other regulations.

Data sharing workflow. The Client provides a Sharing Contract with a Re-Encryption Key to the cluster. The Buyer must then fund the contract before running queries. The Client and the nodes receive rewards according to the Sharing Contract.
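The role of the Re-Encryption Key can be illustrated with a toy proxy re-encryption scheme. The text above does not say which scheme is used; this sketch assumes a classic ElGamal-style construction (in the spirit of Blaze, Bleumer, and Strauss) with deliberately tiny parameters that are unsafe for real use. All function names are hypothetical. Requires Python 3.8+ for modular inverses via `pow`.

```python
import secrets

# Toy group: P = 2*Q + 1 with P and Q prime; G generates the subgroup of order Q.
P, Q, G = 467, 233, 4


def keygen():
    sk = secrets.randbelow(Q - 1) + 1  # secret exponent in 1..Q-1
    return sk, pow(G, sk, P)


def encrypt(pk: int, m: int):
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)  # (m * g^r, g^(a*r))


def rekey(sk_from: int, sk_to: int) -> int:
    # Simplification: computed here from both secret keys (b / a mod Q).
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(rk: int, ciphertext):
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)  # g^(a*r) becomes g^(b*r); plaintext never appears


def decrypt(sk: int, ciphertext) -> int:
    c1, c2 = ciphertext
    g_r = pow(c2, pow(sk, -1, Q), P)  # strip the key from the exponent: g^r
    return (c1 * pow(g_r, -1, P)) % P


# Hypothetical flow: the Client encrypts, a node re-encrypts for the Buyer.
client_sk, client_pk = keygen()
buyer_sk, buyer_pk = keygen()
ciphertext = encrypt(client_pk, 42)
shared = reencrypt(rekey(client_sk, buyer_sk), ciphertext)
recovered = decrypt(buyer_sk, shared)
```

The node holding the re-encryption key transforms the ciphertext without ever seeing the plaintext or either private key, which is the property that lets access be granted without re-uploading the data.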

What about other solutions? Is Fluence unique? Let’s take a look at the market:

  • BigchainDB. This is a great attempt to combine blockchain with traditional databases to get immutability, scalability, and performance. However, BigchainDB is more focused on private intranets and doesn’t provide real decentralization or encryption. BigchainDB suits organizations or associations that want to eliminate mediators from business processes but keep centralized control.
  • Swarm, Filecoin, Storj, Sia. All these projects propose different ways to use the storage capacity of the world’s computers. Storj and Sia are in effect “a decentralized Dropbox” with an underlying p2p storage protocol. They solve many difficulties: data partitioning, retrievability, node incentives, and content delivery. However, to analyze the data one has to download it first. Fluence, on the other hand, can run queries on the miner nodes themselves, which avoids expensive network data transfers.
  • Proprietary encrypted databases. There are many solutions for traditional databases that offer enterprises regulatory compliance and limited access management. But when a company needs to share data for a joint venture or move data between locations, this usually becomes too complicated and too expensive to integrate.

Final notes

The ubiquitous decentralization trend we observe today has arisen from the economic advantages that removing a central party provides. Blockchain removes excessive manual verification work by employing the whole network to reach consensus. Storage and computation markets built on blockchains significantly reduce the price of storage by putting unused resources back to work.

Fluence aims to create a market for structured data. Any data owner can store, share, or monetize her data transparently, with cryptographically guaranteed privacy. The immense scale makes it possible to aggregate and handle truly big data, leading to new insights and opportunities for humanity.

Read more about the technical architecture in our Tech Whitepaper.