Securing Decentralized Databases, Part I
Before the advent of the personal computer, there was an epoch of mainframes providing computational resources to multiple user-facing terminals. These terminals carried only the bare minimum of hardware required to connect to a mainframe and played a role similar to light clients in the modern blockchain space.
Once cheap consumer-grade hardware became available, a significant amount of computational power moved to the user side — media, games, office packages, scientific toolkits — all running on desktop machines. This is still true today, but the pendulum has swung back for a great many use cases, and a major share of computations have moved to the cloud. The browser now plays a role akin to those terminals of the past by connecting to the cloud mainframe — and as experience has shown, this model scales well to billions of users.
What is the cloud, essentially? The joke goes: “The cloud is just someone else’s computer,” but of course things are more complicated. The cloud is composed of a massive number of computers in one or more data centers, connected with a high-throughput network, and ready to provide computational resources and serve requests. This is known as IaaS — Infrastructure as a Service. Cloud providers take care of monitoring and fixing this infrastructure so developers can concentrate on building the software.
However, there is more to the cloud than just the infrastructure. On top of that, cloud providers deploy middleware services: databases, message queues, lambda functions, data processing engines. While these services do not directly interact with end users, they constitute a vital part of software architecture and are usually known as PaaS — Platform as a Service.
Quite often, cloud middleware is in fact an existing open source package wrapped into a managed service. For example, AWS offers RDS as a managed version of the popular databases MySQL and PostgreSQL, MSK as a managed version of the message queue Kafka, and ElastiCache as a managed version of Redis and Memcached.
If there are so many different open source libraries and frameworks, why do developers use managed cloud services instead? Because using them is so much easier than performing all the required manual DevOps work. A cloud provider takes care of provisioning hardware, arranging backups and replication, connecting monitoring and alert systems, and providing recovery services if something goes wrong. Additionally, managed services perfectly integrate with each other and allow one to quickly build a new application from simple blocks.
However, this ease of use comes at the cost of firmly committing a developer to a particular cloud provider. The more cloud services developers use, the more they are exposed to various vendor lock-in issues.
Enter Web 3; Web 3 protocols can reduce dependency on centralized intermediaries while keeping complexity low, so developers could have the same intuitively simple experience that they have when working with traditional cloud providers. Managed services, in this case, become separate protocols or protocol components. For example, an object storage service like AWS S3 can be replaced by Filecoin, Swarm, or Arweave, the key management service AWS KMS replaced by NuCypher, and the media streaming service AWS Elemental MediaConnect replaced by LivePeer.
So, what might a full-service decentralized cloud look like?
Decentralized Cloud Dreams
The decentralized cloud would run applications in a trustless network of nodes where the software is completely abstracted from the underlying hardware, and where the hardware would be run by virtually anyone with a sufficient amount of resources. Open-source packages such as Redis, MySQL, Kafka, or Spark could be ported to the decentralized cloud and serve developers in a way similar to how they work in the traditional environment.
There would be no need to reimplement the hundreds of open-source tools and libraries that were created before — they would just run on top of the decentralized infrastructure. Because these services would run in the open network, they will be able to integrate seamlessly with each other.
Finally, runtime issues would be monitored and fixed by a much wider community of developers than it is possible with private cloud deployments. The open-source community would be able to collaborate, not only on writing the code, but also on running it.
Why is this dream not yet a reality? To answer this question, we need to consider what constitutes the cloud at its base. While the cloud might be formed of a large number of different services and protocols, at its base you will find just a few key components: computing, data storage, and networking.
Decentralization of data storage seems to be an almost solved problem: Filecoin, Swarm, and Arweave can guarantee data availability and durability, and have the potential to become cheaper and more robust than traditional cloud storage. Merkle proofs would attest to the client that the data retrieved from decentralized storage is not forged.
Decentralization of computing is much harder, but there are a few different approaches that can be used to run arbitrary Turing-complete programs:
- Fat client (aka full node): The client has all the data required to run the program and performs the computation locally, which essentially makes the client a full node. The data itself can be downloaded from decentralized storage, or otherwise pulled from the rest of the network nodes.
Although simple, this approach does not always work in the current model where clients are often thin, servers carry most of the workload, and the network between clients and servers has limited throughput.
- Zero-knowledge proofs: A full node can execute the program locally and then provide the client with proof that the computation was performed correctly. Ideally, it should take very few resources for the client to verify the presented proof, and the overhead to prepare such proof on the full node side should be negligible.
Unfortunately, at present, computing a zero-knowledge proof might take a few orders of magnitude more resources than performing the computation itself. Consequently, it is quite hard to use this approach for large-scale computations.
- Thin client + Merkle proofs: For any pointer-based data structure it is possible to build a Merkle proof for the retrieval of a part of the data it stores. A client might download only fragments of the data it is interested in from the decentralized network, verify that the retrieval was correct, perform the computation, and upload results back to the network. This way, bandwidth consumption is kept low, and client resources are used carefully.
As an example, this is an approach used by OrbitDB — a serverless database that stores data in IPFS and uses CRDTs for synchronization between different clients. OrbitDB allows a developer to use rather sophisticated data structures: in addition to a key-value store, it supports mutable and immutable logs, counters, and even search indices.
However, it is not possible to use this approach for many use cases. For example, to process ad-hoc aggregate queries or join operations the client will essentially need to become a full node because such query types often require reading a large number of rows from one or more tables, which all need to be transferred to the client side.
- Thin client + blockchain: Finally, it is possible to run the computation in a blockchain, where multiple nodes run the same program and verify each other. The more nodes that run the same computation, the higher the security of the system. This is because the chance that the majority of the nodes are malicious drops as the number of nodes increases.
In Ethereum case, the number of nodes can reach into the thousands, which makes Ethereum a really secure but not too cost-efficient system compared to the cloud. Blockchain sharding can significantly reduce the number of nodes required to run the same computation, but is this reduction enough to reach cloud-grade cost efficiency?
We should also note that in the Thin client + Merkle proofs approach (as well as in the Fat client approach) parties that are willing to use computation results need to trust the party that has performed this computation. Essentially, the computation result comes from an oracle: Merkle proofs would not be able to guard against a malicious client that has downloaded 5 in one chunk and 3 in another chunk, and then uploaded 6 as the summation result back to decentralized storage. This means that only Zero-knowledge proofs and Thin client + blockchain approaches can provide adequate composability in a trustless environment.
A Decentralized Database Gap
Let’s come back down to earth and consider a particular but quite useful kind of cloud middleware — a managed database service. Besides being a great model that combines computing and storage, databases are among the most used cloud services — almost every application uses a SQL or NoSQL database.
It’s worth noting that we have not yet seen a decentralized version of AWS RDS (which provides managed MySQL and PostgreSQL) or a decentralized version of AWS Elasticache (which provides managed Redis and Memcached).
Why is this the case? The reason is that databases are not only about storage; they are also about computation. In addition to aggregate queries and join operations which were already mentioned in this post, databases can also run stored procedures and user scripts, perform bulk data updates, and apply built-in and user-defined functions.
Combined with the requirement to serve requests quickly and store large volumes of data, it becomes quite a challenging task to port an open-source database to a decentralized environment. The sharding approach seems to be the most practical, but before we finally start going down this road, let’s analyze how much redundancy a cloud database usually carries.
Redundancy in the Cloud Environment
First, cloud databases usually use multiple replicas to achieve high availability and, in certain cases, to increase read throughput. Often, one of these replicas is designated as the primary and serves all the read and write requests, while others serve as standbys: if the primary replica fails, one of the standbys replaces it. Additionally, while standbys don’t serve write requests, they can still respond to read-only queries, allowing them to parallelize some of the load.
What replication factors are typically used? It depends on the workload type, but for AWS RDS 2× replication (one primary and one standby replica) is often used for online transaction processing and might be increased for the mix of transaction processing and analytics queries. Deployments of Cassandra — a popular distributed NoSQL database — might carry 3× replication for standard and 5× replication for mission-critical data. In general, it seems that 2× to 5× redundancy is quite common.
Second, backups are often used in addition to database replication. In this case, a snapshot of the database state is periodically taken and uploaded to an external storage service such as AWS S3. If snapshots are taken daily, external storage might keep the last week’s worth of snapshots.
Here we should also note that external storage most likely has some redundancy as well. While there are no exact numbers on how many copies of each stored object AWS S3 holds, based on its 99.999999999% durability guarantees, we can speculate that normally at least three different devices should keep the same data.
Why have backups if the replication is already in place? One of the most critical reasons is that backups allow restoring the database state at a certain point in time. This might be useful, for example, if someone accidentally issues a DROP TABLE command, which is then repeated in all replicas holding this table. Such a command cannot be easily rolled back with the help of the undo log alone — but using the latest snapshot it is possible to recover at least some of the stored data.
To sum up, redundancy required for our hypothetical cloud database is a 2× to 5× replication factor for primary/standby replicas plus a few database snapshots stored externally with 3× replication. Can we repeat this with a sharded blockchain? To make things simpler, let’s forget about cross-shard transactions for now — many web applications use just a single database instance.
Sharding on a Budget
Assume that we have allocated a budget for just 5× to 10× redundancy. Can we call it a day by rolling out a shard with just 7 validators connected with some BFT consensus where each validator carries a database replica? Unfortunately, it is not that straightforward — such a shard would have minimal security guarantees.
If you are curious about details, the NEAR folks have laid out various sharding challenges in a pretty neat series of posts (1, 2). But in short, what can go wrong in our setup? Well, there are a few potential threats. Below we consider what guarantees can a single isolated shard provide to its clients.
Isolated shard security
Assume that a fraction of p nodes in the network are malicious. Many permissioned BFT consensus engines (such as Tendermint) can tolerate k malicious nodes in a shard composed of n = 3k + 1 nodes. Inverting this, we can figure that if n — k (that is, 2n/3) or more nodes in the shard are malicious, then they can completely take over this shard and “tolerate” honest nodes. Assuming that nodes in the shard are randomly sampled from the network, the probability of shard takeover is:
For our model shard with 7 nodes, it only takes 5 malicious nodes to seize it. Let’s assume that the fraction of malicious nodes in the network is 10 percent. By substituting these values into the formula above we obtain that the probability of shard takeover is approximately 0.017%. That’s probably too high even for casual applications that do not have stringent security requirements.
How do we guard against this scenario? In certain designs, if honest nodes have a ⅔ majority in the shard, they can slash the deposits of those nodes that signed an incorrect state transition. But in our case, malicious nodes can perform any state transition they want without talking to the honest nodes.
Furthermore, because malicious nodes are tightly connected, they can coordinate and misbehave only when they represent a ⅔ majority in the shard — basically, when there is no risk at all in acting maliciously. This means that without some external supervision thin clients might never learn that the shard is dishonest.
Securing Shards with the Shared Verifier Pool
To secure shards, whether they have just a few nodes or hundreds, two techniques are often used: validator rotation and fishermen nodes. At the core, these techniques are pretty similar: some node in the network verifies that the shard has performed only correct state transitions. If the verifying node finds an incorrect state transition, it can challenge this situation. In this case, if the verifying node is able to prove that the state was updated incorrectly, nodes that approved the incorrect transition lose their deposits.
One option is that the verifying node substitutes another node in the shard (in which case we say that the validator set has rotated) and replays the last several transactions made by that shard. Another option for the verifying node is to stay independent and fish for incorrect state transitions in different shards in the hopes of being rewarded. No matter which approach is used, the idea remains the same: a shard is secured not only by the nodes that belong to it but by the entire network.
Sounds great, right? But how is shard transaction history transferred to the newly joined validator node or fishermen? One issue is that if the number of nodes in the shard is small, history and state transfer might impose too much load on the shard and thus hinder the shard’s ability to process transactions in real time. Another issue is that malicious nodes in the captured shard might outright refuse to share incriminating evidence.
Decentralized Storage to the Rescue
One solution is to store the transaction history of the shard externally. Consider the following: whenever the shard processes a block, it can upload this block to a decentralized storage network and provide interested clients with proof that the block was indeed uploaded, and uploaded correctly. Once the block is uploaded, it is the responsibility of the decentralized storage network to serve the stored data when requested. Check out this Swarm paper to learn more about how a protocol can enforce this responsibility.
Now, fishermen can download different fragments of the transaction history from decentralized storage, repeat the computation, and challenge incorrect state transitions. As a side effect, fishermen would compute the new state, which they could then upload as a snapshot to decentralized storage. This snapshot can then be used as a starting point for further verifications or as a backup in case the entire shard fails.
Validator rotation can be implemented in a similar fashion: new validators will download the state and the required parts of the transaction history from decentralized storage without overloading the shard. Furthermore, even if the entire shard fails, it is possible to recover it using data from decentralized storage as a backup.
Note that because the fishermen in this design are disconnected from the shards, it is possible to make the fishermen selection completely random. This way, when the shard is processing transactions, it does not know if it will be verified by honest or malicious fishermen, which places significant pressure on the shard nodes to act honestly. We will also drop here a hint (covered in following posts) that it is possible to prevent fishermen from knowing if their verdicts will be re-verified.
Verification Game and Unanimous “Consensus”
Previously, we mentioned that a verifying node can prove an incorrect state transition. Is it possible to do so in a cost-efficient way? The answer is “yes”: one method is to use the TrueBit-style verification game: If two nodes do not agree on how a transaction should update the state, it should be possible to find a single virtual machine instruction’s processing where they disagree. Once such instruction is found, it is possible to ask an external, trusted judge (e.g., Ethereum) to repeat it to figure out the right way how this instruction should have been processed.
Note that in this case, only a tiny bit of the computation is processed on-chain, so the judge is not overloaded. Furthermore, the questioned instruction can be found by disputing nodes with the help of n-ary search — that is, without much computational overhead.
Note that the verification game mechanism requires just a single honest node to catch malicious behavior. In other words, shard nodes and fishermen must reach unanimous “consensus” on how state transitions should be performed.
Let’s find out the probability with which malicious behavior can go completely unnoticed. Assume that in addition to the shard consisting of n nodes, we also have m fishermen independently verifying this shard. Malicious behavior will not be detected only if the shard is taken over and all fishermen are malicious:
If we split our computational resource budget between 4 nodes in the shard and 3 fishermen nodes and keep our assumption that 10 percent of nodes in the network are malicious, then the probability to overlook bad behavior becomes 0.00037%. This probability is two orders of magnitude less than our previous naive model!
Malicious Behavior Economy
If shard nodes could not coordinate with fishermen nodes at all, this probability would have dropped even further. Imagine that a few malicious nodes were able to take over a shard. Let’s also assume that these nodes have staked a combined $1,000 on the network. Finally, let’s assume that shard nodes are rational and will perform incorrect state transitions only if they earn more than lose.
If three fishermen nodes later verify the shard, the probability that malicious nodes will be caught and lose their stake is 1–0.13 = 99.9%. Now, let’s calculate how much malicious shard nodes would need to earn before their behavior becomes economically profitable.
If we assume that they earn only when their behavior goes unnoticed and the transaction history of the shard is not reverted, then in 999 out of 1000 cases, they will lose their $1,000 stake and earn nothing. In 1 out of 1000 cases they will make $x, which should cover all the stakes they have lost. Solving the simple equation, we learn that malicious shard nodes need to earn at least $999,000 to break even. In other words, the transactional volume of the shard must be more than one million dollars for the attack to make sense!
This sounds too good to be true, and we agree: the economic model of malicious behavior in our hybrid system is much more complicated. Unfortunately, while shard nodes do not know fishermen beforehand and thus cannot initiate an early conversation, some not-so-honest fishermen might reach out to shard nodes if they spot incorrect behavior. Then these fishermen can demand a fee from those shard nodes to keep their silence. We will consider the economic details of this and other attacks in further posts.
Putting it All Together
Fluence Labs is working on its part of bringing the decentralized cloud closer to reality by combining a sharded blockchain approach with decentralized storage networks, trusted judges, and the classic client-server model to deploy a decentralized database network that will be as cost-efficient as the traditional cloud.
Fluence is a hybrid solution, different aspects of which we have touched on earlier in this post. The Fluence network is split into tiny shards, each of which is responsible for an independent database. These shards upload transactions to decentralized storage, and then the shared pool of fishermen verifies the uploaded transaction history. This verification is performed asynchronously of the transaction processing, which allows request serving latencies to stay low and the user experience to be smooth.
Shard composition in the Fluence network does not change too often, which allows avoiding excessive data transfers when new validators join the shard. Fishermen, on the contrary, are rotated continuously, which increases overall network security. If any node does not agree with how a transaction should be processed, it initiates a dispute on the Ethereum blockchain, which is resolved through the verification game mechanism.
WebAssembly is used as the target virtual machine which, combined with cost efficiency, allows porting many well-written open-source databases to the Fluence network. In fact, we have just recently ported Redis and now are looking forward to porting a production-grade SQL database (there is already a toy one — LlamaDB) such as SQLite.
We should also note that delayed verification in our architecture allows Fluence to have effective and secure cross-database transactions, but that’s a topic for the next post, along with some specific details about the Fluence architecture.
If you are eager for details, feel free to take a look at the Fluence paper describing how different components of the protocol such as Ethereum, Tendermint, Filecoin/Swarm, and WebAssembly VM are integrated together.
We would like to say a huge thank you to Howard Wu for the feedback!