The Data Availability Problem

Artem Payvin
SKALE
Apr 11, 2019

In 2017, everything was looking up for Ethereum — ETH was mooning, people were flocking to build (this was before buidl) new dApps, and enterprises were getting involved. But the success would prove too great for Ethereum to handle…

Yeah… that transaction is going to take a few days at that gas price.

CryptoKitties had become the first majorly successful dApp. But it (and others like it) were hogging all of Ethereum’s resources — causing the chain’s mempool to grow at an unprecedented rate.

Suddenly, people were talking more and more about Plasma and State Channels — the solutions to our scaling woes. But despite all of the talk and excitement, development moved at a glacial pace, and some members of the community started having doubts about the feasibility of these solutions. Now, as things begin coming online, people are seeing what has kept implementers at bay for so long — this is, of course, the notorious data availability problem.

In this article, we’ll be giving a bit of background on the data availability problem and how it is being addressed by various layer two solutions such as Plasma, State Channels, and Elastic Sidechains.

The Data Availability Problem

As Vitalik has previously explained it, the data availability problem arises when a malicious miner publishes a block whose header is present but some or all of whose block data is missing. This can (as sketched after this list):

  • Convince the network to accept invalid blocks, with no way to prove their invalidity.
  • Prevent nodes from learning the state.
  • Prevent nodes from creating blocks or transactions because they lack the information needed to construct proofs.
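To make the header-without-body failure concrete, here is a minimal sketch (our own illustration in Python; the names are hypothetical and not taken from any client) of why a withheld block body leaves a node unable to prove anything:

```python
from dataclasses import dataclass
from typing import Optional
import hashlib

@dataclass
class Block:
    body_hash: bytes        # commitment in the published blockheader
    body: Optional[bytes]   # the transactions themselves; may be withheld

def try_validate(block: Block) -> Optional[bool]:
    """Return True/False if the body can be checked, or None if withheld.

    With body=None, a node can neither accept the block as valid nor
    produce a proof that it is invalid: exactly the failure above.
    """
    if block.body is None:
        return None         # data unavailable: nothing to prove either way
    return hashlib.sha256(block.body).digest() == block.body_hash
```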

But data availability doesn’t just relate to the withholding of block data. Generally speaking, it covers any case where data is withheld from other participants in the network, a.k.a. censorship. This isn’t a problem on the mainnet (as far as we’re aware) because every node stores the full chain, but that guarantee has come at great cost. In fact, over the past 18 months, we’ve seen a 6.5x increase in the amount of state stored on each Ethereum node (Geth with fast sync).

And clearly, this is unsustainable for a truly decentralized network. As the size of the chain continues to grow, the number of computers eligible to participate as nodes in the network declines. So, how do we combat this rapidly growing chain?

Simple! Start and end things on-chain, but have clients manage everything amongst themselves in between! This, effectively, is a very basic idea of what all Execution Layer / Layer 2 scaling solutions do. We go from everything being on-chain to the chain only serving as a settlement layer for off-chain interactions. But this can be problematic, as clients participating in a layer two solution are required to maintain all of the off-chain transactions relating to them or be at the mercy of whoever does.

Example

Imagine you go to a casino to play poker. When you walk in, you go to a counter and exchange dollars for poker chips (think of this as an on-chain transaction). You then go to a table and play poker for a few hours (think of these as off-chain transactions) — sometimes you’re up, other times you’re down. After winning a huge hand, you tell the table you’re going to go cash out.

But as you get up from the table, someone hits you over the back of the head with a crowbar — when you wake up, your memory is a bit fuzzy and you don’t remember the details of the poker game (this is data unavailability). While you were out, the people at the table decided to pretend the last hand never happened and continued playing from where they had been prior to it — cheating you out of the money you had won.

If this had happened on the blockchain, this sort of cheating wouldn’t be possible as the world would be aware of what did and didn’t happen. But because it was all off-chain and you lost your transaction history, you have to accept the history that your peers tell you.

In Practice

In Plasma, every participant must keep the entire transaction history as well as enough witness data to prove whether or not their cryptoassets were transacted with in every Plasma block. This effectively makes every participant a node on that Plasma instance, one that stores data only for its own transactions. This history requirement exists because anyone in the Plasma instance could collude with the chain’s operator to submit invalid transactions in an attempt to steal funds from other participants. And the only way for participants to prevent this is to ensure that they have a complete and valid transaction history for all of their assets.
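As a rough illustration of what that witness data involves, here is a minimal Python sketch (our own; the helper names are hypothetical, and real Plasma implementations differ in detail) of checking a Merkle inclusion proof against a Plasma block root:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(leaf: bytes, proof: list, index: int, root: bytes) -> bool:
    """Check that `leaf` sits at position `index` under Merkle root `root`.

    `proof` is the list of sibling hashes from the leaf up to the root.
    A Plasma participant keeps one such proof per block for each asset,
    which is the "witness data" described above.
    """
    node = sha256(leaf)
    for sibling in proof:
        if index % 2 == 0:               # node is a left child
            node = sha256(node + sibling)
        else:                            # node is a right child
            node = sha256(sibling + node)
        index //= 2
    return node == root
```

A participant who loses these proofs can no longer demonstrate what happened to their assets — which is precisely the availability risk described above.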

State channels have a lesser data requirement because all parties agree upon the current state rather than upon a state update (e.g., a transaction). This allows a contract to be settled with one transaction instead of needing to replay any transaction history. And because each state carries an auto-incrementing nonce and is not regarded by the smart contract as valid unless it bears signatures from both parties, participants only need to store the latest state.

Note: Participants in a state channel may wish to store historical state as well, to settle with an earlier and more advantageous state in the case that their counterparty loses their state history.
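As a rough sketch of why only the latest state matters (Python, with names of our own invention; signature verification is left abstract), the settlement rule boils down to keeping the fully-signed state with the highest nonce:

```python
from dataclasses import dataclass

@dataclass
class ChannelState:
    nonce: int          # auto-incrementing; a higher nonce supersedes a lower one
    balances: dict      # e.g. {"alice": 7, "bob": 3}
    sig_a: bytes        # the contract treats the state as valid only when
    sig_b: bytes        # signatures from both parties are present

def newest_valid_state(current: ChannelState, incoming: ChannelState,
                       sigs_ok) -> ChannelState:
    """Keep only the newest fully-signed state.

    `sigs_ok` stands in for real verification (e.g. ECDSA over the
    serialized state); anything unsigned or stale is simply ignored.
    """
    if sigs_ok(incoming) and incoming.nonce > current.nonce:
        return incoming     # supersedes: the older state can be discarded
    return current
```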

Solutions

Now, teams are doing everything in their power to reduce the footprint of the data that clients must maintain or submit to the mainnet, through things like ZK-SNARKs or RSA accumulators. These are great improvements, but they don’t address the problem of data availability. In fact, we can’t really solve this problem for a single client, because doing so would require that the client be online 100% of the time and never lose the data stored on it (sounds a lot like a blockchain, right?).

But given that these machines don’t really exist, the general consensus for addressing data availability is through incentivized watchtower networks (ex: PISA) or similar constructs. These incentivized networks are effectively an array of staked watchtowers who back up data for the users who are paying rent for their service and will dispute any challenges on behalf of users in the event that those users are unable to do so. If a watchtower fails to dispute a challenge within a certain time period, it will lose its stake, which will be awarded to a new watchtower in the network assigned to dispute the challenge (assuming it does). This failure / appointment protocol is multiple layers deep, so users have some assurance that they will not be cheated in the case that they fall offline or lose their transaction / state history.
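As a toy model of this construction (our own Python sketch; PISA’s actual protocol is more involved), a watchtower stores the latest state for each paying user and must answer challenges before their deadlines or forfeit its stake:

```python
import time

class Watchtower:
    """Illustrative staked watchtower; all names here are our own."""

    def __init__(self, stake: int):
        self.stake = stake
        self.backups = {}                       # channel_id -> latest known state

    def backup(self, channel_id: str, state) -> None:
        prev = self.backups.get(channel_id)
        if prev is None or state.nonce > prev.nonce:
            self.backups[channel_id] = state    # only the newest state matters

    def respond_to_challenge(self, channel_id: str, deadline: float):
        """Answer an on-chain challenge for an offline user, or be slashed."""
        state = self.backups.get(channel_id)
        if state is not None and time.time() < deadline:
            return state                        # dispute answered on the user's behalf
        self.stake = 0                          # slashed; the next tower in line
        return None                             # is appointed to dispute instead
```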

And the reason these solutions have taken so long to arise is that members of the community used to scoff at the idea of trusting some third party and wanted to build something that addressed data availability without one. But as the impossibility of this grew more and more evident, cryptoeconomic models (such as the one described above) have arisen to mitigate the need to trust these parties.

SKALE’s approach

SKALE’s Elastic Sidechains address the data availability problem through their block proposal process. Once a validator has created a block proposal, it communicates the proposal to the other validators using the data availability protocol described below. This protocol guarantees that the block proposal is transferred to a supermajority (>⅔) of validators.

The five-step protocol is described below:

  1. The sending validator A sends the block proposal P to all of its peers as a list of the hashes of the transactions that compose it.
  2. Upon receipt, each peer reconstructs P from the hashes by matching them against transactions in its pending queue. For any transactions not found in the pending queue, the peer sends a request to the sending validator A, which responds with the bodies of those transactions, allowing the peer to reconstruct the block proposal and add it to its proposal storage database PD.
  3. The peer then sends a receipt back to A that contains a threshold signature share for P.
  4. Upon collecting a supermajority (>⅔) of signature shares from peers (including its own), A creates a supermajority signature S. This signature serves as a receipt that a supermajority of validators is in possession of P.
  5. A then broadcasts this supermajority signature S to each of the other validators in the network.

Note: Each validator is in possession of a BLS private key share PKS[i]. Initial generation of key shares is performed using the Joint-Feldman Distributed Key Generation (DKG) algorithm, which runs at the creation of the Elastic Sidechain and whenever validators are shuffled. Check out our article on BLS and DKG to learn more!
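To tie the five steps together, here is a schematic Python sketch of the proposal broadcast and receipt collection. This is not SKALE’s actual implementation: the BLS signature shares and the aggregated signature S are replaced by trivial stand-ins, and all names are our own.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Peer:
    """A receiving validator with a pending-transaction queue (steps 2-3)."""

    def __init__(self, peer_id: int, pending: list):
        self.id = peer_id
        self.pending = {h(tx): tx for tx in pending}
        self.proposal_db = {}                       # PD: proposal storage database

    def receive_hashes(self, proposal_id, hashes, fetch_body):
        txs = []
        for tx_hash in hashes:
            body = self.pending.get(tx_hash)
            if body is None:                        # not in the pending queue,
                body = fetch_body(tx_hash)          # so request the body from A
                self.pending[tx_hash] = body
            txs.append(body)
        self.proposal_db[proposal_id] = txs         # reconstructed P stored in PD
        return ("share", self.id, proposal_id)      # stand-in for a BLS sig share

def broadcast_proposal(txs, peers, n_validators):
    """Steps 1, 4 and 5 from sending validator A's point of view."""
    hashes = [h(tx) for tx in txs]
    proposal_id = h(b"".join(hashes))
    bodies = {h(tx): tx for tx in txs}
    shares = [("share", 0, proposal_id)]            # A contributes its own share
    for peer in peers:
        shares.append(peer.receive_hashes(proposal_id, hashes, bodies.get))
        if 3 * len(shares) > 2 * n_validators:      # supermajority (>2/3) reached
            break
    S = tuple(shares)                               # stand-in for aggregated signature S
    return proposal_id, S                           # S is then broadcast to all validators
```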

In further consensus steps, a data availability receipt is required of all validators voting for proposal P: they must include the supermajority signature S in their vote, and honest validators will ignore any vote that does not include it. So, assuming a supermajority of honest validators, this protocol guarantees data availability, meaning that any proposal P which wins consensus will be available to all honest validators.

Summary

So, if you were wondering what developers implementing Execution Layer solutions were occupied with for the past 18+ months, it’s likely that a large portion of their time was initially spent addressing this exact problem. And while there is no perfect solution to this for all scaling solutions (yet), there’s a lot of new and exciting work being done and we are excited to see what the future brings!

Learn More

If you are interested in trying SKALE out, make sure to join the SKALE community on Discord and check out the Developer Documentation! Also, feel free to check out SKALE’s Technical Overview and Consensus Overview for a deeper dive into how SKALE works and is able to provide 20,000 TPS.

SKALE’s mission is to make it quick and easy to set up cost-effective, high-performance Elastic Sidechains that run full-state smart contracts. We aim to deliver a performant experience to developers that offers speed and functionality without giving up security or decentralization. Follow us on Telegram, Twitter, and Discord, and sign up for updates via this form on the SKALE website.

Artem Payvin is Chief Smart Contracts Engineer at SKALE Labs, where he also conducts blockchain research.