Ethereum lays the foundation to scale data.

An Introduction to Ethereum’s Next Major Upgrade: Proto-Danksharding

Aaron Hay
Published in Coinmonks · Jan 18, 2024


The following is a summary of a broader research paper. The full whitepaper, linked here, overviews the current state of Ethereum (Layer 1 and Layer 2) to help the reader build a foundation before diving into the details of Ethereum’s scaling plans.

In Brief

  • Ethereum Improvement Proposal (EIP) 4844, also known as Proto-Danksharding, is the next major upgrade for Ethereum. The upgrade centers on creating more space for rollup data to be made available on Ethereum’s Layer 1 (L1). The main ingredient is blobs. Blobs come with a new transaction type, blob-carrying transactions, and a new gas type, blob gas.
  • Today, rollups use calldata to post data to the L1. Despite being the cheapest way to store data on Ethereum, calldata is expensive. 1 byte of data requires up to 16 units of gas, and gas prices frequently spike with general demand to use the L1. Calldata is stored by all nodes, forever, even if data only needs to be made available for a brief time. It is not surprising that calldata is neither an efficient nor cheap way to make a lot of data available — calldata was not designed for this purpose — but blobs are.
  • Blobs aim to be the cheapest way to store data on Ethereum. Blobs require 1 unit of blob gas per byte, and blob gas pricing is decoupled from general L1 blockspace gas. Blobs store data temporarily on consensus nodes, only as long as is necessary. With blobs, rollups have a cheaper, more efficient way to post data to Ethereum, which is likely to result in lower transaction fees for rollup users.
  • Proto-Danksharding is the first key step on the journey to Danksharding — which is a phased approach to create more space for data and introduce a way for nodes to verify data is available — to help scale Ethereum to 100,000+ transactions per second, on rollups.

Ethereum’s next major upgrade is a scaling-focused upgrade that goes by the name Proto-Danksharding (PDS), named after a few of the key researchers behind the idea. The EIP number for this upgrade is 4844, and it will be included alongside 8 other EIPs in the upcoming Cancun-Deneb hard fork (the Dencun upgrade).

Ethereum seeks to scale to more than 100,000 transactions per second (TPS), while maintaining decentralization. This is no simple task! The potential solution: scaling in a rollup-centric way. Rollups are off-chain execution environments that use Layer 1 for data availability and consensus. While Layer 2 (L2) rollups will specialize in computation and compression, the L1 will specialize in consensus and data availability (DA).

  • Rollup goals: Compress transaction data to its smallest form and post that data to the L1 in the cheapest way possible.
  • L1 goals: Create a cheap way for rollups to post data and a lightweight way for all nodes to verify the posted data.

Current limitations need to be addressed to reach these goals.

  • The existing resource: Rollups currently use calldata, which is the cheapest way to store data on the L1 today. However, calldata is stored by all nodes forever, and it requires 4 or 16 units of gas per byte, for zero and non-zero bytes, respectively.
  • The existing market: Rollups compete for blockspace with all other transactions; they pay the same gas in the same fee market. Rollups consume ~10-15% of blockspace, and ~90% of a rollup transaction’s cost goes to paying for L1 blockspace (a rough sketch of the gas arithmetic follows this list).
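
For a feel of the difference in raw gas units, here is a minimal sketch in Python. The 100 KB payload and the 90/10 split of non-zero to zero bytes are illustrative assumptions, and the two totals live in separate fee markets, so they are not directly comparable in price terms.

```python
# Rough comparison of the gas units needed to post the same payload as
# calldata versus as blob data. Payload size and byte mix are assumptions.
payload_bytes = 100_000
nonzero_bytes = int(payload_bytes * 0.9)   # compressed data is mostly non-zero
zero_bytes = payload_bytes - nonzero_bytes

calldata_gas = nonzero_bytes * 16 + zero_bytes * 4   # regular execution gas
blob_gas = payload_bytes * 1                         # blob gas (separate market)

print(f"calldata: {calldata_gas:,} gas")   # 1,480,000 gas
print(f"blobs:    {blob_gas:,} blob gas")  # 100,000 blob gas
```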

PDS introduces a new transaction, a “blob-carrying transaction,” with a new resource and market, specialized for data.

  • The new resource: Meet blobs. Blob stands for Binary Large OBject; think a lot of 0s and 1s (bits), over 1 million bits in total, counted in batches of 8 (bytes). Each blob contains exactly 131,072 bytes (128 KB) of data, arranged as 4096 chunks of 32 bytes each. Blob-carrying transactions carry only a hash that points to the blob, so execution clients have few operations to perform. Blob storage is temporary and is entirely the job of consensus clients, which will store blobs for 4096 epochs (6.4 minutes each), or ~18.2 days. We will think of this as blobspace: a temporary resource whose size changes each block as up to 6 new blobs arrive and expired blobs are pruned. Each block will target 3 blobs (384 KB) with a limit of 6 blobs (768 KB). At the target, blobs will require nodes to store ~48 GB of additional data.
  • The new market: Blob gas will regulate blobspace. Blobspace is temporary and separate from the Ethereum Virtual Machine’s (EVM) execution layer. Blobs will consume ~94% fewer gas units (1 byte of data = 1 unit of data gas) than calldata (1 byte of data = up to 16 units of regular gas). The price for this new data gas is self-adjusting and independent of regular gas. With its defined minimum price of 1 wei (10^-18 ETH), blob gas will be (almost!) free at first, but it will quickly adjust upward, exponentially, if blobspace is continually more than 50% utilized (>3 blobs per block); a sketch of the pricing mechanism follows below.¹
Rollups post blobs of compressed data to consensus nodes rather than using transaction calldata. In turn, blobs require far less blockspace, and data gas is independent of any spikes in regular gas pricing. Blobs create blobspace, with a self-adjusting, independent data gas price to regulate demand. Rollups may not be the only consumer of blobspace, but they will likely be the largest and most consistent consumer over time.
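
To make the new market concrete, below is a minimal sketch of the blob parameters and the self-adjusting blob gas price, using the constants and integer exponential approximation defined in EIP-4844¹; the full-blocks scenario at the end is purely illustrative.

```python
# Blob parameters and pricing per EIP-4844.
GAS_PER_BLOB = 131_072                 # 1 blob gas per byte, 128 KB per blob
TARGET_BLOB_GAS_PER_BLOCK = 393_216    # 3 blobs
MAX_BLOB_GAS_PER_BLOCK = 786_432       # 6 blobs
MIN_BASE_FEE_PER_BLOB_GAS = 1          # 1 wei floor
BLOB_BASE_FEE_UPDATE_FRACTION = 3_338_477

def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    """Integer approximation of factor * e**(numerator / denominator)."""
    i, output, accum = 1, 0, factor * denominator
    while accum > 0:
        output += accum
        accum = accum * numerator // (denominator * i)
        i += 1
    return output // denominator

def blob_base_fee(excess_blob_gas: int) -> int:
    # The price rises exponentially in the running excess of blob gas used
    # above the per-block target; at zero excess it sits at the 1 wei floor.
    return fake_exponential(MIN_BASE_FEE_PER_BLOB_GAS,
                            excess_blob_gas,
                            BLOB_BASE_FEE_UPDATE_FRACTION)

# Illustrative scenario: blocks keep landing at the 6-blob limit, so the
# excess grows by 3 blobs' worth of gas (393,216) every block.
excess = 0
for blocks in range(0, 201, 50):
    print(f"after {blocks:3d} full blocks: {blob_base_fee(excess):,} wei per blob gas")
    excess += 50 * (MAX_BLOB_GAS_PER_BLOCK - TARGET_BLOB_GAS_PER_BLOCK)
```

At the target of 3 blobs per block the excess stays flat and the price holds steady; sustained demand above the target pushes the price up exponentially, and demand below the target lets it fall back toward the 1 wei floor.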

With this upgrade there is still a lot we do not know. We do not know the long run behavior of rollups. Will the endgame be one rollup or many? Will rollups share blobspace, or implement a shared sequencer? What will happen to maximum extractable value (MEV) long term? Will rollups be based? Regardless, PDS brings a new way to store data that is more efficient than the current status quo (calldata).

While PDS is an imminent, stop-gap solution for Ethereum scaling, the preliminary theory is also in place for the longer-term scaling solution: Danksharding (DS). In PDS, consensus nodes are responsible for storing the full blob data, but with DS, consensus nodes will only store a portion of blob data. DS asks nodes to sample the blob data at enough points to confirm that the data is available, a concept known as data availability sampling (DAS).

The lifecycle of a blob includes the blob-sending client, a powerful block builder, the protocol-selected proposer, sampling nodes, and a blob reconstruction agent. It is important to note that blobs are structured as polynomials, and polynomials enable a few fancy math tricks that are used across a blob’s lifecycle. As we step through the DS lifecycle, we will emphasize these tricks.

To start, clients submit blob-carrying transactions to builders. The blob must be 4096 pieces of 32 bytes each, so the client will pad the data with 0s if it does not have enough data initially. Each piece is treated as a number, and those numbers are used to compute a polynomial, P(x). Clients evaluate P(x) at a secret point s (without ever learning s, thanks to the trusted setup) to form the commitment to the blob, which functions much like a hash of the blob. SHA-256 is then used to hash the commitment into a 32-byte versioned hash that the EVM can work with. This is the first polynomial trick.
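
As a small illustration of that last hashing step, the sketch below follows the EIP-4844 convention of prefixing a version byte to the SHA-256 hash of the commitment; the 48-byte commitment value used here is just a placeholder.

```python
import hashlib

VERSIONED_HASH_VERSION_KZG = b"\x01"

def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    # Replace the first byte of the SHA-256 digest with a version byte so
    # that future commitment schemes can be adopted without changing the
    # 32-byte format the EVM sees.
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]

placeholder_commitment = bytes(48)  # real commitments are 48-byte BLS12-381 G1 points
print(kzg_to_versioned_hash(placeholder_commitment).hex())
```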

Math Trick #1: The commitment, Ci, formed by the client, uses KZG, a polynomial commitment scheme named after its authors Kate, Zaverucha, and Goldberg. The KZG scheme was selected because its commitments and proofs are a constant size, among other helpful properties. KZG requires a trusted setup, like other zero-knowledge (ZK) technologies, and the ceremony has a 1-of-N trust assumption: only 1 of the more than 140 thousand participants needs to be honest for the whole operation to be trustworthy. The client functions as the prover, and sampling nodes function as the verifier. A valid proof is inextricably tied to the polynomial of blob data, and once the commitment is made, the underlying polynomial data cannot be changed. The verifier, i.e., randomly selected nodes, uses the proof to confirm that the data they sampled is indeed tied to the polynomial they wish to sample, and not some other data set.²
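
For readers who want a peek under the hood, the core verification step can be stated compactly. Assuming the standard KZG opening check (not spelled out in this summary), a verifier holding the commitment C, a claimed evaluation y = P(z) at a point z, and a proof π checks a single pairing equation:

```latex
e\big(C - [y]_1,\ [1]_2\big) \;=\; e\big(\pi,\ [s - z]_2\big)
```

Here [x]_1 and [x]_2 denote x multiplied into the generators of the two pairing groups, and [s]_2 comes from the trusted setup; the equation can only balance if C really commits to a polynomial that evaluates to y at z.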

Next, block builders ingest blob data and commitments from blob-sending clients and perform the second polynomial-enabled trick.

Math Trick #2: Builders use a polynomial trick called data erasure coding to extend the blob data to double its original size, using redundant data. This trick makes it true that if at least half of the extended data is available, then all of the data can be reconstructed, which is helpful for sampling. Data erasure coding, or Reed-Solomon erasure coding, has been around since the 1960s, and is used here to protect against issues retrieving data samples from up to half of validators. If half of the validators are honest and provide their portion of the data to the reconstructing agent, the full data can be retrieved using a technique called Lagrange interpolation (a toy example follows below). Note that this trick can be applied multiple times, expanding the data both horizontally and vertically, to prepare the data for the two-dimensional (2D) sampling techniques in the later phases of DS.
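
Below is a toy Python sketch of erasure coding and recovery via Lagrange interpolation. It uses a tiny prime field and 4 data chunks purely for illustration; the real scheme operates on 4096 32-byte field elements per blob over a much larger field.

```python
# Toy Reed-Solomon erasure coding: treat the data chunks as evaluations of
# a polynomial, extend to twice as many evaluations, then recover the
# original chunks from any half of the extended set.
PRIME = 65537  # small prime modulus, chosen only for this toy example

def lagrange_interpolate(points, x, p=PRIME):
    """Evaluate, at x, the unique polynomial passing through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

data = [11, 22, 33, 44]                      # 4 original chunks
original = list(enumerate(data))             # evaluations at x = 0..3

# Extend to 8 evaluations (x = 0..7): any 4 of them determine the polynomial.
extended = [(x, lagrange_interpolate(original, x)) for x in range(8)]

# Lose half of the extended data; keep an arbitrary 4 of the 8 points.
available = [extended[1], extended[3], extended[6], extended[7]]

# Reconstruct the original chunks from the surviving half.
recovered = [lagrange_interpolate(available, x) for x in range(4)]
assert recovered == data
print("recovered:", recovered)
```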

The builder accepts all blobs for which Ci is a valid KZG commitment to the blob’s polynomial, rejecting any blob with an invalid Ci. The builder is then tasked with distributing blob data, and commitments, across validators. At the time of writing, Ethereum has over 900 thousand validators. With DS, any validator may eventually be tasked to participate as a verifier in data availability sampling.

DS allows for both sampling and full data reconstruction.

Sampling can be conducted by clients to confirm that data is available. This sampling game is surprisingly minimal, due to the properties afforded by trick #2: taking as few as 30 samples gives nearly 100% confidence that the data is available. Sampling may be done by (i) full nodes, (ii) light clients, or (iii) validators. (i) and (ii) may be rollup participants, confirming data availability for themselves. (iii) is required for the network to come to consensus.
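
A quick back-of-the-envelope calculation shows why 30 samples are enough; the sketch assumes samples are drawn independently and that an unavailable blob has more than half of its extended data withheld (otherwise it could be reconstructed anyway).

```python
# If more than half of the extended data were missing, each random sample
# would succeed with probability < 1/2, so being fooled 30 times in a row
# is astronomically unlikely.
samples = 30
fooled_bound = 0.5 ** samples
print(f"chance of being fooled after {samples} samples: < {fooled_bound:.1e}")   # < 9.3e-10
print(f"confidence the data is available: > {1 - fooled_bound:.10f}")
```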

For Ethereum’s proof-of-stake consensus, there are 32 slots per epoch, each slot is 12 seconds, and each epoch is 6.4 minutes. For each slot, there are up to 64 committees, each of at least 128 attesting validators. To attest to the next canonical block via the fork-choice rule, validators will conduct sampling on the latest block and each of the epoch’s previous blocks to ensure that the blobs of data in these blocks are valid and available. Here, with DS, the protocol only ensures the data is available for 32 slots, or 6.4 minutes, far less time than under PDS, which keeps data available for 4096 epochs, or ~18 days.
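
The timing figures quoted here follow from simple arithmetic on Ethereum’s slot and epoch parameters; a minimal sketch:

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32

epoch_minutes = SECONDS_PER_SLOT * SLOTS_PER_EPOCH / 60
print(f"1 epoch = {epoch_minutes} minutes")                        # 6.4 minutes

ds_window_minutes = 32 * SECONDS_PER_SLOT / 60
print(f"DS availability window (32 slots) = {ds_window_minutes} minutes")

pds_retention_days = 4096 * epoch_minutes / 60 / 24
print(f"PDS blob retention (4096 epochs) = {pds_retention_days:.1f} days")  # ~18.2 days
```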

During a blob’s short stay on the L1, full reconstruction can occur. This can be done using a simple greedy reconstruction algorithm that works iteratively: find a portion of the data that is at least 50% complete, use interpolation to construct the rest, and repeat until the whole blob of data is fully reconstructed and can be stored outside of the L1. Effectively, in these ~6 minutes the L1 passes the baton to various third-party reconstruction agents tasked with storing the sizable blob data off chain, indefinitely.³

In DS, because the nodes are not required to download all the data, but only a portion, blobs can potentially be much larger in size, and many more blobs can be allowed in any given block. DS targets up to 128 blobs (~16 MB) per block. Initially, however, DS may enable 32 blobs per block — a 10x increase in the target number of blobs per block specified with PDS.⁴ ⁵

DS may eventually introduce many more blobs: a ~42x increase over the target and limit defined in PDS. DS changes are focused on the consensus clients, and DS’s data availability sampling removes the requirement for consensus clients to store all blob data. Instead, nodes participate in sampling checks to confirm their portion of the data is available, and they provide their sampled data to peer nodes when asked.

The ask of builders in the DS architecture is high. A lot of data needs to be processed and propagated in a short time span, requiring a lot of compute power and a low-latency connection. Proposer-builder separation (PBS) is a concept that already exists out-of-protocol with Flashbots’ MEV-Boost, but in-protocol PBS is required for the protocol to enforce this separation. PBS shields proposers from needing a builder’s specialized skillset, yet proposers still capture the vast majority of searcher-builder-curated MEV through a competitive block building auction each slot. DS asks builders to process up to 32 MB of data in each slot, which is why it is key for this separation to exist: to isolate MEV’s centralizing force to these specialized block building activities.

PDS moves the Ethereum L1 closer to DS because it is designed in a forward-compatible way. Following PDS, all the updates for DS are consensus-client changes, and these remaining changes do not require any additional work from execution client teams, users, or rollup developers.⁶ Due to its complexity, prerequisites, and open research areas, DS will likely be implemented in phases over the next 2–3 years, incrementally introducing more sophisticated sampling techniques and higher blob limits per block.

  • Phase 0 (PDS): Nodes download full blob data. Capped at 6 blobs per block. New fee market just for data. No DAS. To be live soon (tm).
  • Phase 1 (PeerDAS): One-dimensional (1D) sampling. The prevailing approach for this intermediary step towards DS is PeerDAS. With the proposed PeerDAS architecture, blob data is only extended horizontally, and blob distribution is sharded (i.e., nodes only store a portion of blobs). Networking between nodes is introduced to facilitate sampling communication protocols. The blob cap increases gradually, e.g., 3 → 32 → 64. For more on PeerDAS refer to the research post here.⁷
  • Phase 2 (DS): Full DS is realized in this phase. 2D sampling is introduced, with blobs extended vertically and horizontally. The blob cap may increase incrementally, yet again, e.g., 64 → 128 → 256 (the data-availability throughput implied by these blob counts is sketched after this list).⁸
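
For context on what these blob counts imply for raw data-availability throughput, here is a minimal sketch assuming 128 KB blobs and 12-second slots; the blob counts beyond PDS are proposals rather than final parameters.

```python
BLOB_SIZE_KB = 128
SLOT_SECONDS = 12

phases = [("PDS target", 3), ("PDS limit", 6),
          ("PeerDAS (proposed)", 32), ("PeerDAS (proposed)", 64),
          ("Full DS (proposed)", 128), ("Full DS (proposed)", 256)]

for label, blobs in phases:
    mb_per_block = blobs * BLOB_SIZE_KB / 1024
    kb_per_second = blobs * BLOB_SIZE_KB / SLOT_SECONDS
    print(f"{label:>20}: {blobs:>3} blobs = {mb_per_block:>5.2f} MB/block "
          f"(~{kb_per_second:,.0f} KB/s of data availability)")
```
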
Although there is enough data space on the L1 today for L2 rollups to process thousands of TPS, that space (calldata) is competitive and costly, and rollups are only processing 40–50 TPS on average. Many factors will continue to influence rollup demand (trust, cost, speed). Rollups are still in their early stages, with many risks presented to users. Further, the average fee per transaction on rollups is not yet below 1 cent; it currently fluctuates between 15 and 30 cents. PDS and DS not only stand to increase data supply for rollups but may also boost rollup demand with cheaper transactions achieved through a decoupled L1 data market.

With PDS, and eventually DS, Ethereum is striving for scalability (in a rollup-centric way) that relies on both supply and demand.

  • Supply: Can the network technically handle high amounts of TPS on rollups? Does the L1 have the data space required?
  • Demand: Are rollup transactions cheap enough to generate enough demand to capture high amounts of actual TPS? Do users trust the rollup infrastructure to transact frequently?

While PDS introduces blobs as a new resource to boost data availability (more supply for rollups’ transaction data), more notable initially is the independent fee market for blobs (which may serve to reduce costs to rollup users and increase demand). With PDS, rollups will no longer need to be a primary consumer of blockspace for data, since they will have their own tailor-made resource, blobspace. In the later phases of DS, the L1 plans to have ample space for rollup data. Perhaps, with DS, rollup transaction costs can approach levels low enough (e.g., less than 1 cent) to be acceptable to most users and boost demand to transact on chain.

In Conclusion

  • At the time of writing, EIP 4844 / Proto-Danksharding is fast approaching with the upcoming Dencun upgrade. Dencun testnet upgrades are happening now through early February, and the mainnet upgrade may occur as soon as mid-to-late February.
  • In the months following PDS, we will be able to use on-chain data (L1 & L2) to interpret the impact of introducing more, independently priced data availability space on the L1.
  • In the longer term, as DS rolls out over multiple phases, Ethereum’s base layer data availability may scale up to 16 MB per block. This development, coupled with rollups’ more sophisticated data compression capabilities, should result in an Ethereum base layer that is cheap and readily available for various rollup chains’ data availability needs.
  • The primary beneficiary in the long term is the transacting rollup user, who stands to see lower transaction costs and higher TPS capabilities on their rollup chain of choice; however, as Ethereum’s scaling plans develop, there are many risks associated with rollups that these users must monitor along the way.

Acknowledgements

Thank you to Jan Roessler & Zach Alam for contributions and to Chen Zur, Arwin Holmes, Kartheek Solipuram, Patrick Ambrus, and Davide Crapis for feedback and discussions.

Disclosures

The information provided here is for general informational purposes only and should not be considered as financial, investment, or tax advice.

The thoughts and opinions expressed in this whitepaper are mine alone and do not represent those of my employer. Any opinions presented here may change without being revised.

Technical topics are discussed at a high level, on which I am not an expert. For a more detailed technical breakdown of certain covered subjects, please refer to the provided source material.

References

¹ Vitalik Buterin, Dankrad Feist, Diederik Loerakker, Others. “EIP-4844: Shard Blob Transactions.” 25 February 2022, eips.ethereum.org/EIPS/eip-4844

² Dankrad Feist. “KZG polynomial commitments.” Dankrad Feist’s Ethereum Blog, 16 June 2020, dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html

³ Valeria Nikolaenko, Dan Boneh. “Data availability sampling and danksharding: An overview and a proposal for improvements.” a16zcrypto, 16 April 2023, a16zcrypto.com/posts/article/an-overview-of-danksharding-and-a-proposal-for-improvement-of-das/

⁴ Dankrad Feist. “New sharding design with tight beacon and shard block integration.” HackMD, notes.ethereum.org/@dankrad/new_sharding

⁵ “AMA: We are EF Research Pt. 11–10 January 2024.” Reddit, 10 January 2024, https://www.reddit.com/r/ethereum/comments/191kke6/ama_we_are_ef_research_pt_11_10_january_2024/

⁶ Vitalik Buterin. “Proto_Danksharding_FAQ.” Ethereum Notes, notes.ethereum.org/@vbuterin/proto_danksharding_faq#What-is-Danksharding.

⁷ Danny Ryan. “Peerdas: A Simpler DAS Approach Using Battle-Tested P2P Components.” September 2023. https://ethresear.ch/t/peerdas-a-simpler-das-approach-using-battle-tested-p2p-components/16541

⁸ Francesco. “From 4844 to Danksharding: A Path to Scaling Ethereum DA.” December 2023. https://ethresear.ch/t/from-4844-to-danksharding-a-path-to-scaling-ethereum-da/18046

