Combining KZG and erasure coding
The Hitchhiker’s Guide to Subspace — Episode II
Special thanks to our Consensus Lead, Rahul Subramaniyam, for adding detailed notes on existing constructions and our Senior Research Consultant, Dr. Chen Feng for proofreading.
The first episode in our Hitchhiker’s Guide to Subspace series discusses polynomial PoRs for Subspace v2 and their combination with KZG commitments. It reviews the trade-offs of different forms of Reed-Solomon (RS) erasure codes and Kate-Zaverucha-Goldberg (KZG) commitments, including their benefits and challenges. The article also introduces KZG polynomial commitments, proof batching schemes for KZG, and their use cases in distributed data storage. I strongly recommend reading the first episode for context.
It is crucial for Subspace Archiving to leverage KZG commitments and erasure coding to guarantee that the history pieces were erasure-coded correctly and are recoverable on request.
In this article, I outline our lessons learned from combining RS and KZG and provide an overview of existing schemes, some described in research papers and others currently used by other teams. The insights in the first half of this article will come in handy for understanding the constructions in the second half.
Towards Subspace v2 Archiving
In this section, I explain how we apply the ideas (from the schemes described) to Subspace Archiving, gradually building up complexity. The following consists of a high-level step-by-step description of each scheme and an analysis of the pros and cons.
Archiving is conceptually a protocol for transforming blockchain history into Archived History using an algorithm defined identically for all peers.
Specifically, in Subspace v1, archiving meant a process of transforming blocks at a configured depth from the tip of the chain as follows:
- Blocks are SCALE-encoded and added to a buffer.
- Once the buffer reaches the size of a full segment, it is sliced into records.
- Records are erasure-coded.
- A Merkle tree is built over both source and parity records, and a witness is computed for each record.
- Records are concatenated with corresponding witnesses to form pieces of Archived History that farmers later use for plotting and farming.
For Subspace v2, we replace the Merkle commitment in step 4 with KZG commitments, while also adding the ability to prove that step 3 was done correctly.
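To make the v1 flow concrete before we change it, here is a minimal Python sketch of steps 1 to 5 above. Everything in it is illustrative rather than the actual Subspace implementation: record and segment sizes are shrunk to toy values, SCALE encoding is skipped, XOR-based parity stands in for real Reed-Solomon coding, and the Merkle tree is the simplest possible binary one.

```python
import hashlib

RECORD_SIZE = 8          # toy sizes; real records and segments are far larger
RECORDS_PER_SEGMENT = 4

def merkle_root(leaves):
    """Root of a toy binary Merkle tree over a power-of-two number of leaves."""
    layer = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(layer) > 1:
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

def toy_parity(records):
    """Placeholder for Reed-Solomon: one 'parity' record per source record (rate 1/2)."""
    return [bytes(b ^ (i + 1) for b in rec) for i, rec in enumerate(records)]

def archive_segment(buffer: bytes):
    # 2. slice the buffered (SCALE-encoded) history into fixed-size records
    records = [buffer[i:i + RECORD_SIZE] for i in range(0, len(buffer), RECORD_SIZE)]
    # 3. erasure-code the records (real archiving uses RS, not this placeholder)
    extended = records + toy_parity(records)
    # 4. commit to source + parity records; in v1 this is a Merkle root,
    #    and v2 swaps exactly this step for a KZG commitment
    root = merkle_root(extended)
    # 5. each record would then be paired with its Merkle witness to form a piece
    return extended, root

records, root = archive_segment(bytes(range(RECORD_SIZE * RECORDS_PER_SEGMENT)))
print(len(records), "records committed under", root.hex()[:16], "...")
```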
One inherent challenge of using the KZG commitment scheme for Subspace Archiving is the incommensurability of the underlying data measurement units. Subspace Archiving, Plotting, Farming, and the Distributed Storage Network are designed around the notion that a piece is the unit of measurement. KZG, on the other hand, operates over field elements (see Episode I for a detailed explanation). In practice, KZG commits to chunks of at most 254 bits (or 31 bytes) in size, while the pieces in Subspace are 1 MiB. Therefore, every construction has to account for this size disparity. Below, I present several constructions that address this problem differently.
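A quick back-of-the-envelope calculation shows the scale of that disparity, assuming the common layout of one field element per 32-byte-aligned chunk with at most 31 usable bytes (the exact on-disk layout is an assumption here, not something fixed by the discussion above):

```python
PIECE_SIZE = 1024 * 1024   # a 1 MiB piece of Archived History
CHUNK_SIZE = 32            # assumed: one 32-byte chunk per field element,
                           # of which at most 254 bits (~31 bytes) carry data

chunks_per_piece = PIECE_SIZE // CHUNK_SIZE
print(chunks_per_piece)    # 32768 field elements stand behind every single piece
```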
Tentative 1: Flat commit-only
The most straightforward way to use the KZG commitment scheme would be to replace all usage of the Merkle tree as follows:
- Collect n pieces into a segment.
- Erasure code them with a rate of 1/2 and obtain 2n pieces.
- Hash each piece to chunk size.
- Commit to piece hashes under the KZG scheme and obtain segment commitment C.
- For each piece hash, compute a proof π with respect to segment commitment C.
- Concatenate each piece with its proof π and add to Archived History.
This scheme allows us to prove that any piece hash is a segment member. We can precompute the proof for each piece once and instantly provide it upon request. However, this scheme does not use any other properties of KZG commitments and does not leverage their connection with erasure coding. In this case, the erasure coding is done independently and would need a separate mechanism to prove it was done correctly (i.e., fraud proofs). Thus, we need more than this scheme offers to satisfy our requirements.
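For concreteness, here is a toy sketch of steps 3 to 5 of this flat commit-only flow. It replaces real KZG with a discrete-log stand-in over a tiny prime field whose "trusted setup secret" is public, so it is utterly insecure and only meant to show the shape of the computation; the constants and helper names are mine, not Subspace code. The pairing check a real verifier would run is replayed directly in the exponent using the known toy secret.

```python
import hashlib

R = 251     # toy scalar field (stand-in for the BLS12-381 scalar field)
Q = 503     # 2*R + 1, so the squares mod Q form a group of prime order R
G = 4       # generator of that order-R subgroup
S = 97      # toy "trusted setup" secret; public here purely for illustration

def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x, over GF(R)."""
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

def piece_hash_chunk(piece: bytes) -> int:
    # step 3: hash each (already erasure-coded) piece down to one field element
    return int.from_bytes(hashlib.sha256(piece).digest(), "big") % R

pieces = [bytes([i]) * 16 for i in range(8)]   # pretend these are the 2n coded pieces
xs = list(range(len(pieces)))
ys = [piece_hash_chunk(p) for p in pieces]

# step 4: segment commitment C = g^p(s) for the polynomial through the piece hashes
C = pow(G, lagrange_eval(xs, ys, S), Q)

# step 5: witness for piece k, using q(s) = (p(s) - p(k)) / (s - k)
k = 5
w = pow(G, (lagrange_eval(xs, ys, S) - ys[k]) * pow(S - xs[k], -1, R) % R, Q)

# toy verification: replay the pairing relation in the exponent with the known S
assert pow(w, (S - xs[k]) % R, Q) == C * pow(pow(G, ys[k], Q), -1, Q) % Q
print("piece", k, "proved against the segment commitment")
```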
Tentative 2: Flat encode-commit
We want to guarantee the correctness of erasure coding without employing a different tool.
To achieve this, we recall two facts from Episode I:
- Reed-Solomon interpolates a polynomial over the data to erasure code it
- KZG is a polynomial commitment scheme
For this scheme, we combine these two facts and commit to the same polynomial that was interpolated for erasure coding:
- Collect n pieces into a segment.
- Split the segment into chunks.
- Erasure code them with rate 1/2 by interpolating a polynomial of degree (n — 1) over source data chunks and evaluating it at twice as many points to obtain twice as many chunks.
- Commit to that polynomial and obtain the segment commitment C.
- Join the chunks back together to form 2n pieces.
After the computation of the five steps above, we can guarantee that the source and extended data lie on the same low-degree polynomial. However, in this simple scheme, we can only directly prove the inclusion of a 31-byte chunk into a history segment. In this case, we would have to provide a witness for each chunk in all the pieces to prove that a specific piece belongs to the Archived History. This approach would incur an overhead of ~150% for each piece, which is unacceptable. To circumvent this issue, we use well-known proof aggregation techniques to reduce the overhead to a single 48-byte proof for a whole piece. The scheme is then extended with two more steps:
6. For each piece, compute a subvector batch proof πᵢ of all its chunks belonging to segment commitment C.
7. Concatenate each piece with its batch proof πᵢ and add to the Archived History.
This scheme allows us to prove any small chunk is a segment member. To prove a piece is a segment member, we recall that a piece is a group of smaller chunks, so we can aggregate chunk proofs into one proof for each piece and prove a piece is included in the segment.
In this scheme, we also commit to the same polynomial used for erasure coding, so we can demonstrate that the erasure coding was done correctly and the segment can be recovered from any subset of n pieces.
An issue with this scheme is performance: if the pieces are too large or there are too many of them, the interpolated polynomial has a very high degree, which makes it slow to commit to, slow to compute witnesses for, and expensive in memory.
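Here is the same encode-commit idea at toy scale, using the insecure discrete-log stand-in from the previous sketch. It shows the two properties that matter: the source and parity chunks lie on one committed polynomial, so any n of the 2n chunks recover the rest, and every chunk (source or parity) opens against the single segment commitment. The evaluation domain is just 0, 1, 2, … rather than the roots of unity a real implementation would use.

```python
R, Q, G, S = 251, 503, 4, 97   # toy field, group and "secret" as before (insecure)

def lagrange_eval(xs, ys, x):
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

n = 4
source = [17, 42, 99, 7]                       # n source chunks (field elements)
src_xs = list(range(n))

# steps 3-4: extend the evaluations to 2n points and commit to the SAME polynomial
ext_chunks = [lagrange_eval(src_xs, source, x) for x in range(2 * n)]
assert ext_chunks[:n] == source                # systematic: source chunks unchanged
C = pow(G, lagrange_eval(src_xs, source, S), Q)

# correctness of erasure coding: ANY n of the 2n chunks re-interpolate all the rest
subset = [1, 3, 4, 6]
recovered = [lagrange_eval(subset, [ext_chunks[i] for i in subset], x)
             for x in range(2 * n)]
assert recovered == ext_chunks

# and every chunk, source or parity, opens against the single commitment C
k = 6
w = pow(G, (lagrange_eval(src_xs, source, S) - ext_chunks[k])
            * pow(S - k, -1, R) % R, Q)
assert pow(w, (S - k) % R, Q) == C * pow(pow(G, ext_chunks[k], Q), -1, Q) % Q
print("parity chunk", k, "verified against the erasure-coding polynomial's commitment")
```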
Tentative 3: Stacked encode-commit
One way to mitigate a flat scheme’s size and complexity growth is to stack data into a matrix instead of simply concatenating records. This approach is similar to the one mentioned in Succinct Erasure Coding Proof Systems (ECP), which I describe in more detail later in this article.
- Collect n pieces into a segment.
- Split each piece into chunks.
- Arrange the source data into a d×n matrix, where d is the number of rows equal to the number of chunks in each piece, and n is the number of columns equal to the number of source pieces.
- Interpolate a polynomial over the n chunks of each row, erasure coding it to 2n chunks.
- For each row, compute a commitment Cᵢ over the same polynomial.
- For each column (piece), compute a same-point cross-commitment batch proof πₖ of each chunk belonging to the corresponding row commitment.
- Concatenate each piece with its batch proof πₖ and add to the Archived History.
- Save all row commitments Cᵢ for i in [0, …, d − 1].
This scheme makes the degree of each polynomial we commit to much smaller and more manageable.
In this scheme, we also commit to the same polynomial used for erasure coding, so we can demonstrate that the erasure coding was done correctly and the segment can be recovered from any subset of n pieces. The drawback is that now we have to store many (equal to the number of chunks in a piece) commitments for each segment. Furthermore, the batch proofs for each piece also require a verifier to access all those commitments to prove a piece belongs to the blockchain history.
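The sketch below shows the stacked layout at toy scale with the same insecure discrete-log stand-in for KZG as in the earlier sketches. It drives home the drawback: a segment now carries one commitment per row (d of them), and proving a single piece (a column) naively takes d separate openings, one against each row commitment, which is exactly what the same-point cross-commitment batch proof of step 6 collapses into a single constant-size proof.

```python
R, Q, G, S = 251, 503, 4, 97   # toy scalar field / group / "secret" as before

def lagrange_eval(xs, ys, x):
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

def commit(xs, ys):
    return pow(G, lagrange_eval(xs, ys, S), Q)

def open_at(xs, ys, k):
    """Toy KZG opening (evaluation + witness) of the polynomial through (xs, ys) at k."""
    y = lagrange_eval(xs, ys, k)
    w = pow(G, (lagrange_eval(xs, ys, S) - y) * pow(S - k, -1, R) % R, Q)
    return y, w

def verify(c, k, y, w):
    # a real verifier would use a pairing; the toy replays the check with the known S
    return pow(w, (S - k) % R, Q) == c * pow(pow(G, y, Q), -1, Q) % Q

d, n = 3, 4                                          # d chunks per piece, n source pieces
rows = [[(7 * r + c + 1) % R for c in range(n)] for r in range(d)]   # d x n source chunks
xs = list(range(n))

# one commitment per ROW, all of which must be stored for the segment
row_commits = [commit(xs, row) for row in rows]

# piece k is column k of the row-wise extended matrix; naively it needs d proofs
k = 5
for r in range(d):
    y, w = open_at(xs, rows[r], k)                   # chunk r of piece k and its witness
    assert verify(row_commits[r], k, y, w)
print("piece", k, "verified chunk by chunk against", len(row_commits), "row commitments")
```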
Tentative 4: Stacked with crosswise encode-commit
Another approach is to make encoding and commitment orthogonal:
- Collect n pieces into a segment.
- Split each piece into chunks.
- Arrange the source data into an n×d matrix, where n is the number of rows, equal to the number of source pieces, and d is the number of columns, equal to the number of chunks in each piece.
- Interpolate a polynomial over the n chunks of each column, erasure coding it to 2n chunks.
- For each row, compute a commitment Cᵢ for i in [0, …, n − 1].
This scheme does not directly provide a way to prove piece inclusion in the blockchain history but leaves us the liberty to choose one. For instance, we can employ a 2-step proof that shows a piece with a particular piece commitment belongs to a segment with a particular root commitment using a Verkle tree.
Note that in this scheme, we do not commit to the same polynomials used for erasure coding. However, it allows us to demonstrate that the erasure coding was done correctly using the homomorphic property of KZG commitments: commitments to erasure-coded data should equal erasure-coded commitments.
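The following toy sketch demonstrates exactly that homomorphic check, again over a tiny prime field with an insecure discrete-log commitment and a public toy secret. The pieces are committed to row-wise, the columns are erasure coded, and for every extended piece the commitment computed from its data equals the commitment obtained by erasure coding the source commitments themselves.

```python
R, Q, G, S = 251, 503, 4, 97   # toy field / group / "secret", as in the earlier sketches

def lagrange_eval(xs, ys, x):
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

def lagrange_basis(xs, i, x):
    """L_i(x) over the interpolation domain xs, in GF(R)."""
    num = den = 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (x - xj) % R
            den = den * (xs[i] - xj) % R
    return num * pow(den, -1, R) % R

n, d = 4, 3                                               # n source pieces, d chunks each
M = [[(5 * r + c + 2) % R for c in range(d)] for r in range(n)]   # n x d source matrix
row_dom = list(range(n))                                  # column-wise coding domain
col_dom = list(range(d))                                  # per-piece commitment domain

# commit to each SOURCE piece (row) over the chunk-index domain
def commit_piece(chunks):
    return pow(G, lagrange_eval(col_dom, chunks, S), Q)
piece_commits = [commit_piece(M[r]) for r in range(n)]

# erasure code COLUMN-wise: piece r (for r >= n) is evaluated from the n source pieces
def extended_piece(r):
    return [lagrange_eval(row_dom, [M[i][c] for i in range(n)], r) for c in range(d)]

# the check: commit(erasure-coded piece) == erasure coding applied to the commitments
for r in range(n, 2 * n):
    lhs = commit_piece(extended_piece(r))                          # commit(Encode(data))
    rhs = 1
    for i in range(n):                                             # Encode(commitments)
        rhs = rhs * pow(piece_commits[i], lagrange_basis(row_dom, i, r), Q) % Q
    assert lhs == rhs
print("erasure-coded commitments match commitments to erasure-coded pieces")
```

The same algebra reappears below in Semi-AVID-PR, which is where we borrowed it from.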
Existing Constructions
Recently, there has been a significant increase in academic and industry interest in studying erasure coding and pairing it with KZG. Combining these two concepts has led to the discovery of fascinating properties, including those we describe in this and the previous episode of the Hitchhiker’s Guide to Subspace. In addition, many scholars and experts have been exploring these properties and their practical applications that can benefit various industries and businesses. As a result, multiple approaches to combine these two primitives have emerged.
In this section, I give an overview of several existing constructions. The following should be considered merely a partial list of the existing literature and development efforts that are particularly interesting to us at Subspace. The four constructions we’ve identified are:
- Erasure Coding Proof (ECP) systems
- Semi-AVID-PR (Information Dispersal with Provable Retrievability for Rollups)
- Polygon Avail
- Ethereum Danksharding
All four perform the same steps at a high level: each construction arranges the data in a matrix, computes a series of vector commitments to the data, and extends the data with a series of erasure codes. However, there are fundamental differences in how each construction approaches the combination of these steps.
We can broadly classify them by “dimensionality”, based on whether encoding and commitment happen in the same “direction”, as follows:
1D: ECP has only one direction for both encoding and commitment
1.5D: Polygon Avail and Semi-AVID-PR both encode in one direction and commit in the other
2D: The Ethereum scheme encodes in two directions and commits in one
The descriptions below largely follow their sources’ notations to make them easier for readers to follow; however, I have unified the language across the different examples.
Erasure Coding Proof system (ECP)
In their paper, Alhaddad et al. define and construct a general-purpose erasure coding proof (ECP) system to tackle the issue of guaranteeing that erasure-coded data fragments correspond to the original data in distributed applications such as asynchronous verifiable information dispersal (AVID), reliable broadcast (RBC), multi-valued Byzantine agreement (MVBA), and atomic broadcast. The construction works for any additively homomorphic polynomial commitment and linear erasure code.
As an example, they use an AVID protocol with a systematic (m,n)-RS code and KZG commitment as follows:
- A single dealer arranges the source data into a k×m matrix (k rows and m columns).
- The m chunks of each row are represented as a polynomial pᵢ(x) (in evaluation form) and erasure coded to n chunks.
- A KZG commitment Cᵢ is computed over the same polynomial used in the RS code for each row.
- A same-point cross-commitment batch proof πₗ is computed for each column.
- Each column dₗ is sent to a different replica along with its batch proof πₗ, all the row commitments (C₀, …, Cₖ₋₁) and a “root” commitment c = Hash(C₀ || … || Cₖ₋₁).
- The root commitment is a tag to show that replicas belong to the same original data. Each replica still needs to store a copy of all row commitments.
Since commitment and encoding are performed over the same row polynomials pᵢ(x), the encoded data chunks are guaranteed to lie on the same polynomial as source data chunks. Thus, the whole data is guaranteed to be recoverable if enough column replicas are collected. Subspace adopts this idea into our Plotting algorithm, which will be described in the next episode of this series.
A distinctive feature of this scheme is the distribution of chunk witnesses to the replicas. A naive approach would be to send a separate witness for each chunk in the column, making it O(k) communication overhead per replica for the dealer. Instead, ECP achieves a constant-sized proof πₗ for the whole column l via cross-commitment batching, as follows:
Let pᵢ(x) be the polynomial corresponding to row i. Let

p(x) = a₀·p₀(x) + a₁·p₁(x) + … + aₖ₋₁·pₖ₋₁(x)

be the challenge polynomial derived from the row-wise polynomials, using constant multipliers aᵢ known to all parties. Specifically:
- The verifier specifies a random challenge value c
- Each aᵢ equals the Lagrange basis polynomial Lᵢ(x) evaluated at the challenge value. Since all the parties know the Lagrange basis polynomials Lᵢ(x), they can locally compute the multipliers aᵢ = Lᵢ(c), for i in [0,…, k − 1].
- This scheme would work equally well if the multipliers were chosen in some other way, as long as all the parties agree on them (e.g., aᵢ = cⁱ). The challenge c itself can be computed with the Fiat-Shamir transformation.
The dealer sends to each replica j an evaluation and a witness:

p(j) = a₀·p₀(j) + … + aₖ₋₁·pₖ₋₁(j),    πⱼ = g^((p(s) − p(j)) / (s − j))
Each replica computes the commitment to the challenge polynomial. The replica cannot directly commit to it by computing just

C = g^(p(s))

because it does not know all the row polynomials that constitute the challenge polynomial (here, s is the secret value used in the KZG trusted setup and g is the group generator point).
However, the replica can locally compute the commitment to the challenge polynomial as a linear combination of row commitments, which is possible because of the KZG homomorphism property:

C = C₀^(a₀) · C₁^(a₁) · … · Cₖ₋₁^(aₖ₋₁) = g^(a₀·p₀(s) + … + aₖ₋₁·pₖ₋₁(s)) = g^(p(s))
Now it can verify the consistency of the proof triplet (C, p(j), πⱼ) (the latter two received from the dealer in the previous step) with the standard KZG pairing check:

e(C · g^(−p(j)), g) = e(πⱼ, g^(s − j))
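To see the whole round trip in one place, here is a toy sketch of the cross-commitment batching over a tiny prime field, with the usual caveat that the discrete-log commitment, the public toy secret, and the fixed challenge value are stand-ins for the real KZG scheme and a real Fiat-Shamir challenge.

```python
R, Q, G, S = 251, 503, 4, 97       # toy group as before; S plays the trusted-setup secret

def lagrange_eval(xs, ys, x):
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

def lagrange_basis(xs, i, x):
    num = den = 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (x - xj) % R
            den = den * (xs[i] - xj) % R
    return num * pow(den, -1, R) % R

k, m = 3, 4                         # k rows, m source chunks per row, coded to 2m columns
rows = [[(11 * i + t + 3) % R for t in range(m)] for i in range(k)]
dom = list(range(m))

row_commits = [pow(G, lagrange_eval(dom, row, S), Q) for row in rows]

# challenge value (a real protocol derives it via Fiat-Shamir from the commitments)
c = 123
a = [lagrange_basis(list(range(k)), i, c) for i in range(k)]   # a_i = L_i(c)

# challenge polynomial p(x) = sum_i a_i * p_i(x); the dealer can evaluate it anywhere
def challenge_eval(x):
    return sum(a[i] * lagrange_eval(dom, rows[i], x) for i in range(k)) % R

# dealer sends replica j its column, plus ONE witness for the challenge polynomial at j
j = 6                                                          # a coded (extended) column
column = [lagrange_eval(dom, rows[i], j) for i in range(k)]
w = pow(G, (challenge_eval(S) - challenge_eval(j)) * pow(S - j, -1, R) % R, Q)

# replica side: aggregate the row commitments homomorphically ...
C_star = 1
for i in range(k):
    C_star = C_star * pow(row_commits[i], a[i], Q) % Q
# ... recompute the claimed evaluation from its own column data ...
y = sum(a[i] * column[i] for i in range(k)) % R
# ... and check the (C*, y, w) triplet (pairing check replayed with the known toy S)
assert pow(w, (S - j) % R, Q) == C_star * pow(pow(G, y, Q), -1, Q) % Q
print("one constant-size proof covers all", k, "chunks of column", j)
```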
Semi-AVID-PR
The paper Information Dispersal with Provable Retrievability for Rollups presents a storage- and communication-efficient asynchronous verifiable information dispersal (AVID) protocol motivated by rollup application requirements. The system is built using linear erasure-correcting codes and homomorphic vector commitments.
An example with a non-systematic (k,n)-RS code and KZG commitment goes as follows:
- A single client arranges the source data into an L×k matrix U (L rows and k columns).
- The client computes source column commitments h₁, …, hₖ. The source data chunks in this step are treated as evaluations of column polynomials used for KZG commitment.
- The client computes an L×n erasure-coded matrix C row-wise. The data chunks in each of the L rows are taken as coefficients of a polynomial of degree (k − 1) and evaluated at n points. The coded data consists of the evaluations of those polynomials (coefficient-view RS).
- The client sends to each storage node i the i-th column cᵢ of the coded matrix C and all source matrix column commitments h₁, …, hₖ.
- The storage node verifies the authenticity of the received column with source commitments, computes source root commitment C as an arithmetic hash of all source column commitments, and stores the commitments and their assigned column.
As in ECP, one must collect k columns from storage nodes to decode the source data.
Also, as in ECP, the client must send all the source column commitments to every storage node, which stores them alongside its assigned column. Notice, however, that while the erasure coding happens row-wise in both schemes, in Semi-AVID-PR the commitment is performed over column polynomials rather than row polynomials as in ECP. This orthogonality in the “direction” of the two primitives is why we consider it to introduce another “dimension” to the scheme. As a result, the erasure-coded data does not lie on the committed polynomials, but the scheme has a potent property: erasure coding the source commitments gives the commitments to the erasure-coded data columns. More specifically:
Let VC(u) denote the vector commitment operation on vector u.
Let Encode(u) denote the erasure coding of vector u.
Let pᵢ(x) be the polynomial corresponding to the column uᵢ, for i in [0,…, k − 1]. It is a linear combination of the Lagrange basis polynomials Lⱼ(x) and the column’s source data fragments Uⱼᵢ (row j, column i):

pᵢ(x) = Σⱼ Uⱼᵢ·Lⱼ(x),  with j running over the L rows
The corresponding column commitment:

hᵢ = VC(uᵢ) = g^(pᵢ(s))
The rows in the source matrix U are erasure-coded using the non-systematic Reed Solomon code (the coefficient view) to get the extended rows of the coded matrix C. If aᵢˣ are the column entries in the RS code generating matrix G, each extended matrix entry is computed as:

Cⱼᵢ = Σₓ aᵢˣ·Uⱼₓ,  with x running over the k source columns
Then, on the left-hand side, the i-th erasure coded commitment is computed from the source column commitments as:

Encode(h)ᵢ = ∏ₓ hₓ^(aᵢˣ) = g^(Σₓ aᵢˣ·pₓ(s))
This can be regrouped around the Lagrange polynomials:

Encode(h)ᵢ = g^(Σₓ aᵢˣ·Σⱼ Uⱼₓ·Lⱼ(s)) = g^(Σⱼ Lⱼ(s)·Σₓ aᵢˣ·Uⱼₓ) = g^(Σⱼ Cⱼᵢ·Lⱼ(s))
On the right-hand side, the polynomial corresponding to column cᵢ in the extended matrix C (similar to the column-wise polynomial in U) is expressed as:

qᵢ(x) = Σⱼ Cⱼᵢ·Lⱼ(x)
And thus, the commitment to this encoded column polynomial is:

VC(cᵢ) = g^(qᵢ(s)) = g^(Σⱼ Cⱼᵢ·Lⱼ(s))
As can be seen,

Encode(h)ᵢ = VC(cᵢ)
This property that erasure-coded commitments equal commitments to erasure-coded data follows from additive homomorphism of KZG commitments and is another way to prove the correctness of erasure coding, different from what is used in ECP. Subspace adopts this powerful idea from Semi-AVID-PR for our new Archiving protocol.
What’s also notable is that this construction does not use proofs, only the commitments and homomorphic properties of the commitment scheme.
It’s also worth noting that the commitments to the source and the coded columns at the same index differ. Source data is not stored explicitly alongside the extended data as in Polygon Avail or ETH Danksharding schemes. However, the scheme can be generalized to a systematic RS code so that the source data and the commitments are replicated explicitly.
We want to express special thanks to Joachim Neu for his help in understanding this protocol and publishing the example implementation on GitHub. The contribution was very helpful to the Subspace Research team in narrowing down a suitable algorithm for experimentation.
Polygon Avail
The Avail scheme is the solution behind Polygon’s data availability layer (open-source on GitHub): this component receives transactional data and orders it without any execution. It then stores the data and ensures complete data availability in a decentralized manner, as follows:
- The block producer breaks the block data into 31-byte chunks and arranges the chunks into n rows and m columns to form an n×m matrix D.
- For each row i in [1,…, n], the block producer constructs a polynomial pᵢ(x) in the evaluation form.
- The block producer commits to each polynomial to get source commitments (C₁, …, Cₙ), respectively.
- For redundancy, it encodes (C₁, …, Cₙ) to (C₁, …, C₂ₙ), which are added to the block header and broadcasted.
The scheme is described for three types of clients, listed in ascending order of how much blockchain information they have access to: light clients, column-full nodes, and full nodes.
- The light clients only have access to block headers. When querying a data block, the light clients will sample some chunk D[𝑖][𝑗] and the corresponding proof π[𝑖][𝑗].
- “Column-full” nodes download a single column of n source elements D[1][𝑗], …, D[𝑛][𝑗] and their corresponding proofs π[1][𝑗], …, π[𝑛][𝑗]. The node can extend the column and the proofs to 2n points and verify whether the extended evaluations are valid (using the same homomorphism property of KZG as in Semi-AVID-PR above).
- Full nodes download the whole block, redo the erasure coding and recommit to see if all 2n commitments match.
Overall, the scheme is similar to Semi-AVID-PR in using erasure coding and commitment in orthogonal directions.
The main differences are the use of a systematic RS code optimized for finite fields with a 1/2 rate, and the explicit use of witnesses. The basic idea is that the column data, the respective witnesses, and the row commitments are all scaled by the same multipliers as the erasure code uses. The resulting data, witnesses, and commitments also form a consistent proof system.
Let

(aᵣ⁰, aᵣ¹, …, aᵣⁿ⁻¹)

be the multipliers used to produce an extended row r, where the aˣ are the column entries in the RS encoder generator matrix.
Let r (n < r ≤ 2n) be a row in the coded (extended) part of the block.
An entry in column c can be computed as:

D[r][c] = Σᵢ aᵣⁱ⁻¹·D[i][c],  with the sum running over the source rows i in [1,…, n]
The witness for this extended entry is then computed from the source witnesses in the same column (writing the evaluation point of column c simply as c):

π[r][c] = ∏ᵢ π[i][c]^(aᵣⁱ⁻¹) = ∏ᵢ ( g^((pᵢ(s) − D[i][c]) / (s − c)) )^(aᵣⁱ⁻¹)

The numerators can be rearranged as follows:

Σᵢ aᵣⁱ⁻¹·(pᵢ(s) − D[i][c]) = Σᵢ aᵣⁱ⁻¹·pᵢ(s) − Σᵢ aᵣⁱ⁻¹·D[i][c] = Pᵣ(s) − D[r][c]

where Pᵣ(x) = Σᵢ aᵣⁱ⁻¹·pᵢ(x) is the polynomial of the extended row r. Hence

π[r][c] = g^((Pᵣ(s) − D[r][c]) / (s − c))
The commitment for the extended row r is computed from source row commitments as:

Cᵣ = ∏ᵢ Cᵢ^(aᵣⁱ⁻¹) = g^(Σᵢ aᵣⁱ⁻¹·pᵢ(s)) = g^(Pᵣ(s))
As can be seen, the extended evaluation D[r][c], the witness π[r][c], and the corresponding commitment Cᵣ form a valid KZG proof for the extended row polynomial Pᵣ(x).
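A toy sketch of this consistency check follows. It plays the role of the column-full node from the list above: given one column of source data, its witnesses, and the row commitments, it extends all three with the same multipliers and verifies the resulting triplet. For simplicity, the multipliers are Lagrange coefficients over a 0, 1, 2, … domain rather than the entries of Avail's actual generator matrix, and the commitment is the same insecure discrete-log stand-in used in the earlier sketches.

```python
R, Q, G, S = 251, 503, 4, 97        # toy field / group / trusted-setup "secret" as before

def lagrange_eval(xs, ys, x):
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

def lagrange_basis(xs, i, x):
    num = den = 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (x - xj) % R
            den = den * (xs[i] - xj) % R
    return num * pow(den, -1, R) % R

n, m = 3, 4                                           # n source rows, m columns
D = [[(13 * i + c + 1) % R for c in range(m)] for i in range(n)]
col_dom = list(range(m))                              # evaluation points of the columns
row_dom = list(range(n))                              # source row indices

# block producer: row commitments and per-cell witnesses for the SOURCE rows
C = [pow(G, lagrange_eval(col_dom, D[i], S), Q) for i in range(n)]
def witness(i, c):
    p_s, y = lagrange_eval(col_dom, D[i], S), D[i][c]
    return pow(G, (p_s - y) * pow(S - c, -1, R) % R, Q)

# column-full node: it holds column c of the source data plus the n witnesses,
# and extends data, witnesses and commitments with the SAME multipliers
c = 2
r = 5                                                 # an extended row index, r >= n
lam = [lagrange_basis(row_dom, i, r) for i in range(n)]   # toy multipliers (Lagrange form)

d_rc = sum(lam[i] * D[i][c] for i in range(n)) % R        # extended data cell
w_rc = 1
C_r = 1
for i in range(n):
    w_rc = w_rc * pow(witness(i, c), lam[i], Q) % Q       # extended witness
    C_r = C_r * pow(C[i], lam[i], Q) % Q                  # extended row commitment

# the extended (data, witness, commitment) triplet is still a valid KZG-style proof
assert pow(w_rc, (S - c) % R, Q) == C_r * pow(pow(G, d_rc, Q), -1, Q) % Q
print("extended row", r, "column", c, "verifies against the extended commitment")
```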
An interesting thing to note about Avail is that the design conceptually matches a “2D scheme” initially suggested by Vitalik for Ethereum upgrades. However, Ethereum later adopted a different (and more complex) 2D scheme, which is not the same “2D” in the dimensionality sense. The updated ETH scheme is described below.
Ethereum Danksharding
This KZG-RS scheme will soon start being integrated into Ethereum’s protocol with Proto-Danksharding (EIP-4844). It is conceptually the most complex of all the schemes described in this article because erasure coding is performed both row-wise and column-wise.
The example follows the introduction video and an explanatory post by Dankrad Feist:
- The payload data is arranged into an n×k matrix.
- The data is interpreted as evaluations of a 2-dimensional polynomial p(x,y) with maximum degree n — 1 in x and k — 1 in y.
- This bivariate polynomial is then extended in both variables to double the domain in each variable, effectively quadrupling the size of the data into a 2n×2k matrix.
- The block producer then commits to each row polynomial (obtained by fixing the row variable to the row index) to get the row commitments (C₁, …, Cₙ).
- For redundancy, the commitments (C₁, …, Cₙ) are also extended to (C₁, …, C₂ₙ).
This scheme generalizes erasure coding by extending a polynomial evaluation domain to the multivariate case. It also uses the KZG polynomial commitment scheme generalized to multivariate polynomials.
Let p(x,y) be the 2-variate polynomial corresponding to the 2D array of source data. It can be expressed in terms of 2D Lagrange interpolation as:

p(x,y) = Σᵢ Σⱼ Dᵢⱼ·LRᵢ(x)·LCⱼ(y)

where Dᵢⱼ is the source chunk in row i and column j, LRᵢ(x) and LCⱼ(y) are the Lagrange basis polynomials, and the basis polynomial variable values are the row and column indices: x in [0,…, n − 1] and y in [0,…, k − 1], respectively. This polynomial can be evaluated to get the extension data points corresponding to the values of x in [n,…, 2n − 1] and y in [k, …, 2k − 1].
The 1-dimensional polynomial for row = i is then expressed as:

pᵢ(y) = p(i, y) = Σⱼ Dᵢⱼ·LCⱼ(y)
The corresponding row commitment is

Cᵢ = g^(pᵢ(s)) = g^(p(i, s))
By erasure coding both rows and columns, this scheme is more flexible: any individual row or column can be reconstructed with enough chunks. Furthermore, there is no need for a full node to encode and commit to the whole block. Lifting this requirement is a great step towards scalability, as distributed reconstruction should work with nodes that only process rows or columns. Distributed block production should also work in theory, although it is not currently being implemented.
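No cryptography is needed to see the 2D erasure-coding part in action, so the sketch below sticks to plain field arithmetic over a toy prime: it extends a small matrix row-wise and then column-wise, checks that doing it in the opposite order lands on the same quadrupled table (i.e., it really is the evaluation table of one bivariate polynomial), and then reconstructs an extended row from only half of its chunks.

```python
R = 251                                    # toy prime field for the data chunks

def lagrange_eval(xs, ys, x):
    acc = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % R
                den = den * (xi - xj) % R
        acc = (acc + yi * num * pow(den, -1, R)) % R
    return acc

n, k = 3, 4                                # source matrix is n x k
D = [[(9 * i + j + 1) % R for j in range(k)] for i in range(n)]

# extend every row from k to 2k columns, then every column from n to 2n rows,
# quadrupling the data; the result is the evaluation table of one bivariate p(x, y)
rows_ext = [[lagrange_eval(list(range(k)), row, y) for y in range(2 * k)] for row in D]
full = [[lagrange_eval(list(range(n)), [rows_ext[i][y] for i in range(n)], x)
         for y in range(2 * k)] for x in range(2 * n)]

# sanity: extending columns first and rows second lands on the same table,
# which is what lets any single row or column be reconstructed independently
cols_ext = [[lagrange_eval(list(range(n)), [D[i][j] for i in range(n)], x)
             for j in range(k)] for x in range(2 * n)]
full2 = [[lagrange_eval(list(range(k)), cols_ext[x], y) for y in range(2 * k)]
         for x in range(2 * n)]
assert full == full2

# reconstruct extended row 4 from just half of its 2k entries
row = full[4]
have = [0, 2, 5, 7]                        # any k of the 2k positions suffice
rec = [lagrange_eval(have, [row[y] for y in have], y) for y in range(2 * k)]
assert rec == row
print("2D-extended matrix is consistent; rows and columns recover from half their chunks")
```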
Data Availability Sampling in this scheme is performed over batches of 16 consecutive chunks in the same row, each batch carrying a precomputed subvector-commitment batch proof against the corresponding row commitment.
Conclusion
This article describes several schemes for combining KZG and erasure coding to achieve data availability and redundancy. The existing schemes include ECP, Semi-AVID-PR, Polygon Avail, and Ethereum Danksharding. While each scheme has its unique approach to achieving data redundancy and availability, they all rely on the homomorphic properties of KZG commitments. In all of them, correct erasure coding guarantees the retrievability of the data, while KZG commitments provide a mechanism for verifying its authenticity.
The Subspace Dilithium consensus ultimately adopts the stacked scheme with crosswise encode-commit for Archiving and the ECP system for Plotting. These two mechanisms work together to ensure that the network remains secure and resilient against attacks and that the archived data remains recoverable. In the next episode of our Hitchhiker’s Guide to Subspace, we will delve deeper into the intricacies of these mechanisms and how they interact to maintain the blockchain’s security.