Henosis: A framework for Avail to aggregate heterogeneous ZK proofs

Rize Labs
18 min read · May 7, 2024

--

Introduction

With the current focus on rollups in Ethereum, we will see thousands of them in the future, including both general-purpose and app-specific rollups. These rollups rely on Ethereum for security by settling their blocks on it, and in the future these rollups will start using Data Availability solutions like Avail DA or a Data Availability Committee on Ethereum.

In the future, as the modular architecture matures, many of these rollups will shift towards modular solutions, which offer benefits such as better finality, composability, and lower costs over time. However, despite these advantages, the end-user experience of interacting with these rollups will fragment. Blockchain users already need interop solutions in a world with just a few chains; increasing the number of chains without fundamentally improving how these systems work together will complicate things further. Solutions like Avail Nexus try to solve this issue, and proof aggregation plays a significant role in them.

In this blog, we will first discuss proof aggregation and why we need it. We will examine the problem with current multi-chain communication (especially between rollups) and why even trustless bridging will not solve this problem in the long run. In particular, if we need web2-like async communication between multiple rollups, the way we have async communication between micro-services, the current architecture will not scale. We will also mention the bare minimum trust assumptions required for a rollup to interact with any other rollup regarding consensus, execution ordering, the validity of states, and the role ZK proofs play in this scenario.

In the next section, we will present a short overview of ZK recursion, explaining how aggregation of proofs can occur without recursion for homogeneous proof composition and why recursion is required for heterogeneous proof composition. Some pointers to various SNARK and STARK recursion techniques prevalent in the academic literature might help the reader understand our approach better.

Finally, we will share some approaches we experimented with and some benchmarks. The main idea is to find an efficient and secure approach for heterogeneous proof aggregation. We are still in the experimentation phase, and this article is just an effort to describe some approaches we have taken and the problems we feel they have. There are still many proof aggregation systems (e.g., SuperNova) and protocols (e.g., Nebra) out there that might be better, and we need to experiment with them to find the optimal architecture in terms of speed, efficiency and overall security.

Prerequisite: Readers should have a basic understanding of zk-SNARKs, zk-STARKs, zkVMs, zk-rollups, validity proofs and some Interactive Oracle Proof (IOP) based commitment schemes like FRI and KZG.

WTF is proof aggregation?

Homogeneous proof aggregation

In simple terms

Proof aggregation is a technique in which multiple zero-knowledge proofs, π1, π2, …, πk, are combined into a single aggregated proof, π_agg. This aggregated proof verifies the validity of all statements demonstrated by the individual proofs, significantly reducing the computational and storage requirements for verification.

This method mainly benefits blockchains by enhancing transaction throughput and scaling decentralized applications. However, generating an aggregated proof involves additional computational effort, which can introduce delays in achieving finality, especially in blockchain rollups.

Why do we need proof aggregation?

Current bridging between rollups

All the validity proofs posted on Ethereum serve as a way to confirm transactions for these rollups, ensuring that a new rollup block is created upon fair execution of transactions. However, when one rollup needs to interact with another, it has to do so synchronously, leading to a poor user experience. Scaling blockchains with synchronous communication between rollups is hard to imagine. By synchronous communication, I mean that one rollup has to wait for confirmation on Ethereum (L1) before any action can be taken in another rollup. In contrast, web2 architectures use asynchronous micro-services. For example, Amazon uses many backend micro-services, such as Visa processing and inventory management, that handle their tasks independently and interact with each other asynchronously.

The situation with the current rollup infrastructure is similar; seamless communication between rollups is crucial. The effectiveness of a modular world depends heavily on the messaging protocols that manage communication across different rollups. These cross-rollup communications often use bridges, and these bridges require minimal trust to match the security level of a traditional, unified blockchain. When bridges operate between rollups on the same Data Availability (DA) Layer, they don’t cross trust and security zones, because they all rely on the same consensus and security rules that determine the order of transactions. However, for these bridges to be genuinely trustless, a rollup must verify whether the actions were carried out correctly so it doesn’t have to depend on external assurances for these guarantees. Before this communication takes place, we need to consider many things regarding the trust assumptions of rollups using different DA layers, whether there can be async messaging between rollups, and how state will be verified between various types of rollups. One thing to remember is that whenever an L2 blockchain communicates with another L2, it must verify two things: first, the transaction order, which can be ensured by DAS (Data Availability Sampling), and second, the validity of execution, which it checks using state validity proofs.

Trust guarantees are acquired through recursion.

Consensus establishes a chain’s order of transactions. The DA Layer gives rollups a consensus on the order of their transactions, which is a significant advantage for rollups sharing the same DA Layer: the same consensus rules determine the order of transactions for all of them. This means that even if there is a reorganization (re-org) of the blockchain, the transaction order for all involved rollups will follow this re-org.

To ensure these types of guarantees, we need inclusion proofs, completeness proofs, and validity proofs. An inclusion proof confirms that data is included in a DA block. A completeness proof ensures that the state transition has utilized all the transaction data it needs*, and a state validity proof verifies the correctness of the state transition.
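To make the inclusion-proof idea concrete, here is a minimal Python sketch of a Merkle inclusion proof over a power-of-two set of leaves. This is a simplified stand-in for how data is committed in a DA block; the function names are illustrative, not Avail's actual API.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Build a Merkle root over a power-of-two list of leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, index):
    """Collect sibling hashes from the leaf up to the root."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        sibling = index ^ 1                       # sibling shares the same parent
        path.append((level[sibling], sibling % 2))  # (hash, sibling-is-right?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_inclusion(leaf, path, root):
    """Recompute the root from the leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_right in path:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

The inclusion proof is logarithmic in the number of leaves, which is why a rollup can cheaply convince anyone that its data sits inside a committed DA block.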

We can check the correctness of these rollup blocks by verifying state validity proofs.

The challenge comes when we recognize that each rollup may have unique state transition functions, with design choices tailored to its specific domain. Validation of executions might depend on game theory, incorporating fraud proofs, or on validity proof systems relying on zero-knowledge proofs (ZK proofs). Even within ZK rollups, the proof systems may vary, ranging from Groth16 to PLONK. Rollups don’t need to know the specific details or functions of other rollups; they need to confirm that these functions were carried out correctly and understand the results that matter to them. Verifying all necessary transactions with just one proof would be a significant breakthrough. A verification hub that facilitates cross-chain communication while managing the technical specifics of different rollups would be ideal. Avail Nexus is such a hub: a custom ZK coordination rollup on top of Avail that consists of:

  • Proof aggregation and verification layer
  • Sequencer selection/slot auction mechanism.

Nexus also submits the aggregated proof periodically to Ethereum and the Avail DA layer for verification. A custom module within Avail DA verifies the aggregate proof.

A brief overview of ZK Recursion

How to think about self-recursive SNARKs (source)

Recursive ZK proving with S{n,t}arks

ZK proofs let us confirm that a specific computation was carried out correctly. The verification of a ZK proof is itself a computation and can be performed within another ZK circuit, a process known as recursion, which allows for more complex systems that can verify multiple proofs together. Recursion means verifying one zkSNARK proof within another. This enables provers to include more information in their proofs while keeping them succinct. The verification time of recursive zkSNARKs is comparable to that of non-recursive ones. Recursive zkSNARKs primarily enhance compression and composability.

Recursion allows information compression into proofs that remain verifiable in constant or polylogarithmic time. This makes them ideal for rollups, where they can efficiently compress a sequence of proofs. Traditionally, proving costs scale linearly with the number of constraints. However, with recursion, you can extend the proof to include more constraints without re-verifying everything from scratch, notably saving computational resources. This approach also helps manage practical limitations like circuit size and memory requirements, as smaller circuits can perform more extensive computations through recursive proofs.
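As a loose mental model (emphatically not a real SNARK), the constant-size property of recursion can be sketched with a hash chain: each new "proof" absorbs the previous one, so the artifact carried forward never grows no matter how many steps are compressed.

```python
import hashlib

PROOF_SIZE = 32  # a recursive SNARK stays succinct in the same spirit

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def prove_step(prev_proof: bytes, statement: bytes) -> bytes:
    # "Verify" the previous proof inside the new one by absorbing it.
    return h(prev_proof, statement)

def prove_chain(statements):
    proof = b"\x00" * PROOF_SIZE  # base-case proof
    for s in statements:
        proof = prove_step(proof, s)
        assert len(proof) == PROOF_SIZE  # size never grows with the chain
    return proof
```

A real recursive SNARK replaces the hash with an in-circuit verifier, but the economics are the same: adding more computation does not make the final object any bigger.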

Aggregation of proofs

The recursion of these ZK proof systems also enables “aggregation” of proofs: replacing multiple ZK proofs π1, π2, …, πk with a single aggregated proof π_agg that implies the validity of all statements otherwise demonstrated by the individual proofs πi.

I love the simple explanation from Tomas Krnak in his blog:

Let’s say that we have two transactions, t1 and t2, attested by two proofs, p1 and p2. A recursive SNARK allows us to generate a single proof for the claim, “There exist valid proofs that prove the validity of t1 and t2.” We have just compressed proofs p1 and p2 into a single proof! Furthermore, this procedure can be repeated, and thus, an unlimited number of proofs can be aggregated into one.

We’ve been focusing on verifying popular heterogeneous proof systems used widely in the industry. The problem with most heterogeneous proof systems is that they don’t share the same verification method. Two PlonK circuits can share the same verification key if their public input lengths are equal and their verifiers perform nearly identical work (even though the circuit sizes are unequal). Some works, like SuperNova, build efficient non-uniform proof aggregation schemes capable of aggregating proofs from different circuits into a single proof.

Recursion in different types of proof systems

Pairing based SNARK recursion

A significant technical advantage of these recursive setups is their flexibility to combine different proof systems and arithmetizations, harnessing the best features of each. SNARKs that use pairing or KZG commitment-based proofs are fast to verify and have small proof sizes, making them ideal for verification on Ethereum’s EVM. A SNARK is considered recursive if it can efficiently prove the validity of its own proofs.

A simple KZG proof verification equation (source)

In the above KZG equation, p(x) is the polynomial the prover wants to prove to the verifier, z is a random opening point given by the verifier to the prover, p(z) = y, and π = [q(s)]. Remember, the commitment to the polynomial p(X) is defined as C = [p(s)]. An important point is the group each term lives in: π and C − [y] are G1 group elements, while [s − z] and H are G2 group elements. Read “Preparing for Pairings” in this excellent blog to learn more about them. A similar verification step takes place in Groth16.
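What the pairing actually enforces is the polynomial identity p(s) − y = q(s)·(s − z) at the secret point s. Here is a toy Python sketch over a plain prime field with no elliptic curves: we model each group element by its discrete logarithm (utterly insecure, purely pedagogical), so the pairing equation collapses to that identity.

```python
P = 2**61 - 1  # toy prime field (not a pairing-friendly curve)

def poly_eval(coeffs, x):
    """Evaluate p(x); coefficients are low-degree first (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def kzg_quotient(coeffs, z):
    """Quotient q(X) = (p(X) - p(z)) / (X - z) by synthetic division."""
    n = len(coeffs) - 1
    q = [0] * n
    q[n - 1] = coeffs[n] % P
    for i in range(n - 1, 0, -1):
        q[i - 1] = (coeffs[i] + z * q[i]) % P
    return q

def kzg_check_in_the_exponent(p_coeffs, s, z):
    """Model e(C - [y], H) = e(pi, [s - z]) on the discrete logs:
    it reduces to p(s) - y == q(s) * (s - z)."""
    y = poly_eval(p_coeffs, z)
    q = kzg_quotient(p_coeffs, z)
    lhs = (poly_eval(p_coeffs, s) - y) % P         # log of C - [y] (G1 side)
    rhs = (poly_eval(q, s) * ((s - z) % P)) % P    # log of the pi / [s - z] side
    return lhs == rhs
```

In the real scheme nobody knows s; the prover only holds powers of s "in the exponent" from the trusted setup, and the pairing is what lets the verifier multiply the two hidden values together.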

While SNARK verification costs are often considered constant, the reality is that for groth16, the cost scales with the number of public inputs. This becomes more evident when verifying a proof within another SNARK, as each additional public input increases the proving cost. To address this issue, a simple technique can be used, as mentioned by Yi Sun, where all intended public inputs are instead treated as private within the circuit, and the only public input is a hash of these private inputs. The circuit verifies that this hash matches the one provided as the public input.
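The public-input-hashing trick can be sketched as follows. This is a hypothetical illustration outside any real circuit DSL, using SHA-256 where an actual circuit would prefer a SNARK-friendly hash; the names and the toy field are my own.

```python
import hashlib

FIELD = 2**64 - 59  # toy modulus for the single public input (illustrative)

def hash_public_inputs(inputs):
    """Collapse many intended public inputs into one field element."""
    data = b"".join(x.to_bytes(32, "big") for x in inputs)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % FIELD

def circuit_checks(private_inputs, public_hash):
    # Inside the circuit the application inputs are private witnesses;
    # the only public input is their hash, so the verifier's cost no
    # longer scales with the number of application-level inputs.
    return hash_public_inputs(private_inputs) == public_hash
```

The verifier (or the outer recursive circuit) now handles exactly one public input regardless of how many values the application exposes, which is what keeps the recursive proving cost flat.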

We have done much work on SNARK aggregation techniques, which don’t use recursive pairing verification. Some well-known homogenous SNARK aggregation techniques are:

SnarkPack is a sophisticated aggregation scheme for Groth16 proofs, which are compact, consisting of just three elliptic curve points (A, B, C). Verifying such a proof involves a pairing check of the equation e(A, B) = Y · e(C, D), where Y encodes the public inputs and D is a fixed elliptic curve point in the G2 group.

Aggregating k Groth16 proofs, SnarkPack introduces efficiency by combining individual proof checks into one. This is achieved by forming a random linear combination of the proofs using a scalar r, simplifying the verification to a single equation:

This method leverages Inner Product Arguments (IPA) to efficiently manage computational load, resulting in logarithmically scaled proofs and verification times.

aPlonK is an advancement within the PlonK protocol family, specifically designed to enhance efficiency by reducing proof size and verification time when handling multiple statements in a batch. This improvement is achieved by utilizing a multi-polynomial commitment scheme (MPCS), a useful primitive that extends traditional polynomial commitment schemes to handle multiple polynomials simultaneously. The proof size and verification complexity scale logarithmically with the number of aggregated statements, which significantly optimizes performance for applications like blockchain rollups where verification speed is critical.

Note: In the above two techniques, no recursive proof verification takes place; only aggregation occurs.

A key limitation of SNARK aggregation through recursion is the bilinear pairing check, which in Groth16 requires 3 pairings and ‘l’ multi-scalar multiplications (where ‘l’ is the number of public inputs in the circuit), and in PLONK requires 2 pairings and 18 G1 group operations.

PLONK final verification equation

This pairing check is extremely constraint-heavy inside a circuit. The zk-pairing work mentions that even after all the optimizations, it still takes close to ~20 million constraints per pairing check, resulting in a 250-second proving time on a beefy server (32-core 3.1 GHz, 256 GB RAM). That is quite a lot and far from the reach of client-side proving on laptops and desktops!

Stark Recursion

In contrast, STARK proofs are larger and involve verifying Merkle inclusion paths. STARKs mostly use AIR arithmetization, which is well-suited for proving state machine executions but tends to be costly to verify. By using recursion, one can generate a STARK proof for a state machine’s execution and then verify this STARK proof within a pairing/KZG SNARK. This results in a SNARK proof that is much cheaper to verify. Most ZK rollups in production use ZK STARKs to generate proofs as they offer better concrete performance and efficient recursion.

In STARK recursion, multiple FRI proofs are usually batch-combined and verified all at once. The batched polynomial has the form

f(x) = Σ αⁱ · dᵢ(x), for i = 1, …, w_RAP

where the dᵢ are the DEEP polynomials, w_RAP is the total number of trace polynomials, and α is the FRI batching parameter used by the prover.

FRI batching is a technique where you combine multiple trace polynomials into one single polynomial. This reduces the amount of computation and communication needed for each commitment. By grouping polynomials, the verifier can check the commitments more efficiently while keeping the protocol secure and intact. FRI batching uses methods like mixing and splitting to merge several polynomials into one commitment, making processing and validation smoother. This method is beneficial when you need to handle many commitments at once because it uses resources better and enhances the overall performance of the protocol. Another structure used in this context is the zkTree, which uses a binary tree format to verify proofs.
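The mixing step can be sketched in a few lines of Python. This is a toy over a prime field, not an actual FRI implementation: it only shows how several polynomials collapse into one via powers of α and how a verifier spot-checks consistency at a random point.

```python
P = 2**61 - 1  # toy prime field

def poly_eval(coeffs, x):
    """Evaluate a polynomial given low-degree-first coefficients."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def batch_polynomials(polys, alpha):
    """Mix several polynomials into one with powers of alpha."""
    combined = [0] * max(len(p) for p in polys)
    ap = 1
    for poly in polys:
        for i, c in enumerate(poly):
            combined[i] = (combined[i] + ap * c) % P
        ap = (ap * alpha) % P
    return combined

def spot_check(polys, combined, alpha, x):
    """The verifier checks the combination holds at a sampled point x."""
    expected, ap = 0, 1
    for poly in polys:
        expected = (expected + ap * poly_eval(poly, x)) % P
        ap = (ap * alpha) % P
    return poly_eval(combined, x) == expected
```

Real FRI then runs its low-degree test on the single combined polynomial instead of once per trace column, which is where the savings come from.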

Proof Recursion used by many ZK systems (Source)

At its core, zkTree uses recursion to combine many user proofs into one compact proof. This single proof can be checked quickly and costs much less than traditional methods. Its hierarchical design significantly cuts down on the computing and storage demands on blockchain networks like Ethereum’s EVM by making proofs smaller and allowing batches of transactions or data validations to be processed together.

ZkTree Source
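The zkTree's aggregation shape can be sketched as a binary reduction, with hashing standing in for the recursive circuit that verifies two child proofs at each node (illustrative only; a real zkTree node runs a proof verifier, not a hash).

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def aggregate_tree(proofs):
    """Pairwise 'verify-and-merge' leaf proofs up to a single root proof."""
    level = list(proofs)
    while len(level) > 1:
        if len(level) % 2:                     # pad odd levels by duplication
            level.append(level[-1])
        # each parent models a recursive circuit verifying its two children
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

The depth is logarithmic in the number of user proofs, so the on-chain verifier only ever sees the single root, no matter how many leaves were folded in.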

STARKs are faster for recursion as there are no pairing checks, and if we choose a proper hashing function, we can reduce the number of constraints in the verification.

Folding schemes

The Nova framework advances recursion in cryptographic proofs by introducing “folding.” Unlike the Halo framework, which delays verification, Nova delays the computation of (optional) succinct proofs. Folding allows a prover to amalgamate two witness values, w1 and w2, into a new witness w3 of equivalent size, ensuring that validating w3 simultaneously confirms w1 and w2.

Nova predominantly facilitates Incremental Verifiable Computation (IVC), wherein each computation step involves the IVC prover integrating a previous witness with a new one and validating the merge. This guarantees that at any stage, verification of the most recent IVC proof confirms the accuracy of all preceding computational steps.
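The shape of folding can be illustrated on a toy homogeneous linear relation A·w = 0, where the fold w3 = w1 + r·w2 needs no cross term. Be warned that real Nova folds relaxed R1CS instances and must track a cross term and an error vector; this sketch only shows the "two witnesses in, one same-size witness out" structure.

```python
import secrets

P = 2**61 - 1  # toy prime field

# A toy relation: witness vectors w with A @ w == 0 over the field.
A = [[1, P - 1, 0],   # enforces w[0] - w[1] == 0
     [0, 1, P - 1]]   # enforces w[1] - w[2] == 0

def satisfies(w):
    """Check the linear relation A @ w == 0 (mod P)."""
    return all(sum(a * x for a, x in zip(row, w)) % P == 0 for row in A)

def fold(w1, w2):
    """Fold two witnesses into one of the same size: w3 = w1 + r*w2."""
    r = secrets.randbelow(P - 1) + 1
    return [(x + r * y) % P for x, y in zip(w1, w2)]
```

Because r is random, a satisfying folded witness implies (with overwhelming probability) that both originals satisfied the relation, which is the property IVC leans on at every step.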

Folding-based non-interactive proof aggregation

In proof aggregation within IVC, each computational iteration validates a segment of a batch proof, ensuring the proof size remains constant regardless of the number of incremental steps. Nevertheless, IVC proofs can still be substantial, with their size directly proportional to the circuit size of the IVC step function. In practical scenarios, the IVC process can halt at a certain point, after which a SNARK proof is generated from the final IVC output. This SNARK proof is compact and easily verifiable, allowing the IVC system to generate proofs without depending on the SNARK process.

Current approach

To reiterate the problem we are trying to solve:

We want to aggregate these heterogeneous proofs from rollups (usually KZG pairing-based proofs) into a single ZK proof and then post that to Nexus in Avail’s architecture.

At a high level, the current solution mainly relies on zkVMs for verification and consists of recursively verifying all the proofs inside the zkVM. The proofs are now aggregated together inside the zkVM environment into a single proof. Our main goal is to standardize these proofs into a common system through recursive verification to simplify their aggregation.

The SNARK-proof landscape has various systems, depending on the platform. Polygon zkEVM uses a system called fflonk, and zkSync uses a modified version of PLONK called Boojum for their operations. The on-chain contract, which is the settlement contract of the rollup on Ethereum L1 (also called the bridge contract), verifies these proofs and updates the state of the rollup. Ethereum L1 is often called the “settlement layer” for rollups because the L1 smart contract supposedly decides the canonical rollup chain. It is from these contracts that we fetch these proofs through a relayer.

As mentioned above, in these KZG-based systems there is a need for final pairing checks to ensure the proof’s validity, which is performed by the platform’s on-chain verifier smart contract. We plan to convert all these fetched proofs, KZG-based (and STARK-based from Starknet in the future), into a common format. Currently, this common format is generated using the zkVM, which internally uses the standard AIR-FRI protocol, DEEP-ALI, and the batched FRI protocol. The zkVM prover uses the constraints and the execution trace to generate the validity polynomial f, then uses the validity polynomial to construct the validity witness w, and sends the commitment to w to the verifier.
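As a toy example of a constraint system over an execution trace, consider a Fibonacci state machine checked row by row. In a real zkVM the residuals below would be interpolated into the validity polynomial and shown to vanish on the trace domain, rather than being checked directly; the names here are illustrative.

```python
P = 2**61 - 1  # toy prime field

def fibonacci_trace(length):
    """Execution trace of a toy state machine: t[i+2] = t[i+1] + t[i]."""
    t = [1, 1]
    while len(t) < length:
        t.append((t[-1] + t[-2]) % P)
    return t

def constraint_residuals(trace):
    """AIR-style transition constraint evaluated on every step."""
    return [(trace[i + 2] - trace[i + 1] - trace[i]) % P
            for i in range(len(trace) - 2)]

def trace_is_valid(trace):
    # Boundary constraints pin the start; transition residuals must all
    # vanish, which is what the validity polynomial encodes in a zkVM.
    return trace[0] == 1 and trace[1] == 1 and \
        all(r == 0 for r in constraint_residuals(trace))
```

The point is that "a verification is just a computation": the verifier of a rollup proof can itself be written as such a state machine, traced, and proved inside the zkVM.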

These fetched proofs from rollup L2 contracts on L1 are then verified within the zkVM, because verification, as mentioned above, is itself a computation that can be modelled as an execution trace in a VM. The prover submits the proof along with the appropriate public inputs, and if it verifies correctly, it confirms the entire computation process. As mentioned above, a significant challenge in verifying KZG-based SNARKs is that pairings require many complex elliptic curve operations. For elliptic curve arithmetic on a non-native field, optimizations used in big-integer arithmetic need to be applied and adjusted for the specific curve.

Current design and components in a binary proof aggregation

Our architecture prioritizes provable computation capabilities rather than the zero-knowledge aspect (currently, there is NO zk aspect). We have two modules:

  • Common proof generation (Converter): converts all the input proofs into proofs generated by the zkVM.
  • Aggregation of STARKs (Aggregator): most zkVMs (SP1, Jolt and RISC Zero) currently produce STARK proofs, and this component composes them recursively.

The zkVM proving component currently utilizes RISC Zero’s Bonsai to recursively verify the underlying rollup proof from L1, which is fetched through the relayer. A converter/adapter component will receive the proofs from the rollups and then generate the STARK proofs. This architecture addresses the challenge of aggregating heterogeneous proof systems from different rollups on Layer 1, which often employ varying versions of pairing/KZG-based proofs. One key thing to note is that the public inputs of the different proofs need to be appropriately handled and passed down to the architecture used in Avail Nexus.

We can implement FRI batch-proof aggregation. A notable limitation of the FRI polynomial commitment scheme is its lack of homomorphic additivity, unlike KZG-based schemes, which complicates the direct aggregation of FRI proofs. To address this, one solution is to leverage Plonky2’s fast recursion capability for FRI proofs, or to verify the STARK proof directly within the zkVM. RISC Zero provides a STARK verifier via composition; it allows for the efficient recursive verification of STARK proofs in an AIR builder. This internal verification streamlines the process, enabling a more integrated and seamless approach to handling STARK proofs within our architecture.

Note: It’s worth mentioning that we experimented with one more approach earlier that we didn’t end up using. The idea was that after we get the standard zkVM proofs, we recursively convert them to a pairing-based proof by verifying the zkVM proof inside a circuit. Eventually, those pairing-based proofs can be aggregated together. This is similar to what we explained above in STARK aggregation. Earlier, we used the RISC Zero STARK-to-SNARK converter for Groth16 proof generation and the Snark Verifier library by PSE (and modified by Axiom) for aggregating the generated Groth16 proofs. We wrote a Groth16 proof verifier in halo2-lib, which gives us a Halo2 PLONK proof, which the aggregation chip** requires. The main problem with this approach was the time lag and computation overhead of the STARK-to-SNARK conversion, as STARK proofs are large and expensive to verify.

Polygon Zk-EVM and Zk-Sync Benchmarks

We’ve started with Polygon’s zkEVM and zkSync for our project. Polygon zkEVM uses a proving system called FFLONK to convert STARK proofs to SNARK proofs, while zkSync has its own modified version of PLONK for the same purpose. The inner dynamics of these proof systems are a topic for another blog (I have written about fflonk; I will cover mPlonk later).

One interesting thing to note is that both proving schemes use the Keccak hash function for the Fiat-Shamir heuristic, which makes verifying the proofs within a circuit expensive due to the high number of required computations. Specifically, FFLONK must perform 5 hash operations, whereas zkSync’s modified PLONK requires 99. Because of this, FFLONK verification has far fewer constraints and less overhead than zkSync’s modified PLONK, making it a more optimized target for verification. We ran the RISC Zero benchmarks using Bonsai and the SP1 benchmarks on a c6a.32xlarge instance (128 vCPUs, 256 GB RAM).

RISC Zero

Important: These numbers are for a very crude implementation of fflonk and modified plonk verification, which could be greatly improved with precompiles, acceleration, and reducing copy constraints.

SP1

We also tried running benchmarks on SP1 (the zkVM by Succinct Labs) on a c6a.32xlarge machine, but as Bonsai currently uses GPU acceleration and proof composition, the results cannot be compared directly. However, we will add updated results as soon as the SP1 team adds support.

Github code link: https://github.com/availproject/Henosis

Conclusion

Finally, this work is experimental, and we are still improving it. Homogeneous proof aggregation is already used across the industry; it is heterogeneous proof aggregation that opens up a new domain of applications. Atomic composability is the north star that Avail Nexus aims for, and validity-proof aggregation is one part of the whole architecture.

If you find any mistakes or unexpected results, feel free to reach out to me on Telegram at @rishotics. I’m keen to fix any inaccuracies to make sure the approach is reliable and thorough.

Acknowledgments

This work would not have been possible without feedback and suggestions from Anurag Arjun, Prabal Banerjee, Vibhu Rajeev, and many others from Avail’s team. Special thanks to Paul Gafni, John Guibas, Nicolas Ramsrud, Aayush, Ayush, Rahul and others for reviewing and giving valuable feedback.

References

[1]: fflonK: a Fast-Fourier inspired verifier efficient version of PlonK

[2]: PlonK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge

[3]: Efficient polynomial commitment schemes for multiple points and polynomials

[4]: Aggregatable Subvector Commitments for Stateless Cryptocurrencies from Lagrange Polynomials

[5]: Avail’s Vision: The unification layer for web3

[6]: Axiom: Snark Verifier

[7]: UniPlonk: Plonk with Universal Verifier

[8]: Proof aggregation schemes: SnarkPack and aPlonk

[9]: ZK10: FRI-based recursion with a Groth16 Wrapper — Paul Gafni

[10]: KZG polynomial commitments by Dankrad

[11]: STARK by Hand

[12]: RISC Zero zkVM: Scalable, Transparent Arguments of RISC-V Integrity

[13]: How to code FRI from scratch?

[14]: Boojum, the scariest SNARK implementation

[15]: On the Size of Pairing-based Non-interactive Arguments

[16]: zkProofs & Recursive SNARKs: Arhat, L2IV

[17]: Constant-Size Commitments to Polynomials and Their Applications

[18]: Aligned Layer: universal verification layer

[19]: PSE: Snark Verifier

[20]: Notes on efficient polynomial commitment schemes and fflonk

[21]: Introducing zkTree: a zk recursion tree with ZKP membership proofs

Footnotes

*Completeness proof means that we are making sure that all the data posted on Avail DA is used by the rollup in the state transition to generate the validity proof.

**Inside the aggregation chip, the costly pairing check is deferred until the end, and we aggregate the pairings using a Random Linear Combination (RLC) of the pairing values of all the SNARKs available to us.
