Diving into the zk-SNARKs Setup Phase

To discuss this post further, see the ZKProof community forum


Zero-knowledge proofs are very sophisticated cryptographic protocols with many applications, of which blockchain privacy and scalability are getting a lot of traction. They are getting wide attention from practitioners and business people, which is generating some buzz. This hype, like any other, can bring with it misunderstanding or confusion about some of the details of this technology.

This post will discuss why some zero-knowledge proofs rely on pre-generated keys (produced by a trusted setup or a multiparty ceremony), and why this can be a good trade-off for many applications and in a sense, is even logically necessary. But first, some background.

Zero-knowledge proofs allow one party to prove to a verifier that a statement is true, without revealing any information except the validity of the statement itself. In the literature, we also say that some data satisfies a relation. They were first invented in the 80’s [GRM89], which set the path to understand what knowledge really is and how we can convincingly prove some statement.

I recommend watching this recent talk by Silvio Micali at Berkeley.

If you simply want to understand better what NIZKs or SNARKs are and how we can obtain general-purpose constructions, see the Zcash explainer or this QED-it presentation. If you want to dive into the details, I recommend Vitalik’s posts and the ZKProof Standardization proceedings.

Many applications require proofs that are non-interactive, for which there are several constructions, called Non-Interactive Zero-Knowledge proofs (NIZKs), many of which have become practical in the past decade. Most of you may have heard of SNARKs such as [PGHR, BCTV14, Groth16], STARKs such as [BBHR18], and Bulletproofs [BBBPWM17], yet there are plenty of other constructions (Ligero, Hyrax, Aurora, etc.). I will not go over them, nor what their differences are, that is for another post. You can check out zkp.science, which contains a comprehensive list of references with much of the work done in zero-knowledge.

Many applications require proofs that are non-interactive, i.e., can be just written down (e.g., on a blockchain) and read later by a verifier. This requires that both the prover and verifier have access to some common knowledge, usually encoded in a Common Reference String (CRS), a way of saying that the proving and verification keys are generated by the same Setup algorithm, and are mostly the same. We will overview different types of CRS, while keeping in mind that the field is still in development and new terms and concepts can come up often.

Why Trusted Setup?

Inherent to every zero-knowledge proof system, in order to prevent the prover from cheating and generating fake proofs, the verifier must know exactly what is being proven. It is directly related to the soundness of the scheme, and in some instances can mean for the verifier to hold the full representation of the statement. This is inefficient where the statement being proven is large, since it results in slow verification, creating a bottleneck in applications that require low latency (e.g., transaction propagation) or high bandwidth.

This is addressed by pre-processing SNARKs (Succinct Non-interactive ARguments of Knowledge), which are succinct NIZKs (short proof and fast verifier). One major way that the verifier of generic NIZK schemes have become practical is by creating a pre-processing phase that encodes the relation into the CRS, called a Structured Reference String (SRS), which is available to both the prover and the verifier. This is the kind of pre-processing that requires setup ceremony.

Let’s overview the three different ways that exist today to ensure the verifier knows what is being proven:

Verifier gets the full statement

The verifier uses an explicit representation of the relation, which means that the verifier running time is at least linear in the relation size. This is, for example, the case with Bulletproofs, which is why they are best for small statements, such as range proofs. In some instances, they produce a Uniform Reference String (URS), which is another kind of CRS that does not encode the relation and does not necessarily give succinctness of the scheme. Importantly, it gets the prover and verifier to agree on some randomness and public parameters.

The common part between the prover’s input (P) and the verifier’s input (V) is the CRS for the Bulletproofs construction. The W terms are linear in the size of the relation, but do not encode it, hence a URS.

Verifier gets a succinct representation of the statement

The verifier uses a succinct representation of a uniform relation, meaning that there is a basic representation of a computation that is repeated many times. For example, proving the computation of “1000 Pedersen Commitments” would require only one Pedersen Commitment as the relation, which is essentially expressed succinctly by describing the transition function at all steps. In this case, there is a URS generated from a pre-processing that is sub-linear on the size of the relation.

This is the case, for example, with proof systems based on algebraic PCPs, for which statements can be represented as succinct algebraic constraint satisfaction problems [BCGT12]; this includes the Kilian/Micali construction as well as the FRI-based STARK construction. The transition function is encoded in the algebraic constraints, which compile into a URS.

Known constructions of these form rely on a random oracle source of randomness and strong assumptions about it, but they can be heuristically instantiated by replacing it with a hash function such as SHA-3 (this is called the Fiat-Shamir heuristic). After this instantiation, there is no explicit CRS or URS.

Pre-processing

SNARKs achieve succinctness for any relation, even if it does not have a repetitive structure and is not a short statement. To accomplish this, the verifier relies on a “summary” of the relation, meaning that a pre-processing stage digests all the relation into a short string of bits. This is, for example, the case with QAP-based pre-processing SNARKs, such as PGHR, BTCV14, Groth16, etc. In this case, there is an SRS generated from a pre-processing that is both linear in the relation and depends explicitly on that relation. Specifically, the prover’s part of the SRS is linear in the relation size, while the verifier’s part is short.

Here is an example of an SRS as in [BTCV14]; notice how the proving and verification keys depend explicitly on the relation, in this case represented by the Quadratic Arithmetic Program (QAP) polynomials A, B and C.

One clear conclusion that can be derived is that if one wants succinctness at the verifier for a relation that is not uniform, there is no way of getting rid of the SRS pre-processing phase. It is important, though, that such pre-processing be verifiable to all parties relying on the proving system.

So what is the problem?

The balance between privacy for the prover and assurance of not cheating for the verifier is delicate. In general, zero-knowledge systems require the use of some randomness to represent the challenge that the verifier sends to the prover (another consequence of the need for soundness, to prevent the prover from cheating). In the context of non-interactive systems, the randomness must be established in advance and its generation be publicly verifiable. In the case of the URS, this is a given.

The setup for an SRS, however, relies on non-public randomness, which has come to be known as toxic waste since it is essentially a backdoor to generating fake proofs (check out Kobi Gurkan’s post, which even has some code!). And hence the name, trusted setup, simply implies that an entity that is trusted should generate the keys.

Of course, having one single entity generate the keys sort of defeats the purpose of using zero-knowledge proofs in the first place, at least in public systems where everyone is a prover and a verifier. This is exactly why SNARKs that have a structured setup phase are best coupled with a multiparty computation (MPC) for the setup, to reduce the trust assumptions. Ideally, such an MPC involves a large number of parties, and ensures that even if merely any one of the parties behaves honestly and erases their own local secrets, then the result is secure. Put otherwise, security holds unless all parties are malicious or compromised. The Implementation Track proceeding (page 15) has a more formal and detailed discussion about the requirements to implement the trusted setup correctly. And here is a great explainer by Aviv Zohar of the properties of the trusted setup, using a Sudoku analogy.

The Zcash MPC Ceremonies

As of today, there are few production systems using zero-knowledge proofs. Zcash is one of them, and both the Sprout and Sapling versions use SNARKs, a NIZK with structured pre-processing. There have been indeed two different trusted setup ceremonies to generate each of the key-pairs used for each of the protocols.

Sprout

For the Sprout circuit, there was a group of 6 participants that were located in different parts of the world and had to stay online for about three days while generating their random toxic waste in a round robin protocol, together with a coordinator. One can use the transcript of the ceremony to verify the trusted setup’s integrity and ensure the soundness of the scheme.

Interestingly, on February 5th, the Zcash Company published a report disclosing a meaningful vulnerability that was found almost a year ago by Ariel Gabizon on the [BCTV14] setup phase. Essentially, the issue was that there were extra elements in the SRS that were not needed and that actually enabled the prover to trick the verifier into unknowingly verifying a different statement. The attack would have allowed someone to forge new money (by faking the root of the Merkle tree or the nullifier), but the team updated the protocol to use a different and secure scheme, while also removing any backwards compatibility with the previous proving system.

On the left is the old scheme, on the right, the updated one (already available on eprint). I have highlighted in yellow the differences and on red the parts affected by the attack. Basically, the different pairing verifications are forced to pass as the values computed by the verifier (in green) are changed.

Crucially, this vulnerability is not inherent to preprocessing zk-SNARKs in general. It is caused by a local mistake in the paper, that has since been remedied in the updated version portrayed above. Hence why other proving systems, even without a pre-processing, are not inmune to these kind of vulnerabilites. It is important to mention that this vulnerability does not affect some existing systems:

The vulnerability is not present in the algorithms of [PGHR13] (which underlies [BCTV14]), nor in [BCGTV13], which used similar techniques. It is also not present in other zk-SNARKs constructions, such as [GM17] or [BG18], or in zero-knowledge proof systems that do not rely on a structured reference string. It is not present in libsnark when used with its built-in parameter generator.

In an article written not long ago, the author claims that:

When using a zero-knowledge proof protocol from the SNARK family, you never know the rules of the game. The rules are set by the participants of the procedure of system trusted parameters generation (“ceremony”), but after its completion it is impossible to check these rules. You can believe in correctness of the generation, but if you have not participated in it, you don’t have hundred percent guarantee.

Generally, this is simply not true; an outsider to the ceremony only needs to verify the transcript in order to ensure the integrity of the setup and to be sure of what exactly is being proven. One can be sure with high probability that at least one party was honest during the MPC (it could also be oneself). Of course, one needs to also understand and audit the mathematics of the scheme, and the code implementing it, or believe that scientists have done it. One of the reasons the ZKProof effort exists is to foster trust in a technology that is not easy to understand.

Sapling

For the Sapling circuit, the Zcash team designed a practical MPC for the setup of the [Groth16] SNARK (which is safe against the previous attack). Last year, the Zcash Foundation conducted the ceremony in two phases, as described by the paper, first the “Powers of Tau” and then the circuit-dependent MPC key generation. In short, the first ceremony generates most of the random toxic waste, independently of the circuit to be used, which enables re-usability of the powers of tau. There were more than 80 participants adding their share of the toxic waste, one after the other, with different settings.

This is the output of the powers of tau (or powers of ‘x’) computation, a set of random elements to be used in any circuit of a specific size, n. See the paper for more details.

Each of the participants wrote an attestation describing what was done during their computation. Myself (attestation 40), I actually revealed my own individual randomness and did not take any other security measurements. This was to show the robustness of the semantic security of the scheme; that only one out of the 80 needed to be honest while securing their environment and destroying the randomness to make it irrecoverable. Andrew Miller and Ryan Pierce actually used nuclear decay as a source of randomness. One can also check the integrity of this ceremony by verifying the response files in the attestations, using this code.

A tweet by Zooko mentioning three attestations of the powers of tau ceremony.

Subversion Resistance

As I said at the beginning, ZKPs are very sophisticated, to the point that even the parameter generation has several definitions of security. Suppose that the generation of the structured reference string during the setup phase is compromised. Perhaps it was badly formatted; or perhaps all the toxic waste leaked. What are the security implications? Bellare et al. [BFS16] studied this systematically, and defined the following (here informally):

  • Subversion soundness resistance (S-SND), which means that the proof scheme is such that there is no way (even in theory) to generate a backdoored SRS that would allow creating fake proofs.
  • Subversion zero-knowledge resistance (S-ZK), which is the fact that there is no way to generate a malicious SRS that would cause provers to inadvertently leak information to a verifier about their private inputs to the statement (violating zero-knowledge).

In general, it is known that no scheme can achieve both subversion soundness and subversion zero-knowledge. The figure below shows further some of these.

Note that schemes can be constructed to abide by these definitions. For example, the construction used in Zcash, [Groth16], is S-ZK but is not S-SND. For this reason, a prover in posession of a trapdoor could create fake proofs. In no case, however, for schemes that are S-ZK, can the trapdoor help a verifier reveal the private data of the prover. The trade-off for using a structured pre-processing SNARK is not at the expense of the privacy of the system and the prover.

In order ensure ensure Zcash’s integrity, there are specific elements added to the SRS of both [BCTV14] and [Groth16] for verifiability of the MPC, as explained in [Fuchsbauer17, ABLZ17, CGGN17]. Furthermore, a special-purpose zero-knowledge proof is used to prove each party computed its share correctly. These proofs are part of the transcript and the verification process.

You can watch Steven Goldfeder speak at Real World Crypto 2018 about subversion of the SRS for contigency payments on bitcoin.

The Real Trade-Off

The real issue with the pre-processing model is that, in general, one needs to compute a new pair of keys for every new relation that is used. This can be difficult when deploying and upgrading systems, and is specially problematic when many different circuits are used or are constantly being updated. There exist some interesting approaches to solve this issue. Essentially what we would like is some form of universal setup, that computes one set of parameters that suffices for proving all relations.

This problem has been tackled in different ways, starting with the SRS produced in [BCGTV13], which is universal up to a given size, and the SRS of [BCTV14], which is actually fully universal. Both use the TinyRAM architecture to achieve universality, at the cost of efficiency because of the CPU emulation overhead. Succinctness comes from having a short and basic program execute for many steps, e.g.: loops, operations, etc.

Another solution, a project led by Abel Adary which we have worked on at QED-it, is to use what are called Universal Circuits (UC) to generate the structured setup. A UC can embed any circuit up to a fix sized, representing any proving relation of that size. This means that when generating the key-pair, one can re-use these to prove and verify any statement within that limit. In theory, this solution works great, but it has a large efficiency overhead. You can check out my talk at the ZK-TLV Meetup.

The third existing solution is very promising. It is based on what is called the updateable CRS, introduced by [GKMKM18]. This model allows for a structured CRS to be updated dynamically, even after generating some keys. The CRS is also universal, so an ongoing setup can be used for any relation needed. The work mentioned above comes with a quadratic overhead in the size of the SRS, which is not great, yet achieves succinctness because of the pre-processing.

Recently, Sonic was published, a new zk-SNARK with a universal and updateable CRS that builds upon the previous work. In this case, though, the SRS generated is only linear in the size of the relation, which brings this to an efficient level. The best part is that the authors found a pretty good trade-off in terms of efficiency for the prover and verifier running time; and they can make the proofs succinct as a result of using pre-processing.

There are already implementations underway, and this one, which is very exciting. It is recommended and encouraged to review these papers and audit the implementations to ensure the security of the systems using them.

Conclusion

As we have seen, in general, succinct zero-knowledge proofs require a setup stage, in the form of trusted setup or a multiparty ceremony. The exceptions are where the statement being proven is small enough that succinctness is not an issue, or it is of a form that can be written down compactly (e.g., when it’s highly repetitive). There are many nuances to doing this setup in a way that maximizes security and efficiency, and minimizes trust and failure modes. This is an active area of rapid research, and we are working to improve and foster the state of the art in such zero-knowledge proof technology, while ensuring its correct usage.


I want to thank Kobi Gurkan and Eran Tromer for the great feedback, suggestions and the fruitful discussions.