Building zk-SNARKs (volume 2)

zkSNARKs for polynomials, step by step

--

The author thanks Mak and Unsplash for the image

Introduction

In our previous post, we introduced the concept of SNARK and the basic properties required. With the main objective of this series of posts in mind, namely: how to build zk-SNARKs for any computation, we introduce in this post the creation of a SNARK to prove knowledge of a polynomial. We will do it step by step departing from one construction which will not meet all the requirements and ending a protocol which will be a “full” zero-knowledge SNARK.

ZK-SNARKs for polynomials

We will build a zk-SNARK step by step, starting from a simpler construction which is not a SNARK, and then improving it with new techniques until we meet our objectives.

First step: the Schwartz — Zippel lemma

We want to create a protocol allowing a prover P to convince a verifier V of the knowledge of a particular polynomial of degree n with coefficients in a finite field F. In other words, P wants to convince V that he knows a polynomial of the form

Let us assume that a_1, a_2, … , a_n ∈ F are the n roots of this polynomial, therefore p(x) can be written as

Let us assume that V knows d < n roots of the polynomial p(x).

Then we can reformulate our problem: now P wants to prove to V that he knows a polynomial h(x) such that

The polynomial t(x) will be called the target polynomial. Observe that it has the form

The first version of our zk-SNARK is based on the Schwartz — Zippel lemma: let F be a field and f: F^m → F a multivariable polynomial of degree n. Then, for any finite set S ⊆ F:

What the lemma shows is that if we take u ∈ Sm uniformly at random, the probability of u being a set of roots for f is at most n / |S|. Observe that this probability is tiny if we consider S to be the finite field F. The field size would be very big compared to n.

It may seem contradictory but this lemma allows us to prove knowledge of a polynomial f. We do not need to prove that we know all coefficients of p but only that we know how to compute the pair (s, f(s)) for any s ∈ Fm. The Schwartz — Zippel lemma provides the efficiency required in our zk-SNARK.

First protocol

We recall that P knows the polynomial p(x) and V knows d < n roots of p(x) or, equivalently, V knows the target polynomial t(x). P also knows t(x).

Our first approach is as follows:

  1. V takes a random value s and computes t = t(s). V sends s to P.
  2. P computes the polynomial

3. P then computes the values h = h(s) and p = p(s) and sends the pair (h, p) to V.

4. V checks whether the equality p = t · h.

This protocol is correct since if P knows the polynomial p(x), he can compute h(x) and therefore P can also compute the values h and p such that p = h · t. On the other hand, this protocol is also efficient, due to the Schwartz — Zippel lemma.

Nevertheless, this protocol is not robust as the prover can compute t = t(s) using the target polynomial, even if P doesn’t know p(x),. He can take a random h and compute p = h · t and send the pair (h, p) to V who would accept the pair as valid.

We observe that the critical point allowing P to fool V is that P knows the evaluation point s. This trick would be impossible if P could evaluate p without knowing the evaluation point s. This leads us to the second step: homomorphic obfuscation, or homomorphic hiding.

Second step: homomorphic obfuscation

To make our protocol robust, we need to be able to evaluate a polynomial on a point, without knowing that point.

We will do so relying on the hardness of the discrete logarithm problem. For a classical computer, the discrete logarithm problem is computationally infeasible: in a cyclic group G, of order p and generated by an element g, determining an element a such that 𝛼 = ga (mod p) for a known 𝛼.

So assuming the hardness of the discrete logarithm polynomial, we can obfuscate a value a by computing 𝛼 = ga (mod p). A receiver of 𝛼 alone can’t learn the value a. The interesting point of this technique is that exponentiation has some homomorphic properties:

The product of two obfuscated values is the same as the obfuscation of the addition of the associated values in clear.

If we want to evaluate a polynomial

on a point x = s without letting the evaluator know the exact point, we need to obfuscate the whole polynomial:

We also need to compute the powers, from 1 to the degree of the polynomial, of all obfuscated values for x = r, that is:

Observe that with all these elements, we can now compute the obfuscated evaluation of the polynomial p on a point x =r. Indeed:

An example of homomorphic hiding

Let’s consider the finite field F = ℤ127 and g = 3 a generator. Let’s assume that P knows the polynomial p(x) = 4x2 + 3x + 1 and that V wants P to evaluate the polynomial on the point x = 2 without letting P know the point. We can do it through the following steps:

  1. V computes all the powers of x = 2 from 0 to the degree of the polynomial, n = 2:

1.a. 31 (mod 127) = 3

1.b. 32 (mod 127) = 9

1.c. 34 (mod 127) = 81

and sends the pair (9, 81) to P.

2. P can compute the evaluation of p(x) on x = 2 without knowing the value of x, indeed, using the information received from V he can compute:

Second protocol

Now that we know how to obfuscate points so the prover can evaluate the polynomial on that obfuscated point, we will improve the first protocol. We recall that P knows the polynomial p(x) and V knows d < n roots of p(x) or, equivalently, V knows the target polynomial t(x). P also knows t(x).

Our second protocol is as follows:

  1. V takes a random value s and computes t = t(s).
  2. V computes and sends to P the following powers:

3. P computes the polynomial

4. Using the values 𝔊, P computes g^p = g^{p(s)} and g^h = g*{h(s)} using the instructions of the previous example.

5. P sends g^p and g^h to V.

6. V checks whether the following identity holds: g^p = (g^h)^t

Observe that the flaw detected in the first protocol has been amended, but this second protocol is not robust yet. Indeed P can still use the values zp and zh such that z_p = (z_h)^t and send them to V as if they were g^p and g^h. P could do so by taking a random r, computing z_h = g^r and setting z_p = (g^t)^r. The value z_p can be computed using the obfuscated powers sent by V.

In order to avoid this situation, we need to make sure that g^p and g^h are computed using only the obfuscated powers sent by V.

Third step: the knowledge-of-exponent assumption

There is one common assumption when defining a SNARK: let us consider a prime number q such that 2q + 1 is also a prime number. Let us consider g a generator of ℤ_{2q + 1}. Given q, g and g’ = g^𝛼, we want to find a pair (x, y) such that x ≠ g, y ≠ g’, and y = x^𝛼. The knowledge-of-exponent assumption (KEA) says that the only way to find the pair (x, y) is by taking a value c ∈ ℤ_q, computing first x = g^c and then taking y = (g’)^c.

The KEA means that if one wants to get the pair (x, y), the only way to do it is by knowing an exponent c such that x = g^c. In other words, we only can get the pair (x, y) with the KEA property by using powers of g.

Using the KEA the protocol below ensures that P returns a particular power of g, a generator of a multiplicative subgroup, delivered by V:

  1. V takes a random value 𝛼 and computes g’ = g^𝛼 (mod 2q + 1).
  2. V sends the pair (g, g’) to P and asks for a pair (b, b’) such that (b, b’) = (g, g’).
  3. P takes a value c and computes b = g^c (mod 2q + 1), b’ = (g’)^c (mod 2q + 1) and sends the pair (b, b’) to V.
  4. V checks whether b’ = b^𝛼 (mod 2q + 1).

We observe that assuming the discrete logarithm problem is hard, P cannot find the value of 𝛼 from the pair (g, g’). Therefore, the only way to get 𝛼 is by computing all powers of g, as indicated by KEA. Furthermore, the above protocol is correct since

P used the same power c for both g and g’ and he was only able to use values delivered by V.

Third protocol

We are ready to build a proper protocol which will ensure that prover P will output powers of the value s on the obfuscated domain.

  1. V takes a random value s and computes t = t(s).
  2. V takes a random value 𝛼 and computes t(𝛼 · s).
  3. V computes the following series of obfuscated values 𝔊 and sends them to P.
  4. V computes the following series of obfuscated values

and sends them to P.

5. P computes the polynomial h(x).

6. Using 𝔊, P computes p(s) and h(s) together with g^p = g^{p(s)} and g^h = g^{h(s)}.

7. Using 𝛼𝔊, P computes p(s) and h(s) together with g^{p’} = g^{p(𝛼 · s)}.

8. P sends g^p, g^h and g^{p’} to V.

9. V checks that P made the computations of the obfuscated values using only the information he sent to him. To do so, V checks whether g^p = g^{p’}.

10. V checks whether the following identity holds: g^p = (g^h)^{t(s)}.

This protocol now satisfies the properties required by a SNARK: it is correct, it is robust and it is efficient. The next sections will deal with non-interactivity and zero-knowledge.

Remark: Steps 6 and 7 in the above protocol are done according to the example on homomorphic hiding.

Zero-knowledge

In the protocol we presented so far, the coefficients ci of the polynomial known to P are very specific, and V could try to learn some information about them using gp, gh and gp’ coming from P. Therefore, in order to make P sure that V will not learn anything, P needs to hide these values.

The strategy to hide the values sent by P follows the steps used previously: P takes a random 𝛿 and uses it to hide the values sent in the communication with V. To be precise, instead of sending g^p, g^h and g^{p’}, P computes and send (g^p)^𝛿, (g^h)^𝛿 and (g^{p’})^𝛿. Then V runs the verification on the values hidden under 𝛿.

The verification procedure that V need to run is still valid with this hiding, indeed:

So to make the third protocol zero-knowledge, one replaces step 8 so P can send the obfuscated values.

Non-interactivity

The different protocols we have presented require the interaction between the prover and the verifier, and this is because the correctness of the protocols relies on the secret values s and 𝛼 chosen by the verifier.

If we want to repeat any of the above protocols with another verifier V’, V’ would need to choose new values s’ and 𝛼’. Using the values generated by V may allow P to cheat if he colludes with V and V reveals the values s and 𝛼 to P.

To avoid the above situation, we need our protocol to be non-interactive and we can achieve it by using parameters which are public, trusted and reusable. This can be accomplished by obfuscating these parameters using a generator g. Nevertheless, even if the obfuscation techniques used are homomorphic, they only allow the addition of clear messages using the product of hidden values, but they don’t allow performing the product of clear values in the obfuscated domain, and this step is crucial when verifying the correctness of the polynomial and its roots.

We will introduce pairings to solve this problem. Let’s recall that a pairing has the following property:

This property enables checking for the equality of products in the obfuscated domain.

So in order to make our protocols non-interactive, the first step is to take the secret random values s and 𝛼 and obfuscate them: g^𝛼, 𝔊 and 𝛼𝔊.

Once we compute these powers, we can get rid of the secret values s and 𝛼 and broadcast their obfuscated values, known as Common Reference String (CRS). The CRS values are usually stored as:

  1. Evaluation key: (𝔊, 𝛼𝔊).
  2. Verification key: (g^𝛼, g^{t(s)}).

The prover P uses the values of the proof key for evaluating the polynomial in the obfuscated domain whereas the verifier V uses the verification key to check whether the evaluation of the polynomial in g^p, g^h, g^{p’} satisfies:

Final protocol: zk-SNARK for the knowledge of a polynomial

We now define a zk-SNARK to prove knowledge about a polynomial p(x) of degree n given a target polynomial t(x).

Parameter setting

  1. Each party takes two secret values s and 𝛼 at random.
  2. Each party computes g𝛼, 𝔊 and 𝛼𝔊.
  3. The values s and 𝛼 are destroyed.
  4. One broadcasts the evaluation key (𝔊 and 𝛼𝔊).
  5. One broadcasts the verification key g^𝛼, g^{t(s)}.

Proof

  1. Assuming that P knows p(x), P computes the polynomial h(x).
  2. P evaluates p(x) and h(x) and computes gp(s) and gh(s) using the values contained in the evaluation key.
  3. P computes the value g^{p(𝛼 · s)} using the evaluation key.
  4. P takes a random value 𝛿.
  5. P broadcasts the proof π = (g^{𝛿 · p(s)}, g^{𝛿 · h(s)}, g^{𝛿 · p(𝛼 · s)}) = (g^{𝛿 · p}, g^{𝛿 · h}, g^{𝛿 · p’}).

Verification

Assuming that V has access to π = (g^{𝛿 · p}, g^{𝛿 · h}, g^{𝛿 · p’}), it checks whether

We managed to set a protocol satisfying all the properties required in the definition of a zk-SNARK: succinctness (since the proof is just checking a polynomial at a point), zero-knowledge and non-interactive (since it makes use of a CRS).

Generation of the CRS

Zk-SNARKs can be made non-interactive by using a set of hidden values, known as the Common Reference String (CRS). These hidden values, which in our case are of the form 𝔊 and 𝛼𝔊, come from secret values, s and 𝛼, which must be destroyed once the hidden values have been derived. Observe that the generation of the CRS is critical since if one delegates its generation to a third participant, everyone needs to trust that the participant will destroy the values s and 𝛼. The third party represents a single point of failure.

If we want to avoid the single point of failure we need a distributed way to generate the CRS in which a single honest participant is enough to guarantee that the values s and 𝛼 cannot be recovered.

Let’s use the CRS for the SNARK we constructed in our previous example. We recall that using s and 𝛼 as secrets, we need to compute the hidden values and powers g^𝛼, 𝔊 and 𝛼𝔊.

In our new protocol, we will replace the values s and 𝛼, generated by a single third party, with values s_{ABC} and 𝛼_{ABC} generated by three users A, B and C. Extending the mechanism to n users is straightforward.

The protocol follows:

  1. User A generates the pair (sA, 𝛼A), computes and broadcasts:

2. User B generates the pair (sB, 𝛼B), computes and broadcasts:

3. B broadcasts:

4. C takes a random pair (sC, 𝛼C), computes and broadcasts:

This protocol generates the hidden values for s_{ABC} = s_A · s_B · s_C and 𝛼_{ABC} = 𝛼_A · 𝛼_B · 𝛼_C with no single participant knowing this pair of values.

Nevertheless, we need a protocol that allows the participants to check that those hidden values were computed correctly. We need to ensure that the values generated by each user satisfy the requirements, that is the set of values 𝔊 are powers of gs, for the s-value of each user, and that the set 𝛼𝔊 was computed correctly. To do so, we check that each (g^s)^i is effectively the i-th power of gs. This can be done by checking that the following equalities hold. This check is performed by each participant for all values sU and 𝛼U associated with each participant U:

This is not the only verification required. We also need to check that each participant uses the right values from the previous participant. To do it, each participant will use hidden public parameters from one user combined with the outputs of another participant. For example, C may check the intermediate values of the CRS generated by B, using the CRS information from A, by verifying the following:

Conclusion

So far we have learnt how to create a SNARK, from scratch and with full details, to prove knowledge of a polynomial. In our next post, the last one, we will extend the above construction to any computation. To do so we will introduce two key concepts: quadratic arithmetic programs, QAPs, and rank-1 constraint systems, R1CS, although the latter will not be introduced specifically (it will appear as a subliminal message).

References

Jordi Herrera Joancomartí, Cristina Pérez Solà, Criptografia avançada. Lecture notes. Open University of Catalonia, 2022.

--

--