# Giving Extractors what they need

The K in SNARKs tells us that the statement is not merely true: the prover in fact knows a witness for its validity! And this K is not just an extra perk for fun; it is essential for the security analysis of systems like Zcash or Zerocash. Otherwise we would have a system where anybody can spend anybody’s money :) as you would only be proving that the input note is valid, not that it is *yours* in the sense of *you* knowing a secret key associated with it.

But how do we translate “knows” into precise mathematical terms? This is where *extractors* come into the picture. Simplifying a bit, they allow us to formally define “to know” as follows:

“For any prover algorithm **P**, there exists an extractor algorithm **E** such that whenever **P** outputs a valid proof for the statement ‘There exists **y** such that **F(x,y)=0**’, then, given **F**, **x** and the inner state of **P**, **E** produces the witness **y**.”

What is the “inner state” of **P**? Since **E**’s definition can *depend* on **P**, **E** can be assumed to know everything about **P** except its randomness on that run, so we can replace “inner state” by “randomness on that execution”.
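As a toy illustration (not a real SNARK; every name here is made up for the sketch), this is roughly what the extractor interface looks like in Python. Since **E** may depend on **P**’s code, it can simply rerun **P** on the same randomness and read the witness out of **P**’s inner state:

```python
import hashlib
import random

P_MOD = 2**61 - 1  # toy prime modulus, illustrative only

def F(x, y):
    # Statement: "there exists y with F(x, y) = 0"; here F(x, y) = y^2 - x mod p
    return (y * y - x) % P_MOD

class Prover:
    """Toy prover that knows a square root y of x mod p."""
    def __init__(self, y):
        self.y = y

    def prove(self, x, randomness):
        # A stand-in "proof": a hash binding the witness and a random nonce.
        # No real SNARK here -- just enough structure for the extractor demo.
        return hashlib.sha256(f"{x}:{self.y}:{randomness}".encode()).hexdigest()

def extractor(prover, x, randomness):
    # E's definition depends on P: knowing everything about P except its
    # randomness, E reruns P on that randomness and reads the witness out
    # of P's inner state.
    _ = prover.prove(x, randomness)
    return prover.y

y = 1234567
x = (y * y) % P_MOD
P = Prover(y)
r = random.getrandbits(128)
pi = P.prove(x, r)
w = extractor(P, x, r)
assert F(x, w) == 0
```

Of course, in a real proof system the interesting work is showing such an **E** exists at all; here the point is only the *interface*: **E** gets **F**, **x**, and **P**’s randomness, and must output **y**.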

The headache, and potentially subtle proof bugs, however, sometimes lie in a seemingly overly philosophical question: who actually is **P**?

Let me give the exact example I ran into recently. Suppose **P** is a prover trying to produce a SNARK proof for some statement, but while doing so, he has access to a signature oracle **O** that gives him signatures on messages of his choice.

After this interaction with **O**, **P** spits out a SNARK proof **π**. Now, suppose we want to use an extractor **E** to obtain a witness **w** for the statement.

What do we need to give **E**? *The party that generated the proof **π** is in fact **O** and **P** together.*

Thus, to apply the knowledge-soundness property of the SNARK, **E**’s definition as an algorithm needs to *depend* also on **O**. **E** needs access also to the inner randomness of **O**!

This is something we often *do* mind; the reason being that in some security proofs, obtaining **w** is not an end in itself but a *means* to do something else that contradicts some cryptographic assumption; something that is perhaps indeed a contradiction if **P** (or someone with access only to **P**) does it by itself, *but not a contradiction if **P** and **E** do it together!*

For example, the contradiction may be obtaining a new signature that we haven’t seen before from **O**. This is not much of a contradiction if we do it while having access to the inner state of **O**, which contains in particular the secret signing key :)

The correct resolution to this issue that I found comes in two parts:

The first is somewhat similar to the simulation paradigm used for proving zero-knowledge. The party **A** using **P** to try to obtain the signature forgery should be able to simulate part of the interaction with **O** itself, such that **P** won’t be able to tell the difference. If it can do that, we can give **E** **A**’s randomness instead of **O**’s. Let’s see how this can be done in our example.

If the signature scheme is the Schnorr scheme, then we know that given a public key **pk**, we can simulate signatures *without* the corresponding secret key, provided we have control over the random oracle used for deriving the challenges. So given **pk**, **A** will simulate the signatures requested by **P**, together with the random oracle calls needed to verify them.

More precisely, **A** “moderates” **P**’s access to the random oracle **R** by maintaining an initially empty table T of simulated answers: if **P** queries **R** at a point that isn’t defined in T, it gets the real oracle value; but while simulating a signature, **A** sets T at the point where **R** would be queried for that signature’s verification, in a way that makes verification pass, and later answers queries on that point according to T instead of **R**. In this kind of scheme, there is a negligible chance that **P** asks **R** a query in advance at a point we later want to set to a different value in T as part of simulating a signature, in which case we abort.
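To make this concrete, here is a minimal Python sketch of the simulation over a toy Schnorr group (the parameters p, q, g are tiny and chosen only for illustration, and all function names are mine, not from any library): **A** picks the response **s** and the challenge **c** first, solves for the commitment **R**, and programs the oracle table T so that verification passes.

```python
import hashlib
import random

# Toy Schnorr parameters: p = 2q + 1, and g generates the order-q
# subgroup of Z_p^*. Real deployments use elliptic curve groups.
p, q, g = 23, 11, 2

T = {}  # programmed random-oracle table, as in the text

def H(R, m):
    # The "random oracle" R from the text: programmed answers in T take
    # precedence over the real hash value.
    if (R, m) in T:
        return T[(R, m)]
    d = hashlib.sha256(f"{R}:{m}".encode()).digest()
    return int.from_bytes(d, "big") % q

def simulate_signature(pk, m):
    # Simulate a Schnorr signature on m under pk WITHOUT the secret key:
    # pick s and c at random, solve for R = g^s * pk^(-c), then program
    # the oracle so that H(R, m) = c.
    s = random.randrange(q)
    c = random.randrange(q)
    R = (pow(g, s, p) * pow(pk, -c, p)) % p
    if (R, m) in T:
        # The negligible-probability event from the text: P already
        # queried this point, so we must abort.
        raise RuntimeError("oracle point already set, abort")
    T[(R, m)] = c
    return (R, s)

def verify(pk, m, sig):
    R, s = sig
    c = H(R, m)
    return pow(g, s, p) == (R * pow(pk, c, p)) % p

sk = random.randrange(1, q)
pk = pow(g, sk, p)   # in the real argument, pk comes from the challenger O
sig = simulate_signature(pk, "hello")
assert verify(pk, "hello", sig)
```

Note that the simulated (R, s) verifies by construction: g^s = R · pk^c holds because R was defined as g^s · pk^(−c), so the only “magic” is programming H(R, m) = c before anyone queries that point.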

But there is still the issue of where to get **pk**. We really want to get it from **O**, and not have it simulated by **A**, because our point is to show that **P** can be used to obtain a genuine signature forgery for a challenge **pk** coming from an *external* challenger.

So at this point we are left with an external party **O** that only generates the challenge **pk** (since from that point on, **A** simulates all of **O**’s messages). And **E** will need **O**’s randomness to run as part of our forgery algorithm. This is the second part of our solution, and it has to do with *invertibility*. Suppose that from the output **pk** we could derive the randomness **r** used by **O** in that execution; then we can give it to **E** without needing **O**’s help. If we think of sampling **pk** by first sampling the private key **sk** and then computing **pk**, then we can’t do this. Fortunately, in the case of Schnorr with elliptic curve groups, there is a “weakly invertible” way of sampling a uniform group element **pk**, in the sense that after seeing **pk**, we can with some constant probability guess what randomness was used by **O**. If the use case is using **E** as part of an algorithm that tries to output a forgery with non-negligible probability, this constant probability, rather than probability one, is fine for us.

The way of sampling is: guess a random **x** in the field, check whether there is some **(x,y)** on the curve, and if so return one of the two such points at random; if not, try again. The Hasse-Weil bound implies that for about half of the values of **x** in the field there is a point **(x,y)** on the curve, so with probability about 1/2 we succeed on the first iteration. Therefore, given **(x,y)**, the guess of what randomness was used corresponding to succeeding on the first iteration, namely **(x, sign(y))**, is correct with probability about 1/2.
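Here is a small Python sketch of this sampler and the inversion guess, over a toy curve y² = x³ + 7 mod p with p ≡ 3 (mod 4), so square roots are computable by one exponentiation (the curve, the prime, and the sign convention are all illustrative assumptions, not anyone’s production parameters):

```python
import random

P_CURVE = 103  # toy prime, P_CURVE % 4 == 3 so sqrt(a) = a^((p+1)/4)

def rhs(x):
    # Right-hand side of the curve equation y^2 = x^3 + 7 over F_p
    return (x * x * x + 7) % P_CURVE

def is_square(a):
    # Euler's criterion; 0 counts as a square
    return a == 0 or pow(a, (P_CURVE - 1) // 2, P_CURVE) == 1

def sample_point(rng):
    # The "weakly invertible" sampler from the text: guess x, retry until
    # x^3 + 7 is a square, then pick one of the two roots by a sign bit.
    while True:
        x = rng.randrange(P_CURVE)
        if is_square(rhs(x)):
            y = pow(rhs(x), (P_CURVE + 1) // 4, P_CURVE)
            if rng.randrange(2):          # sign bit: take -y instead
                y = (-y) % P_CURVE
            return (x, y)

def guess_randomness(point):
    # Invert under the guess that the sampler succeeded on its FIRST
    # iteration: the randomness would have been (x, sign-of-y).
    x, y = point
    y0 = pow(rhs(x), (P_CURVE + 1) // 4, P_CURVE)
    return (x, 0 if y == y0 else 1)

class FixedRng:
    """Replays a fixed value sequence, to check the inverted guess."""
    def __init__(self, vals):
        self.vals = list(vals)
    def randrange(self, n):
        return self.vals.pop(0) % n

pt = sample_point(random.Random(1))
x_guess, sign_guess = guess_randomness(pt)
# Replaying the guessed randomness reproduces pt exactly when the real
# sampler finished in one iteration -- probability about 1/2 by Hasse-Weil.
assert sample_point(FixedRng([x_guess, sign_guess])) == pt
```

The replay check is the whole point: if the original sampler happened to succeed on its first **x** (which it does with probability about 1/2), the guessed randomness `(x, sign)` drives the sampler to the exact same output, which is all **E** needs from **O**.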

*Thanks to Matthew Green for a conversation that inspired me to write this.*