A Basic Primer to How Zero Knowledge Proofs and zkSNARKs Work

Kareem Moussa
8 min readMar 24, 2019

--

In this post, we are going to explore Zero Knowledge proofs and a practical implementation of it, the zkSNARK. Zero Knowledge proofs are notorious for being difficult to understand, which is why I will try and explain it in layman’s terms!

So why should you learn about Zero Knowledge proofs? Well, Zero Knowledge proofs are incredibly powerful for privacy, and they can be very efficient to compute on. They let you prove that you know something, without revealing what that something is. This is done by performing some operation with that knowledge that proves that you definitely have knowledge of it.

This is widely used and being explored in the cryptocurrency space to provide privacy and proof of computation with unrevealed inputs. A great example is Zcash — it uses shielded transactions by providing proofs that the transaction is valid (sum of inputs = sum of outputs, etc) without revealing the addresses or amounts in the transaction.

In more mathematical terms, a Zero Knowledge proof is a proof that you know x for some function F such that F(x) = y. If you want to prove that you know x to somebody, you must provide F and y in a such way that does not reveal x.

Zero Knowledge Proofs have three main properties:

  • Zero Knowledge (obviously): The Verifier cannot extract x from F and y.
  • Soundness: If F(x) = y , then there is no x’ such that F(x’) = y. Therefore, if a Verifier is given F and y, it must be true that the Prover knows x.
  • Completeness: If the Prover knows x and is an honest actor, then they can convince the Verifier with that they know x, with a high enough probability for practical use cases.

The simplest form of a Zero Knowledge Proof is proving that you have some password by hashing it and providing the hash for others to verify — only the owner of that password could create that hash, assuming the hash function is a cryptographically secure trapdoor function (aka one-way function), meaning that given the output of the function, you cannot easily compute the input from it.

The beauty of zero knowledge proofs is that you can prove some very complex statements with it while providing privacy, as shown in the Zcash example.

In this post we are going to explore zkSNARKs, a very commonly deployed form of zero knowledge proofs today due to its practicality and efficiency. Hopefully by the end of this post, you will have enough knowledge to start creating your own proofs!

zkSNARKs

zkSNARKs are a form of zero knowledge proofs that are succinct and non-interactive. Succinct meaning that the proof is small and can be verified very quickly, and non-interactive meaning that the Prover and Verifier do not need to be online at the same time to communicate with each other, allowing the protocol to be more flexible / easier to deploy.

The inputs to a zkSNARK is some code that is ran (a function) and the secret inputs (x) that we do not want to reveal. Imagine we had a Python function that looks like this, where the function is f, and the secret input is x:

def f(x):
y = x ** 2 - 4
return y

And we wanted to prove that we know some secret value x = 2 , which results in y=0.

Remember, our goal of the zkSNARK is to do the following two things:

  • prove that we know the secret value x
  • prove that we correctly performed the computation of f on it

We can do this by creating a zkSNARK using the following pipeline:

zkSNARK pipeline from researcher Eran Tromer. Linked in Vitalik’s QAP post.

Computation -> Algebraic Circuit -> R1CS

Basically, we take the computation from the first step, and turn it into an algebraic circuit, which is just a formal algebraic representation of the code which looks like lines of machine code / byte code where each opcode has 3 operands — a,b, and c, and each operand is simple algebra (ie addition, subtraction, multiplication, etc).

The circuit for the example above would look like this:

Circuit from Decentriq’s blog post

Which can be represented in flattened code format as such:

def f(x):
out_1 = x * x
y = out_1 - 4

We then turn the algebraic circuit into an R1CS (rank 1 constraint system), which is just a system of linear equations — one equation (constraint) for each line of algebraic circuit code. The solution to the R1CS is some vector that encodes all the values of the variables in the equation.

But that vector contains the secrets that we cannot reveal! So we need a way to not reveal the secret (zero knowledge), while still proving that:

  • we know that secret value x=2
  • we ran the code on that secret value and got the output y=0

We can do this by encrypting the R1CS in a such way that does not reveal the secret, but can still prove that we ran the code and know the secret. So we need some form of encryption, and that happens to be Elliptic Curve Cryptography.

Elliptic Curve Cryptography

To understand how zkSNARKs, we must understand the basics of Elliptic Curve Cryptography (ECC). It is a form of asymmetric cryptography that is widely used in the cryptocurrency space due to its powerful properties. It is more efficient than RSA — for the same level of security, private keys / public keys are much smaller, and encryption / decryption operations are much faster.

ECC works by utilizing the powerful properties of elliptic curves, specifically that of the form:

y² = x³ + ax + b (mod p)

There are different curves that choose the variables a, b, and p, and which curve you use can be important for security, so be careful what curve you use, and only use approved standards.

In ECC, we derive a public / private key pair by performing the dot operation on an initial “generator” point on the curve on itself k times to get a resulting point P. P is the public key, and k is the private key. In this GIF, we show what a dot operation looks like with generator point A, with private key 4, resulting in public key E:

GIF from Cloudflare blog

This happens to be a cryptographically secure trapdoor function as well, which means that is exceedingly difficult to derive the private key k given public key P.

There are different algorithms to encrypt and decrypt with these keys, such as ECDSA, but we will not go into that detail here.

QAP -> Linear PCP -> Linear Interactive Proof -> zkSNARK!

Now we need to use ECC to encrypt the secrets in our R1CS. Remember, we are trying to prove two things here with zkSNARKs:

  • that you know some secret values (eg, x=2)
  • that you correctly ran some code on those secrets

We can achieve this by encrypting the inputs, and performing the calculations on them such that the result has the same structure as if we calculated it on the unencrypted inputs.

Using pure ECC, we would be able to encrypt inputs in simple linear equations (addition and subtraction) since it provides linearity, such that E(a+b) = E(a)+E(b), where E is encryption on an ECC curve.

But we need to be able to perform calculations on multiplication and division as well, since our computation circuits will definitely include that! This requires a special property of ECC called elliptic curve pairings, which allows us to perform quadratic calculations on encrypted data that preserves the structure of the data as if it were calculated on the unencrypted data. This is where the magic of zkSNARKs comes from.

Harry Potter using elliptic curve pairings to defeat Voldemort

Here’s a watered down example to help you understand: imagine we want to prove that n*m+n-5=0, where n and m are secrets we don’t want to reveal. So we calculate the following: encrypt(n)*encrypt(m)+encrypt(n)-encrypt(5)=encrypt(0), and provide this set of computation steps on encrypted inputs.

This happens to be sufficient for proving that we know the secrets and performed the calculations on it! Woohoo! That’s exactly what we needed.

That is how we will prove those computation steps were performed, without actually revealing the secrets in the steps — we need to encrypt the secrets in those steps, while still preserving the structure of the actual computations being performed. This is very similar to how homomorphic encryption works.

But this requires us to work with polynomials, which ECC is based off of. That is what the QAP (Quadratic Arithmetic Program) is — transforming the R1CS into polynomial form, that is all.

Lastly, the Linear PCP (probabilistically checkable proofs) and Linear Interactive Proof steps are about encrypting the secret inputs in the QAP as described above, and then validating the calculations performed on the encrypted secret inputs via elliptic curve pairings. The proof itself happens to be quite small in size and fast to verify, which makes it so practical.

Note that to encrypt the secrets in the computation steps in a secure way, a secret key is necessary for the Prover to use for encryption. This is one of the big downsides of zkSNARKs — it requires a “trusted” setup. Some trusted entity must do the following:

  • generate a proving key / verification key pair (both publicly available) from some secret variable “lambda”
  • throw away the secret variable (also known as toxic waste) so that no other entity can know of it, otherwise they could generate fake proofs from it.
  • provide the function F and its R1CS / QAP (are we proving what we think we are proving?)

If there is no such entity that can be trusted, then you may want to use zkSTARKs, which provide an improvement on top of zkSNARKs in that it is “trustless”.

That concludes this post. I hope you learned something out of it, and have enough knowledge to explore zkSNARKs some more! I suggest checking out the Zokrates project, where you can create a zkSNARK proof and put it in an Ethereum smart contract. Or the more primitive snark.js project to create computation circuits and their proofs.

If you want to learn more, check out these sources that I used to learn and create this post:

Demistifying Zero Knowledge Proofs by Elena Nadolinski

--

--