Privacy-Preserving KYC

5 min read · Jul 4, 2024

This work was supported by a grant from the ZK Grants Round funded by Aztec, Ethereum Foundation, Polygon, Scroll, Taiko, and zkSync.

Introduction

Parents say “don’t talk to strangers” — well you shouldn’t share personal data with them either. The more data you share, the greater your risk of exposure to a data breach. Most websites don’t care about who you are. Don’t take it personally, they just want to check you’re allowed to access their content, i.e. that you passed their Know Your Customer (KYC) process. Sharing a proof that you passed such a KYC process should be enough. But how can one build such a proof?

Zero-knowledge proofs

Zero-knowledge proofs solve this problem. This cryptographic technique lets a prover share, and a verifier check, a proof that a computation is valid without revealing the details of the computation. Here’s a high-level overview:

1. Two people, the prover and the verifier, agree a) on an arithmetic circuit — often simply named circuit — that represents the computation to check, and b) on public inputs of the computation.

2. The prover knows private inputs (i.e. the details of the computation) that make the computation valid, and uses them to generate a proof.

3. The verifier is able to check the validity of the computation based on the public inputs, the circuit and the proof.

Workflow to generate and verify zero-knowledge proofs

Think about playing sudoku with a friend. You found a solution and want to let your friend know, but you don’t want to spoil the puzzle by revealing it. To that end, a) you (the prover here) and your friend (the verifier) agree on a circuit that checks the validity of a sudoku solution, and on the public inputs (here, the initial grid), b) you feed your solution (the private inputs) into the circuit and generate a proof, and c) you share the proof with your friend. Your friend can then verify the proof and be mathematically convinced that you were not bluffing.
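The sudoku check itself can be sketched as a Noir program (Noir is introduced in the next section). This is an illustration under assumptions, not code from the original project: the grid is flattened row by row into 81 cells, 0 marks an empty cell in the public puzzle, and the helper name assert_each_digit_once is made up for this example.

```noir
// Hypothetical sudoku circuit, with the 9x9 grid flattened row by row.
// `puzzle` is public (0 marks an empty cell); `solution` stays private.
fn main(puzzle: pub [u8; 81], solution: [u8; 81]) {
    // The solution must keep every clue of the public puzzle.
    for i in 0..81 {
        if puzzle[i] != 0 {
            assert(solution[i] == puzzle[i]);
        }
    }

    // Every row and every column holds each digit exactly once.
    for k in 0..9 {
        let mut row: [u8; 9] = [0; 9];
        let mut col: [u8; 9] = [0; 9];
        for i in 0..9 {
            row[i] = solution[9 * k + i];
            col[i] = solution[9 * i + k];
        }
        assert_each_digit_once(row);
        assert_each_digit_once(col);
    }

    // Every 3x3 box holds each digit exactly once.
    for box_row in 0..3 {
        for box_col in 0..3 {
            let mut cells: [u8; 9] = [0; 9];
            for i in 0..3 {
                for j in 0..3 {
                    cells[3 * i + j] = solution[9 * (3 * box_row + i) + 3 * box_col + j];
                }
            }
            assert_each_digit_once(cells);
        }
    }
}

// Fails unless `cells` contains each digit from 1 to 9 exactly once.
fn assert_each_digit_once(cells: [u8; 9]) {
    for digit in 1..10 {
        let mut count: u8 = 0;
        for i in 0..9 {
            if cells[i] == digit as u8 {
                count += 1;
            }
        }
        assert(count == 1);
    }
}
```

Only the puzzle is marked pub, so the verifier sees the initial grid but never the solution.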

Now, going back to the KYC example, the challenge is to a) represent every check (age > 18, country doesn’t belong to a blocklist, etc.) as a computation, and b) ensure that the data used to generate the proof comes from a valid data source. We’ll cover the former in this blog post, and the latter in future publications.

Noir language

Noir is a Domain Specific Language for zero-knowledge proving systems, used especially for SNARKs. A SNARK enables verifying a computation without revealing its details, and checking the proof is substantially cheaper than re-running the computation. Noir enables writing zero-knowledge programs with a simple syntax, requiring no knowledge of the underlying mathematics or cryptography. Noir programs are then compiled into circuits. We can think of a circuit as an electric circuit, with sums and products instead of electronic gates. Writing a program to prove and verify is not the same as building a SNARK circuit gate by gate, which would be like writing software by hand-etching transistors on silicon. Noir abstracts this complex layer entirely, which is why it’s so interesting.

In the context of SNARKs, a frontend layer compiles a high-level program into an Intermediate Representation (IR) optimized for SNARK processing. A backend layer takes this IR as input and provides the concrete implementation of the proof system.

Noir is backend agnostic, meaning you can plug in any SNARK-based backend to prove and verify computations, thanks to Noir’s IR called ACIR (Abstract Circuit Intermediate Representation). A simplified workflow is as follows:

1. The Noir program is compiled to ACIR by the frontend.
2. The ACIR instance is then converted to the format required by a backend. This is done at the ACVM (Abstract Circuit Virtual Machine) level.
3. The backend generates and verifies proofs.

Existing proving systems

Several backend implementations currently exist. The most popular one is probably the Barretenberg proving system: although minimally documented, it is commonly used by Noir developers and is frequently updated. Other teams are expected to write their own proving systems and integrate them with Noir. We can distinguish Plonkish implementations, i.e. those based on primitives from the Plonk protocol (like Plonky2 and Halo2), from non-Plonkish implementations (like Groth16). Current interesting implementations include Plonky2, Halo2, Groth16, Marlin and Gnark. References for each (articles and code) are listed in the “Proving Backends” section of the awesome-noir repo.

Back to verifiable KYC

We want to enable passing the KYC process without sharing personal data. In this section, let’s focus on a few examples of Noir programs that are useful in that context. Note that variables (like age, country and salary below) are private by default, and become public when one adds the keyword pub. Namely, we show Noir code to check that:

Age is above a given threshold:
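A minimal sketch of this check, assuming a recent Noir toolchain. The parameter name min_age and the values used in the tests are hypothetical. The age stays private, while the threshold is public so the verifier knows what was checked.

```noir
// Sketch only: `min_age` is an assumed name for the public threshold.
fn main(age: u8, min_age: pub u8) {
    // A proof can only be generated if the private age is above the threshold.
    assert(age > min_age);
}

#[test]
fn test_age_above_threshold() {
    main(25, 18);
}

#[test(should_fail)]
fn test_age_below_threshold() {
    main(16, 18);
}
```

With the strict comparison, a threshold of 18 rejects someone who is exactly 18. Use >= instead if the threshold itself should be accepted.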

A country does not belong to a blocklist of countries:
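A minimal sketch of this check, under a few assumptions that are not from the original: countries are encoded as numeric codes, the blocklist has a fixed length of 3, and the codes in the tests are arbitrary placeholders.

```noir
// Sketch only: numeric country codes and a blocklist of length 3 are
// arbitrary choices for illustration.
fn main(country: u16, blocklist: pub [u16; 3]) {
    for i in 0..3 {
        // The private country must differ from every blocked country.
        assert(country != blocklist[i]);
    }
}

#[test]
fn test_country_not_blocked() {
    main(250, [1, 2, 3]);
}

#[test(should_fail)]
fn test_country_blocked() {
    main(2, [1, 2, 3]);
}
```

The blocklist is public so that the verifier knows which countries were excluded, while the user’s own country is never revealed.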

Each monthly salary is above a given threshold, over the last 12 months:
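A minimal sketch of this check, assuming salaries are whole numbers stored in a private array of 12 values. The parameter name threshold and the amounts in the tests are hypothetical.

```noir
// Sketch only: `salary` holds the last 12 monthly salaries (private),
// `threshold` is the public minimum each month must exceed.
fn main(salary: [u32; 12], threshold: pub u32) {
    for i in 0..12 {
        // A single month at or below the threshold makes proving fail.
        assert(salary[i] > threshold);
    }
}

#[test]
fn test_all_months_above_threshold() {
    main([3000; 12], 2500);
}

#[test(should_fail)]
fn test_one_month_below_threshold() {
    let mut salary: [u32; 12] = [3000; 12];
    salary[5] = 2000;
    main(salary, 2500);
}
```

The verifier only learns that all twelve values exceed the threshold, not the values themselves.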

Other examples can be found in the Noir examples repo.

Conclusion

The approach above enables sharing proof that you passed a KYC process without revealing sensitive information, mitigating the risks associated with data breaches. This zero-knowledge approach assumes a) that the prover and the verifier agree that KYC checks can be represented by a set of computations, and b) that the verifier can verify those proofs when needed. That way, the verifier doesn’t hold any user data, but can still check that the user is allowed to access the content.

As next steps, we’ll share code for a fully-fledged zk-app and insights on minimizing trust assumptions regarding the integrity of private inputs. First, we’ll open-source the codebase of an app to generate and verify KYC proofs, based on Noir and NoirJS. Then, we’ll ensure that private inputs come from a valid data source, and introduce frameworks that work with Noir to address this data provenance topic.

Stay 🔌