Technical Article #1: Oracle Nodes and zkCertificates
In June of 2021, Pieter Pauwels published a whitepaper: “zkKYC: A solution concept for KYC without knowing your customer, leveraging self-sovereign identity and zero-knowledge proofs”. In this important piece, he proposes a solution concept, zkKYC, which “removes the need for the customer to share any personal information with a regulated business for the purpose of KYC, and yet provides the transparency to allow for a customer to be identified if and when that is ruled necessary by a designated governing entity (e.g. regulator, law enforcement).” His approach sidesteps the traditional privacy versus transparency tradeoff and instead proposes a solution that delivers both at once, without weakening either.
Galactica Network has utilized Mr. Pauwels’ novel approach as the intellectual foundation for our development of zero-knowledge certificates (or zkCertificates for short), of which zkKYC is a particular special case. In the design of Galactica Network we have also employed Oracle Nodes: an initially curated and progressively more decentralized list of nodes capable of verifying the validity and veracity of real-world documents and thereafter cryptographically signing them for the users who submit them. An important property of our design is the inability of Oracle Nodes to associate a given set of documents with the blockchain account that submitted them without the consensus of a set of third-party nodes, such as those of the Galactica Foundation or other governance bodies within the network.
Definitions and Concepts
We shall start by defining the various concepts that make up the Oracle Node mechanism. Oracle Nodes are built upon several fundamental mathematical and cryptographic concepts that one needs to be familiar with if they are to understand the underlying functionality.
Zero Knowledge Proofs
Zero-knowledge proofs (ZKPs) are defined and expounded upon in many other articles and documents from Galactica Network; nevertheless, it is important to ensure the reader’s understanding at this point. In their barest form, ZKPs are a cryptographic technique which, when applied to the specific context of zkKYC, allows one party (the prover) to prove their identity or specific credentials to a second party (the verifier) without revealing any unnecessary information, thus maintaining the privacy of the prover. ZKPs are capable of proving a host of useful mathematical statements; in fact, any statement in the complexity class NP, that is, any statement whose truth can be checked efficiently given a suitable witness, can be proved in zero knowledge under standard cryptographic assumptions.
ZKPs can be broken down into interactive and non-interactive types. Interactive ZKPs require the prover and verifier to exchange several messages in order to prove a statement, which does not scale beyond a handful of transactions. It is implied that, when utilizing interactive ZKPs, the communication channels used are secure and allow for the safe interaction of the prover and verifier. Non-interactive ZKPs were developed in the late 1980s, following interactive ZKPs, and do not require the prover and verifier to talk to each other for the transaction to conclude: the prover sends a single proof that convinces the verifier of the statement’s validity without revealing any of the specific information of the transaction itself. The cryptocurrency industry, including Galactica Network, by and large utilizes the non-interactive form of ZKPs.
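To make the non-interactive idea concrete, here is a minimal sketch of a Schnorr proof of knowledge turned non-interactive with the Fiat-Shamir heuristic. It is a far simpler construction than the zkSNARKs discussed in this article and is not taken from Galactica Network’s code. The group parameters `P`, `Q` and `G` are toy values chosen for readability (an assumption of this example) and are much too small to be secure.

```python
import hashlib
import secrets

# Toy parameters (insecure, illustration only): G generates the subgroup
# of prime order Q inside the multiplicative group modulo P.
P, Q, G = 23, 11, 4


def challenge(public_key: int, commitment: int) -> int:
    """Fiat-Shamir: derive the challenge from a hash instead of asking the verifier."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int) -> tuple[int, int]:
    """Prove knowledge of `secret` where public_key = G**secret mod P."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q)          # fresh randomness hides the secret
    commitment = pow(G, nonce, P)
    response = (nonce + challenge(public_key, commitment) * secret) % Q
    return commitment, response


def verify(public_key: int, proof: tuple[int, int]) -> bool:
    """Check G**response == commitment * public_key**challenge (mod P)."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


secret = 7                                # known only to the prover
public_key = pow(G, secret, P)            # known to everyone
proof = prove(secret)                     # sent once, no back-and-forth
print(verify(public_key, proof))
```

The verifier only ever sees `public_key` and `proof`; the single message replaces the challenge-response exchange of the interactive version.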
Other terms relevant to ZKPs are prover and verifier, the two parties present in a zero-knowledge transaction. The prover needs to convince the verifier that a statement or datum is true while conveying no further information beyond the statement’s truth. The verifier must ensure that the statement presented is in fact true while remaining blind to the information possessed by the prover (the information that proves the statement). In the case of Galactica Network, the prover would be a user stating, for instance, that they are from Country X, with relevant documents proving this fact, and the verifier could be a DEX that needs to verify that the user is from a permitted jurisdiction without necessarily learning which jurisdiction that is. Furthermore, with ZKPs said user can prove to the DEX that he or she is from Country X without disclosing their specific address or any other details from their documents.
Of the many ZKP systems currently under discussion within the ZK space, one of the most relevant is the zkSNARK, or zero-knowledge Succinct Non-interactive ARgument of Knowledge: a cryptographic tool for producing small, consistently sized proofs of statements without revealing any additional information. SNARKs can be verified almost instantaneously and without interaction between the prover and verifier beyond sending the ZKP once. These attributes make them useful in contexts that feature vast numbers of transactions, or ones where speed is mission-critical.
Now we move on to Merkle trees, or binary hash trees, and their relevance. Put succinctly: “A Merkle tree, also known as a hash tree, is a data structure used for data verification and synchronization.” Merkle trees are a data structure where each non-leaf node is a hash of its child nodes, and all leaf nodes in a Merkle tree are at the same depth and are as far left (on the tree) as possible. For those less acquainted with cryptography, hashes, or hash values, are the product of hash functions mapping data of arbitrary size and content onto fixed-size values; good hash functions are simple to compute but very hard to reverse. With hashing in mind, salting, or adding a random salt, is an additional step that can occur during hashing. It is typically seen in association with hashed passwords: an additional random value is added to the input (for example, to the end of the password) before hashing, changing the hash value produced. An example of a Merkle tree is depicted below in Figure 1.
Figure 1: A simple Merkle tree.
Readers may be unaware of the intricacies of data structures, specifically trees, so here we’ll provide a brief explanation.
- A node is a structure that may contain data and connections to other nodes, sometimes called edges or links;
- Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn with descendants going downwards);
- A node that has a child is called the child’s parent node;
- All nodes have exactly one parent, except the topmost root node, which has none;
- A node might have many ancestor nodes, such as the parent’s parent;
- Typically siblings have an order, with the first one conventionally drawn on the left;
- A node with no children, that is, a node at the bottom of its branch, is known as a leaf.
An example of the Tree data structure is depicted below.
Figure 2: A simple tree.
Merkle trees are particularly useful due to several properties they possess. For example, a Merkle tree is often used for proving membership in a set, because it can hold huge amounts of data while a verifier simply needs to store the top hash (the Merkle root) for verification. Merkle proofs show that a data block is part of the tree by displaying the path of hash computations from the data block (a leaf) to the top hash. Because hash functions are hard to reverse, this is easy to compute and verify, but hard to fake. When combining Merkle trees and ZKPs a great deal of privacy is gained: the prover can include the Merkle proof in the ZKP to show that they know a data block within the tree without revealing which one it is.
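The membership check described above can be sketched in a few lines. This is a minimal illustration that uses SHA-256 from Python’s standard library as a stand-in hash (Galactica Network uses Poseidon, as explained later); the function names and sample leaves are assumptions made for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in hash function (SHA-256, not Poseidon)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from the hashed leaves up to the root."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch needs a power-of-two number of leaves")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hash at each level on the way from a leaf to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])   # sibling sits next to us in the pair
        index //= 2
    return path


def verify_proof(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    """Recompute the root from the leaf and its siblings, then compare."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [b"record-0", b"record-1", b"record-2", b"record-3"]
levels = build_levels(leaves)
root = levels[-1][0]                     # the only value a verifier must store
proof = merkle_proof(levels, 2)
print(verify_proof(root, b"record-2", 2, proof))
print(verify_proof(root, b"forged", 2, proof))
```

The proof contains one sibling hash per level, so its size grows with the depth of the tree rather than with the number of leaves. In a plain Merkle proof like this one the verifier learns the leaf and its position; wrapping the same computation inside a ZKP is what hides them.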
KYC Record Creation
With our understanding of zero knowledge and other pertinent terms out of the way, we can now move on to discussing exactly what a KYC record on Galactica Network is, and how it is created.
First and foremost, a KYC record contains a user’s Public Key (wallet), Date of Birth, Country of Origin, Full Name, Verification Level (more on that later), Random Salt, and other personal information. The KYC record is the most important item to secure in the Oracle Node process: the prover (the user) must prove to the verifier (KYC provider or smart contracts) that they possess a valid KYC record, without actually revealing the information contained therein.
The KYC record is hashed with the zk-friendly hash function Poseidon because it is efficient to compute in SNARK circuits, and its computation costs less gas in an EVM environment like Galactica Network’s than alternative zk-friendly hash functions. The KYC record is submitted to the KYC provider as a request with a commitment hash (generated by the Poseidon function). The verifier reviews the request and then signs the commitment hash with their own private key, adding the complete (verified) record as a leaf of the Merkle tree in Galactica Network’s Oracle Node registry smart contract.
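As a rough illustration of why the record carries a random salt, the sketch below computes a salted commitment over a record. It uses SHA-256 over a JSON serialization purely as a stand-in: the real system hashes field elements with Poseidon, and the field names and serialization here are assumptions made for this example.

```python
import hashlib
import json
import secrets


def commit(record: dict, salt: str) -> str:
    """Hash the record together with its salt (SHA-256 stand-in for Poseidon)."""
    payload = json.dumps({**record, "salt": salt}, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


# Hypothetical record; the fields mirror the list given above.
record = {
    "public_key": "0xA1",
    "date_of_birth": "1990-01-31",
    "country": "AU",
    "full_name": "Alice Example",
    "verification_level": 1,
}

salt = secrets.token_hex(16)             # kept private by the user
commitment = commit(record, salt)

# The same record with a different salt yields an unrelated commitment,
# so nobody can confirm a guess of the record's contents from the hash alone.
other = commit(record, secrets.token_hex(16))
print(commitment != other)
```

Without the salt, anyone who could guess the handful of fields in a record could hash the guess and compare it with the published value.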
The leaf added to the Merkle tree contains the hashed KYC information, the commitment hash, and the verifier’s signature (produced with their private key). This leaf also becomes an SBT (soulbound token) for the respective user. Once the smart contract has appended the leaf to the Merkle tree, the KYC provider returns the complete SBT (which now has the verifier’s signature) to the user who submitted the original KYC request. The user can now use their SBT to create ZKPs that rely upon their KYC information.
It should be noted that during the ongoing development of the Oracle Node feature, and the KYC record smart contract holding the Merkle tree, two methods were investigated. The first method had only the Merkle root stored on-chain and both the root transition and KYC record validity were to be verified by a ZKP.
In the second method, an incremental Merkle tree is stored on-chain, including the filled subtrees alongside the Merkle root, which makes it possible to append a new leaf independently of the current Merkle root. This method could use a ZKP to verify the KYC record, but because the KYC provider is verifying off-chain, and the on-chain consistency is checked regardless via the ZKP generated by the user later, it may well be redundant. Presently, Galactica Network has implemented the second method to permit multiple KYC providers to work concurrently.
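The incremental approach can be sketched as follows. This is a generic append-only Merkle tree in the style commonly used in smart contracts, not Galactica Network’s actual contract code; SHA-256 stands in for Poseidon, and the class and method names are assumptions of this example. Only one cached hash per level (the filled subtrees) plus the root is stored, so appending a leaf costs a number of hashes equal to the tree depth.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in hash function (SHA-256, not Poseidon)."""
    return hashlib.sha256(data).digest()


class IncrementalMerkleTree:
    def __init__(self, depth: int):
        self.depth = depth
        self.next_index = 0
        # zeros[i] is the root of an empty subtree of height i.
        self.zeros = [h(b"empty")]
        for _ in range(depth):
            self.zeros.append(h(self.zeros[-1] + self.zeros[-1]))
        # One cached left-hand node per level is all the state we keep.
        self.filled_subtrees = self.zeros[:depth]
        self.root = self.zeros[depth]

    def append(self, leaf: bytes) -> int:
        """Insert a leaf at the next free position and update the root."""
        if self.next_index >= 2 ** self.depth:
            raise ValueError("tree is full")
        index, node = self.next_index, leaf
        for level in range(self.depth):
            if index % 2 == 0:            # we are a left child: cache and pad
                self.filled_subtrees[level] = node
                node = h(node + self.zeros[level])
            else:                         # right child: pair with cached left
                node = h(self.filled_subtrees[level] + node)
            index //= 2
        self.root = node
        self.next_index += 1
        return self.next_index - 1


def naive_root(leaves: list[bytes], depth: int) -> bytes:
    """Reference implementation: rebuild the whole padded tree from scratch."""
    nodes = list(leaves) + [h(b"empty")] * (2 ** depth - len(leaves))
    for _ in range(depth):
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


tree = IncrementalMerkleTree(depth=3)
leaves = [h(f"kyc-record-{i}".encode()) for i in range(5)]
for leaf in leaves:
    tree.append(leaf)
print(tree.root == naive_root(leaves, 3))
```

Because the submitter only supplies the new leaf and the contract derives the new root from its own cached state, two providers submitting at nearly the same time do not invalidate each other’s transactions, which fits the concurrency motivation given above.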
The verifier (the on-chain service requiring the user to prove possession of valid KYC, that is, the Oracle Nodes) will also post the encrypted KYC record on-chain. This information can be decrypted using K out of N governance nodes (Shamir’s secret sharing scheme) if necessary. This decryption method exists for regulatory compliance, to allow for fraud investigation regarding AML/CTF. Lastly, in the Merkle tree smart contract, each KYC record will have an expiration date, and whenever one generates a ZKP for the record, the expiration date will be checked. For zkCertificates, a simpler nullifier mapping will be used to record the valid hash of the item, as it can be revoked or modified. In that case, the old KYC record is still retained in the Merkle tree; however, it is marked as false in the nullifier mapping and is no longer valid or acceptable, and the user must then obtain a new KYC record.
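For readers unfamiliar with Shamir’s secret sharing, the sketch below shows the K-out-of-N idea in isolation: a secret (here a stand-in for a decryption key) is split into N shares so that any K of them recover it. The prime field, the choice of 3 out of 5, and the function names are assumptions of this example and say nothing about the parameters Galactica Network uses.

```python
import secrets

PRIME = 2 ** 127 - 1   # a Mersenne prime; all arithmetic is done modulo PRIME


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any k of them recover it."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):        # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 returns the polynomial's constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)            # stand-in for a decryption key
shares = split(key, k=3, n=5)             # one share per governance node
print(recover(shares[:3]) == key)
print(recover([shares[0], shares[2], shares[4]]) == key)
```

Any three shares reconstruct the key, while fewer than three leave it undetermined, which is why no single node can decrypt a record on its own.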
Figure 3: KYC record creation, verification, and dApp interaction
Studying Figure 3 above, we gain a clearer understanding of the various processes involved for a user wishing to perform KYC requests and verification on the Galactica Network.
Our theoretical user’s initial interaction is with Galactica Network’s KYC Portal website, where they will request a signed commitment hash to use in lieu of their on-chain address when dealing with their chosen KYC provider. The user will then submit a KYC application to the provider, using valid documentation to pass their specific processes.
Once again we note that due to the on-chain address being signed and hashed, KYC providers can never associate a user’s off-chain records with their on-chain address.
On completion of their internal processes and the user’s records being validated, the KYC provider then mints the relevant Oracle Node SBTs — interacting with the Galactica Network’s Oracle Node registry smart contract as they do so.
The user, having passed KYC and now having on-chain proof of such, can now interact with dApps and services that require it. When in use by the user the dApp in question will verify the relevant proofs with the Oracle Node registry smart contract and the user’s wallet before providing their services.
More on Proofs
In the above section we’ve explained the process of KYC record creation, and now we’ll include two important proofs that are required for KYC’d users to interact with protocols and DApps.
The ‘membership’ and ‘condition’ proofs are verified by protocols and DApps through their own respective verifiers. The two are likely to be combined into a single proof, so that only one item needs to be submitted, and the protocol or DApp only proceeds with the transaction or interaction once both checks pass. These protocols and DApps can deploy the verifiers themselves if they require custom conditions, or they can use verifiers that are already public if they need more common information such as a user’s country of origin.
Membership proofs are the first half of the proofs required of KYC’d users to interact with specific protocols and DApps, and they demonstrate that the user has a valid record within the Merkle tree. The public inputs are part of the ZKP, and one field of these public inputs is the Merkle root, which is stored on-chain alongside the user’s address. The smart contract checks that the root supplied in the proof matches the one stored on-chain.
The private inputs are the Merkle path and the KYC record information that, along with the user’s address, hashes to the corresponding Merkle leaf. It has to be verified that the user’s address is in fact the one in the KYC record** and that the Merkle path is valid, that is, that it hashes step by step to the correct Merkle root stored on-chain. Another check must occur to ensure that the KYC record being checked is the most recent, up-to-date KYC record and has not been discarded; this is done by checking the expiration date (mentioned above in the KYC Record Creation section). The ZKP proves that the KYC record had not expired on the checked date, while the verifier smart contract checks that the public input for the checked date matches the date on-chain. For zkCertificates, similar checks are performed using the Merkle tree’s nullifier mappings.
**With regard to the address present in the KYC record, the development team, in an effort to improve privacy, made it such that a user can have multiple addresses to prevent profiling across all the items (protocols, etc.) where they’ve proven their KYC. The address listed in the KYC record is the “holding” address. Others are “using” addresses, which can be viewed as proxies to the “holding” address. They are connected by the user being able to sign the using address with the holding keys, and this connection is validated inside the ZKP.
We can understand this setup with an example:
Alice can use multiple accounts in MetaMask:
- 0xA1 holds the KYC SBT
- 0xB2 is used for service X
- 0xC3 is used for Y
Before using service X, Alice creates the Oracle Node proof. It has the signature of 0xA1 as private input to prove that she is the holder. It publicly discloses that 0xB2 is KYC’ed to use X and that the holder is from Australia. 0xA1 is not publicly revealed here.
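The membership checks described above are split between what the ZKP attests to (using private inputs) and what the verifier smart contract checks itself (using only public inputs). The sketch below models that division of labour in plain Python with no cryptography involved. The names, the representation of dates as day numbers, and the strict comparison for expiry are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicInputs:
    merkle_root: str   # root the proof was generated against
    checked_day: int   # day number for which non-expiry is being proven


def circuit_statement(expiration_day: int, leaf_in_tree: bool,
                      inputs: PublicInputs) -> bool:
    """Stand-in for what the ZKP proves using the user's private data."""
    return leaf_in_tree and expiration_day > inputs.checked_day


def contract_accepts(proof_ok: bool, inputs: PublicInputs,
                     onchain_root: str, onchain_day: int) -> bool:
    """Checks the contract performs on the public inputs of a valid proof."""
    return (proof_ok
            and inputs.merkle_root == onchain_root
            and inputs.checked_day == onchain_day)


onchain_root, today = "0xabc", 19_800
expiration_day = 19_500                   # private: the record has expired

# Honest attempt: proving against today's date fails inside the circuit.
honest = PublicInputs(onchain_root, today)
print(contract_accepts(circuit_statement(expiration_day, True, honest),
                       honest, onchain_root, today))

# Back-dated attempt: the circuit is satisfied, but the contract notices
# that the checked date does not match the date on-chain.
backdated = PublicInputs(onchain_root, 19_000)
print(contract_accepts(circuit_statement(expiration_day, True, backdated),
                       backdated, onchain_root, today))
```

Both attempts are rejected, for different reasons, which is why the date has to appear as a public input that the contract can compare against its own clock.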
Condition proofs are the second half of the proofs required of KYC’d users, and they demonstrate that the user meets whatever requirements are posed by the protocol or DApp they’re trying to interact or transact with. Any number of conditions can be verified against a KYC record: age thresholds, country restrictions, KYC level restrictions, etc. Once the condition proof and membership proof are submitted (and assuming they’re both valid), the user will be able to utilize the protocol or DApp.
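The kind of predicate a condition proof establishes can be written out in ordinary code. In the real system this logic would live inside a circuit, with the record as a private input and the thresholds as public inputs; the plain-Python version below, with its hypothetical field and parameter names, only illustrates the statement being proven.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Full years elapsed between the date of birth and `today`."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def meets_conditions(record: dict, *, today: date, min_age: int,
                     blocked_countries: set[str], min_level: int) -> bool:
    """True only if every requirement posed by the DApp is satisfied."""
    return (age_on(record["date_of_birth"], today) >= min_age
            and record["country"] not in blocked_countries
            and record["verification_level"] >= min_level)


# Private to the user; a verifier would only learn the final yes/no answer.
record = {
    "date_of_birth": date(2000, 6, 15),
    "country": "AU",
    "verification_level": 2,
}

print(meets_conditions(record, today=date(2024, 6, 14), min_age=24,
                       blocked_countries={"XX"}, min_level=1))
print(meets_conditions(record, today=date(2024, 6, 15), min_age=24,
                       blocked_countries={"XX"}, min_level=1))
```

The first call fails because the user turns 24 one day later; the second succeeds. In both cases the date of birth itself never leaves the user’s side.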
Proof generators and validators can be created from the public Circom code. Circom is a DSL (Domain Specific Language) used to express computations for which zk-SNARKs need to be created. This preserves the decentralization of the entire Galactica Network, as everyone and anyone can create their own generators and validators rather than relying on a single authority, such as Galactica Network itself, to develop them. The proof generation requires the user’s private inputs; therefore it can be completed on the frontend**, allowing the user’s information to remain private and secure. The decentralized nature of proof generators also guarantees that any entity can deploy a new, custom verifier depending on some condition that needs to be checked.
**In addition to proof generation being done on the frontend, the development team is currently experimenting with generating the ZKP inside a MetaMask plugin. This could potentially be a better solution as, aside from the user base being generally more familiar with MetaMask, the user does not need to trust the frontend to handle their private data confidentially, removing another point of compromise from the process.
Performing KYC on-chain is a task that can be very complex to those unfamiliar with nuanced aspects of blockchain technology and cryptography, but we hope that this article may serve as an introduction to this important aspect of Galactica Network. In this article, we detailed how this important function, which contains a plethora of confidential, personal information, is performed on-chain in such a manner as to ensure the security of said information. Oracle Nodes are a vital piece of Galactica Network’s infrastructure and their healthy operation enables the fundamental concepts underlying Galactica Network Citizenship, and the protocol’s ability to successfully model a true Cypher State.
Disclaimer: This document concerns a Galactica Network mechanism that is a work in progress. Mentions of various aspects of the mechanism are subject to change and evolution as the Galactica Network development team continues its efforts in the development of the protocol.