Closing the Privacy Gap of Distributed Ledger Technology: An Introduction to Zero Knowledge Proofs and Secure Multiparty Computation
--
Context:
The hype about cryptocurrencies in the last few years has brought distributed ledger technology (DLT) to great public attention. Recently, proofs of concept and prototypes of IT-solutions based on DLT have emerged in a great variety of sectors. While the initial hype somewhat cooled down, the really interesting applications of DLT are now being advanced in several promising and forward-looking projects. For example, companies, foundations and organizations in the public sector apply DLT to advance cross-organizational workflows and processes. The German Federal Office for Migration and Refugees is working on a related project (Link) with regard to migration, while projects such as TradeLens aim at streamlining international trade. Further applications emerge in the mobility sector and industry settings. In all of these cases, DLT enables a neutral platform for IT-services providing features such as tamper-resistant documentation, payments or administration of documents without a need for a central intermediary. DLT is also an essential ingredient for self-sovereign identities, which foundations like Sovrin are working on.
The basic idea behind DLT is the redundant and thus transparent storage of data along with the execution of computer programs in each node of a distributed peer-to-peer network, forming a “distributed ledger”. The individual components have been known quite some time before Satoshi Nakamoto’s renowned whitepaper (e.g. contributions from Nick Szabo). In conjunction with distributed consensus, i.e., a mechanism which ensures that the data in the distributed ledger is concordant, these concepts offer a powerful technical basis for processes which have not yet been established or digitalized because yet they have required the participation of a trusted intermediary. However, at first sight, DLT comes at the cost of relinquishing private data due to its transparency principle. This is a serious issue for personal data protection, particularly in the light of the EU General Data Protection Regulation (GDPR). It can also be critical with regard to corporate and trade secrets in consortium platforms based on DLT. Many promising solutions to these problems have been proposed. The incapability of a posteriori deleting and changing data in the known distributed ledgers, most notably Blockchains — which is necessary according to the state of the art in order to ensure tamper-resistance and achieved by cryptographic concatenation of units (blocks) — can be tackled by storing associated data off-chain and only putting their hash-values as references and proofs (arguments) of correctness in the ledger. In the private-permissioned setting, frameworks like Hyperledger Fabric enable additional privacy by managing multiple Blockchains (“channels”) in which only the parties which participate in a process take part, or by a so-called “endorsement policy” which requires only a execution and verification of a smart contract by a specific subset of nodes.
The challenge:
However, in many scenarios, this degree of privacy is still not sufficient. Also in public distributed ledgers used outside of corporate settings, more privacy is often required. One might not want to allow everyone to see a complete transaction history of a public address because this can — in combination with other data — allow to associate public keys with individuals. Companies might approve of using more complicated smart contracts requiring the data from many stakeholders, .e.g. for the detection of systemic risks in a supply chain, because they require to give away private information to potential rivals for computation. Moreover, it is not clear how to implement protocols which do not reveal all inputs, such as a blind auction, in a conventional smart contract. Finally, not all attributes of a digital identity should be revealed in every interaction. Even more, as we will explain below, in the best case scenario its identifier should change with every interaction because otherwise the interactions of an individual can be tracked, which in turn requires some kind of “anonymous transfer of credentials”.
Although DLT is a promising concept that is well suited to address the above-mentioned cases, it has no native solution to the described problems. Fortunately, however, cryptographic methods to tackle these issues have been developed since the 1970s. With the rise of interest in DLT and thus alternatives for central solutions — in which all of the named is not an issue because there is trust in the intermediary, which reveals private data only if desired — these methods are now gaining increasing attention. The two most familiar buzzwords associated with these cryptographic methods are “Zero Knowledge Proofs” (ZKPs) and “Secure Multiparty Computation” (SMC).
Secure Multiparty Computation
SMC denotes protocols in which functions of many variables f(x_1, …, x_n) are evaluated confidentially among n participants (nodes) without any node i getting more information than its own input x_i and the final result f(x_1, …, x_n). SMC is a branch of cryptography and in general involves the combination of tricky reformulations and cryptographic methods. The process of secure addition between 3 participants serves as an illustrative example. It is much simpler than typical real-world examples. However, it gives a rough understanding of how such things might even be possible. Imagine that Alice (A), Bob (B) and Charlie (C) all have a private number (a, b, and c, respectively) which they want to add without anyone of them learning the private number of the two other participants. This can be achieved as follows: A picks a large random number r, computes r+a and sends this to B. Note that B will not be able to recover a from a+r. Then B adds b to a+r, passing the result to C. Likewise, C adds c (again without being able to attain any information regarding a and b), passing r+a+b+c to A. Now A, who is the only participant knowing r, can subtract r and pass the result r+a+b+c-r=a+b+c to B and C. Of course, B and C have to be comfortable that A passes the correct result to them (this is called a ‘honest but curious’ scenario). However, by repeating the computation with the roles of A, B and C, honesty can be checked. Enhanced trust can be achieved by adding cryptographic proofs of correctness of computation, which would lead us too far here (a glimpse can be caught in the explanation of zk-SNARKS below).
Zero Knowledge Proofs
ZKPs are a special case of SMC, which is being widely discussed in research and practice at the moment. A ZKP describes a protocol in which a prover (P) wants to convince a verifier (V) that a certain statement is true without revealing any further data. One common example is the following: V asked P to find a solution to a certain set of equations, and P wants to get some money for the solution. If V doubts that P has in fact found a solution, he might require some reason or confirmation for P’s claim. On the other hand, P certainly does not want to tell V her solution without getting paid in advance. A ZKP would allow P to proof that she knows a number that solves P’s equations without revealing the solution or any further information about the solution.
As another example, imagine that P has a digital identity, which she uses to interact on the Internet. Now, she wants to access medical services and apply for a job with this identity. If she uses the same identifier for each interaction, the companies that she applies for a job at might ask the health service provider for data associated with this identifier. Generally speaking, using a single digital identifier for different interaction poses a threat for anonymity. On the other hand, if P uses a new identifier for each of her interactions, it is difficult to use credentials such as testimonies or a driver’s license. If P asks a government authority to confirm her driver’s license for each of her identifiers, this authority could connect all of these — the consequence would be the so-called “transparent citizen”). Using a ZKP, P can prove to the company that she owns another identifier — which does not need to be revealed — and has a driver’s license attested by a government authority to this identifier — or that she has another identity — which is again not revealed — which has been attested a negative drug-test in the last 6 months, without the need to reveal her whole electronic health record.
To understand that ZKP can be regarded as special case of SMC, note that in this case, the function that has to be evaluated is
f(x_1,x_2) = {1, if x_1 is a valid certificate associated with some identifier of P} and {0, else},
not depending on the input of V. In a way, ZKP can also be regarded as SMC in which the function f is constant w.r.t. all but one variable. It is therefore coherent that progress in the field of ZKP is currently faster since it can be regarded an easier problem to construct generic ZKPs than generic SMCs. A special case of ZKP is the already explained bilateral setting consisting of the prover P and the verifier V. It allows for a comparatively “easy” construction of ZKP at least conceptually: it is based on the idea that P answers a set of yes-no-questions, which also in its entirety do not reveal any information about P’s “secret” and which P can always answer correctly if she indeed knows a “secret” with the promised properties. If P does not know such a secret, she can only guess whether the answer for a particular question should be 0 or 1. Assuming that the questions are efficient and independent in the sense that the probability for P giving the correct answer m times in a row by guessing is 0.5^m, this probability approaches 0 as the number m of questions grows. V can then define a threshold (say, 0.001%) and declare that he is convinced of P knowing the secret if the probability of her “guessing” all the questions right is smaller than 0.001%, leading to a minimum number of log(0.00001)/log(0.5)≈17 queries.
Another illustrative example which bases the construction of an interactive ZKP on “ruling out randomness” can be taken from the excellent two-piece primer [1], [2], which in fact gives a beautiful argument why a very large class of claims can be formulated in a way which allows for a ZKP (however, this construction is in general very inefficient, and the difficulty for application of ZKP in practice is to find more efficient protocols).
zk-SNARKS
It is important to note that for the described method of constructing interactive ZKP, a third party (observer) can never be sure that the yes-no-questions from the verifier and the answers of the prover had not been arranged in advance. This means that any party, which did not formulate a sufficient amount of questions on her own, will not trust an interactive ZKP. Moreover, the repeated interaction between the prover P and the verifier V and is often quite computationally heavy. Both is fine if the ZKP takes place in a bilateral setting. However, if one wants to perform a ZKP proving something without interaction or too many verifiers at once, one has to come up with new solutions and ideas. A classical scenario where non-interactive, computationally light ZKPs are important are — to close the loop — distributed ledgers. In order to fulfill these requirements, research is focusing on so-called zk-SNARKS. This abbreviation stands for the following:
- zk: zero-knowledge: as already mentioned, the prover does not reveal any information of his private data, apart from confirming the required properties which she wants to prove
- Succinct: sufficiently short and efficient (computationally light)
- Non-interactive: no interaction between the prover and the verifier(s) is necessary
- Argument of Knowledge: strictly speaking, from a mathematical point of view it is not a proof of knowledge, but rather an argument, which is, however, very convincing as seen from state-of-the-art-cryptography (of course, the same is true for the interactive ZKP described above, since for any value for the number of queries $m$ there will always be a non-vanishing probability that the prover just guessed the right anwer all the time)
Unfortunately, understanding how zk-SNARKs work in detail is a very complicated task and involves various ingredients from cryptography such as elliptic-curves (or other “complicated” fields in which taking the logarithm is regarded a “hard” problem). In fact, a great description of the essential thoughts and ingredients is given in a trilogy by Vitalik Buterin, the inventor of Ethereum, in these articles [3], [4], [5]. Besides the complications mentioned before, a particular case of zk-SNARK is already present in practically every blockchain: signing a transaction with one’s private key can be regarded as a zk-SNARK: it is a proof that one owns the private key to the corresponding public key (address), without ever revealing the private key, however. It is short (succinct) and non-interactive as well, thus satisfying the requirements for a zk-SNARK. What it shows is that although the concept of a non-interactive ZKP seems surprising at first, it is in fact more familiar than one might expect.
We hope that you enjoyed this short introduction and overview regarding DLT (Blockchain), SMC and ZKP. We envision this article to be the start of an enlightening series in which we treat practical examples for DLT-solutions such as in international trade (shipping documents), platooning, EV-charging infrastructure, public transport, and much more. We also want to dive deeper into the related topics in cryptography, such as secret sharing, oblivious transfer, code obfuscation, elliptic curve cryptography, and many more.
Written by Johannes Sedlmeir and Vincent Schlatt.
Fraunhofer’s Blockchain Lab is a multi-disciplinary unit that designs, develops and evaluates blockchain applications. We transfer the latest R&D results in this young field into practical, integrative applications, putting special emphasis on short development cycles.