Blockchain Perspectives Part II
At Dekrypt we have consistently focused on privacy preserving technology because we see privacy as both a fundamental human right, and necessary for mainstream adoption. Given this level of importance, it is surprising that projects focusing on this domain are, in our opinion, currently underrepresented and often undervalued. The purpose of this article is to provide both a brief overview of privacy preserving technology from the perspective of current applications, and data points reinforcing the necessity for its use and development.
- Privacy is not secrecy. It is the option to selectively reveal the minimum amount of information necessary for the shortest duration required.
- Legislation will force companies like Facebook and Google to drastically change the way data is handled. This push for change will require the application of new technology and business models. This presents a unique opportunity for teams integrating privacy preserving technologies.
- Privacy technology allows for data to be utilized but not revealed. Data is often used to answer a question; it is not necessary to know and retain the data used to derive that answer. As legislation and public outcry continue to limit how data can be used and shared, this approach will become increasingly necessary for existing data driven use cases.
- Pseudonymity is not acceptable to most corporations, and privacy preservation is required for many blockchain use cases to gain mainstream adoption. We have seen this first hand when building blockchain products for enterprise clients. Increased usage and connectivity are the goal, but they are a potential double-edged sword if privacy technology is not used in tandem. It is shocking how much information companies like Chainalysis can currently derive.
- Beyond privacy itself, privacy preserving technology offers solutions to interoperability, transactional scalability (500 TPS on Ethereum is already possible), and computational scalability (instead of every network participant repeating all computation individually, every participant can quickly verify a small proof that the computation was done correctly). It also allows anyone with a mobile device to act as a full node. All of this comes without concessions to the Blockchain Trilemma.
We Have Lost Control of Our Personal Data
The current way that data is handled is not sustainable. Public outcry and legislation will force changes that require new business models built on novel technical solutions. Holding as much data as possible, for unspecified amounts of time, with little to no oversight of how that data is stored, is a ticking time bomb: it is only a matter of time before that data is breached. Given enough time and resources, nothing is secure, so the current status quo of massive honeypots of data is a terrible approach. Past breaches have mostly been a liability in terms of public perception, but they are increasingly becoming a significant financial liability as well. This joint motivation calls for a solution, one that privacy preserving technologies are well positioned to provide.
Some data points on the problem:
1. GDPR was passed in Europe and the effects are already being seen. Google was fined $57M for violations of this new data privacy law. Google’s investment arm recently led a $100M round in Collibra, a company that “lets companies manage their corporate data so they adhere to regulatory standards.” Fines of this size, less than a year after the legislation took effect, are beginning to change the way many companies think about data handling. Whereas data was previously seen purely as an extremely valuable asset, legislation is forcing companies to realize that mismanaging it can be a very large liability. Surprisingly, there is strong support for similar legislation in the US from both political parties, as well as from companies like Google and Facebook that realize the serious consequences of mishandling data.
2. The Facebook hack that led to 30 million users’ personal data being compromised has been a large topic of conversation. The fact that Facebook is often used to sign into other applications only increased the concern over this breach, and recent statements by Mark Zuckerberg show that he is looking into blockchain technology as a potential solution for logins and data management.
3. The Equifax data breach leaked personal information from nearly 143 million Americans. Considering the U.S. Census Bureau estimates that there are 247,813,910 adults (over 18) living in the United States, this constitutes a breach of over half the adult population of the United States.
4. Tim Berners-Lee, the inventor of the World Wide Web, stated almost two years ago that “We’ve lost control of our personal data.”
5. 90% of the data in the world was generated in the last two years alone. The rate of data creation and subsequent collection is increasing exponentially. With Gartner projecting 20 billion internet-connected devices by 2020, this rate will likely continue to increase.
With legislators, tech giants, and notable individuals all pushing for change, it is clear that this is a vertical not only worth paying attention to, but integral to avoiding potentially catastrophic consequences (a notable example being the potential misuse of election data).
To paraphrase the Cypherpunk Manifesto: Privacy is not secrecy. It is about having the option to reveal the minimum amount of information for the shortest duration of time required.
In short: More and more data is being collected every day, and if we do not regain control of the way our personal data is collected and handled, the potential for negative repercussions is significant. Privacy preserving technology offers a solution.
Privacy Preserving Technology: a Massive Field
Privacy preserving technology covers a huge range of approaches. To narrow the scope, this article focuses on zkSNARKs and, in particular, their blockchain applications. Many projects use other techniques, such as Multiparty Computation (MPC), Homomorphic Encryption (PHE, FHE), and hardware approaches like Trusted Execution Environments/Secure Enclaves, to name a few. The practicality for a specific use case, the number of participants, and the kind of data involved and how it will be used are all considerations when choosing the right privacy preserving technology. Future articles will provide a more in-depth overview of these different approaches as well as companies that are using them in novel ways. Now let’s dive into the world of Zero Knowledge.
The Elephant in the Room: Trusted Setup
Before continuing into the application section of the article, it is important to address one of the largest concerns when using zkSNARKs: the requirement of a trusted setup.
For a zkSNARK based project to work, public parameters (the proving and verifying keys) must first be generated from secret randomness. If that randomness is retained rather than destroyed, the network cannot be trusted, as whoever holds it would be able to forge proofs and do things such as mint coins without the rest of the network knowing. In the original parameter generation for Zcash in 2016, there were six participants, each of whom generated a shard of the secret. If any one of these six destroyed their shard, the secret could not be reconstructed, even if all the others colluded. While it is unlikely that all six participants would collude, it is still possible. That is why new parameters were generated in 2018 for the Zcash Sapling upgrade, and anyone who posted to the mailing list was allowed to participate. 87 individuals participated in the ceremony, and if even a single participant destroyed their shard, then collusion would not be possible.
The Sapling parameter generation is a good model for future projects that wish to use zkSNARKs. Because anyone who wishes may participate, and only a single honest participant is required to prevent collusion, anyone who participates can be assured that the network can be trusted without needing to rely on or trust any other participant.
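The "one honest participant is enough" property can be illustrated with a toy model. The sketch below is not the Zcash ceremony: the tiny group (`P`, `Q`, `G`) and the function names are assumptions chosen so the numbers stay readable, and real ceremonies work over elliptic curves with far larger parameters. It only shows that the secret exponent behind the final parameter is the product of every shard, so a single missing shard leaves the colluders with the wrong value.

```python
import secrets

# Toy group for illustration only (NOT secure): P = 2Q + 1 with P and Q both
# prime, and G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def contribute(current: int) -> tuple[int, int]:
    """One participant mixes a fresh secret shard into the public parameter."""
    shard = secrets.randbelow(Q - 2) + 2  # uniform in [2, Q - 1]
    return pow(current, shard, P), shard


def run_ceremony(n_participants: int) -> tuple[int, list[int]]:
    """Chain the contributions; return the final parameter and every shard."""
    value, shards = G, []
    for _ in range(n_participants):
        value, shard = contribute(value)
        shards.append(shard)
    return value, shards


def combine(shards: list[int]) -> int:
    """The secret exponent that colluders holding these shards can compute."""
    trapdoor = 1
    for shard in shards:
        trapdoor = (trapdoor * shard) % Q
    return trapdoor


params, shards = run_ceremony(6)

# All six collude: together they recover the exponent behind the parameter.
assert pow(G, combine(shards), P) == params

# One participant destroyed their shard: the remaining five get it wrong.
assert pow(G, combine(shards[:-1]), P) != params
```

In a group this small the missing shard could simply be brute-forced; the security of a real ceremony rests on that search being infeasible.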
A recently published paper, Sonic, offers another improvement. Sonic still requires an initial trusted setup, but allows that same setup to be reused for other zkSNARK applications without the need to generate entirely new parameters for each new circuit/use case. Beyond Sonic, development to improve both memory efficiency and runtime performance will likely continue, as several groups are currently working toward these goals.
On a longer time horizon, work on zkSTARKs has been published that offers the same privacy assurances and similar features to zkSNARKs without the requirement of a trusted setup.
While the need for a trusted setup is a valid concern, two developments should assuage the concerns of most: in the short term, any individual can be allowed to participate in parameter generation, and in the long term, work on zkSTARKs aims to remove the requirement for a trusted setup entirely.
Significant portions of data are collected and then never utilized due to laws and regulations. As referenced above, these restrictions will likely grow even more stringent over time. Privacy technology offers an interesting solution in which data does not need to be directly accessed in order to interact with it. To use an example from DIZK:
Suppose that a hospital owns sensitive patient data, and a researcher wishes to build a (public) model by running a (public) training algorithm on this sensitive data. The hospital does not want (or legally cannot) release the data; on the other hand, the researcher wants others to be able to check the integrity of the model. One way to resolve this tension is to have the hospital use a zkSNARK to prove that the model is the output obtained by running the training algorithm on the sensitive data. The DIZK paper gives two examples of such computations on sensitive data: linear regression and covariance matrix calculation.
Applying this approach to any sensitive information opens up numerous possibilities for interaction that were not previously possible. Releasing the actual data would be illegal in the hospital example above, and companies in general have numerous reasons for tight access control on a variety of data (usually loss of strategic advantage or potential leaking to competitors), but DIZK offers a solution allowing greater interactivity and potential for collaboration.
If you haven’t heard of ZoE (Zcash on Ethereum), it is definitely worth looking into. This joint effort by both projects allows Zcash proofs to be verified on the Ethereum network. There are quite a few applications for this, but the one to highlight is the role it could play in interoperability. Using ZoE, submitting a proof that a transaction occurred on the Zcash network can trigger an Ethereum smart contract. To give a tangible example, this could allow trustless interactions such as the following: once a proof is submitted that a specific amount of ZEC has been transferred to a predefined address, a smart contract is triggered that releases some amount of ETH or tokens. If other platforms were to adopt this, it would allow for interoperability between any platforms able to verify this kind of proof. There are obvious security considerations, such as differences in finality between platforms, but this offers a novel approach for more direct interactivity between chains.
This is an interesting option for platforms that wish to remain fully public and transparent but are concerned about the information that could be leaked in certain use cases. With this approach, smart contracts can be triggered by a transaction that is fully anonymous. Many companies are limited in how they can use public chains due to privacy concerns, but this approach allows for deep interactivity without clear linkability. Transactions can remain fully transparent and auditable, while the connection between them is fully obfuscated rather than merely pseudonymous.
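The proof-triggered release described in the ZEC-for-ETH example can be sketched as plain control flow. Everything here is hypothetical: `Escrow`, `claim`, and `stub_verifier` are illustrative names, amounts are in arbitrary smallest units, and the verifier is a placeholder standing in for real on-chain zkSNARK verification. This is not ZoE's actual interface.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical verifier signature: (proof, recipient address, amount) -> valid?
Verifier = Callable[[bytes, str, int], bool]


@dataclass
class Escrow:
    verify: Verifier        # stand-in for an on-chain proof verifier
    zec_recipient: str      # predefined address that must have been paid
    zec_amount: int         # required payment, in smallest units
    eth_locked: int         # funds released once a valid proof arrives
    released: bool = False

    def claim(self, proof: bytes) -> int:
        """Release the locked funds once, and only for a valid proof."""
        if self.released:
            raise ValueError("funds already released")
        if not self.verify(proof, self.zec_recipient, self.zec_amount):
            raise ValueError("invalid proof")
        self.released = True
        payout, self.eth_locked = self.eth_locked, 0
        return payout


def stub_verifier(proof: bytes, recipient: str, amount: int) -> bool:
    """Placeholder only: a real verifier would check a zkSNARK proof."""
    return proof == b"valid-proof"


escrow = Escrow(stub_verifier, zec_recipient="example-address",
                zec_amount=5, eth_locked=100)
payout = escrow.claim(b"valid-proof")  # payout is 100; escrow is now empty
```

The point of the sketch is that the contract logic never sees the sender or the transaction itself, only a proof about a public statement (recipient and amount).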
The Blockchain Trilemma
Vitalik Buterin coined the term Scalability Trilemma to describe the difficulty of achieving more than two of three defined qualities:
- Decentralization (defined as the system being able to run in a scenario where each participant only has access to O(c) resources, i.e. a regular laptop or small VPS)
- Scalability (defined as being able to process O(n) > O(c) transactions)
- Security (defined as being secure against attackers with up to O(n) resources)
Privacy technology offers several exciting scalability solutions without concessions on any of the above three qualities.
The next two use cases (concise storage/verification and increased transaction throughput) are enabled by privacy preserving technology, but do not focus on its privacy preserving qualities. Instead, they use SNARKs for their succinctness and ease of validation.
Significant effort has been spent on scalability solutions, with seemingly less thought directed toward the consequences. As more and more transactions are processed, who will store all of this data? Many believe that either only a few large-scale entities will store the entirety of a blockchain (a concession on decentralization), or networks will rely on solutions such as sharding, which distribute data so that network participants store pieces rather than the entirety (potential concessions on security and scalability).
An elegant alternative approach has been proposed by Coda (Disclosure: Coda is a Dekrypt Capital portfolio company), in which recursive SNARKs are used to compress a blockchain to the size of two Tweets. The end product is a blockchain that remains several kilobytes forever, regardless of blocks added or data contained, allowing even mobile users to store and verify every block. The importance of any mobile device being able to act as a fully validating node is hard to overstate.
While Coda offers a solution to full node scalability concerns, another approach uses recursive SNARKs for transactional scalability. The technique, referred to as rollup by the Ethereum community, allows for upwards of 500 TPS in Ethereum, and a working testnet has already been released by Matter Labs. The high level summary is that transactions can be batched together into a succinct proof, and that proof can then be validated by the Ethereum network rather than the network validating each individual transaction.
A bit of context for the 500 TPS figure: this technique has to balance scalability against usability. Using 1000 transactions as an example batch size, this technique would wait for 1000 transactions to be made, then begin work on a proof covering all 1000 transactions, which is computationally intensive. If the number of transactions per batch is increased, one can expect an increase in both the time needed for the batch to fill up and the time needed to create a proof of these transactions. DIZK increased the feasibility of this approach by making it possible to parallelize proof generation, whereas before proofs had to be generated on a single machine. A continued increase in both the number of transactions (to fill larger batches more quickly) and the efficiency of proof generation would lead to higher potential TPS without concessions to the above trilemma.
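The trade-off between batch size and waiting time can be put into a back-of-the-envelope model. The sketch below rests on loudly stated assumptions: transactions arrive at a steady rate, proving time grows linearly with batch size, and splitting the work across several provers (the kind of parallelism DIZK enables) gives an ideal linear speedup. The function name and every number are hypothetical, and on-chain verification cost is ignored.

```python
from dataclasses import dataclass


@dataclass
class BatchTiming:
    fill_seconds: float    # time for the batch to collect enough transactions
    prove_seconds: float   # time to generate the proof for the full batch

    @property
    def worst_case_latency(self) -> float:
        """Wait experienced by the first transaction to enter the batch."""
        return self.fill_seconds + self.prove_seconds


def batch_timing(batch_size: int, arrival_tps: float,
                 prove_seconds_per_tx: float, provers: int = 1) -> BatchTiming:
    """Estimate fill and proving time under the simplifying assumptions above."""
    if batch_size <= 0 or arrival_tps <= 0 or provers <= 0:
        raise ValueError("batch_size, arrival_tps and provers must be positive")
    fill = batch_size / arrival_tps
    prove = batch_size * prove_seconds_per_tx / provers  # ideal speedup assumed
    return BatchTiming(fill, prove)


# Hypothetical inputs: 50 incoming TPS and 0.25 s of proving work per transaction.
for size in (1000, 4000):
    for provers in (1, 4):
        t = batch_timing(size, arrival_tps=50.0,
                         prove_seconds_per_tx=0.25, provers=provers)
        print(size, provers, t.fill_seconds, t.prove_seconds, t.worst_case_latency)
```

Under these assumptions, a larger batch raises both terms of the latency, while a higher arrival rate shrinks the fill time and more (or faster) provers shrink the proving time, which matches the two levers named above.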
ZEXE: Enabling Decentralized Private Computation
In the last two examples, zkSNARKs offer a solution to ever-growing blockchains and to transaction scalability. The next logical step is a solution for computational scalability. A paper entitled ZEXE proposes “a ledger-based system that enables users to execute offline computation and subsequently produce publicly-verifiable transactions that attest to the correctness of these offline executions”. The ability to have computation done offline, and for a network to quickly verify a proof that the computation was done correctly, is a significant breakthrough. Ethereum currently requires all computation to be executed by every network participant, an obvious scalability bottleneck. Many projects have attempted to create ways in which computation can be delegated to anonymous network participants, but all have struggled with how to verify that the computation was done correctly short of redoing the delegated computation in its entirety.
To provide more tangible examples, the paper describes methods for creating tokens, smart contracts, and even a decentralized exchange, all in a fully anonymous and private way. Whereas the examples above used privacy preserving technology for specific applications, one can think of ZEXE as more of a generalized operating system allowing for numerous use cases, bounded only by the imagination of those developing on it. Expect to see quite a few exciting developments leveraging this work.
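The gap between redoing a computation and merely checking it, which motivates offline execution, can be seen in a much simpler setting. The sketch below uses Freivalds' algorithm, a classical probabilistic check for matrix products. It is not a SNARK, offers no privacy, and is not what ZEXE uses; it only illustrates that checking a claimed result can be cheaper than recomputing it (roughly n² work per round instead of n³).

```python
import random

Matrix = list[list[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """The expensive computation: cubic time for square inputs."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matvec(m: Matrix, v: list[int]) -> list[int]:
    """Matrix-vector product: quadratic time for square inputs."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a: Matrix, b: Matrix, claimed: Matrix,
                    rounds: int = 64) -> bool:
    """Probabilistically check that claimed == a * b without recomputing it.

    A correct product always passes. A wrong one slips through a single
    round with probability at most 1/2, so at most 2**-rounds overall.
    """
    n_cols = len(b[0])
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n_cols)]
        if matvec(a, matvec(b, r)) != matvec(claimed, r):
            return False
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
honest = matmul(a, b)                    # the "offline" computation
forged = [row[:] for row in honest]
forged[0][0] += 1                        # tamper with a single entry

print(freivalds_check(a, b, honest))     # True: a correct result always passes
print(freivalds_check(a, b, forged))     # False, except with negligible probability
```

A zkSNARK plays a similar role for arbitrary programs, with the additional properties that the proof is succinct and reveals nothing about private inputs.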