Computable’s Cryptography Reading List

Bharath Ramsundar
Published in Computable Blog
May 9, 2018

Cryptography is a field of research whose roots stretch back to antiquity, and the cryptographic literature is correspondingly vast, with many streams of intersecting thought. It can be hard for a newcomer to get a toehold in this space, so in this blog post we share a few of the papers we've been reading and discussing at Computable Labs, along with some of our brief notes on them. We hope this list will be useful to newcomers trying to get their bearings in modern cryptography!

Classical Papers

There are a number of strikingly relevant papers that date back 40+ years. We recommend starting here since the classics often age well.

A method for obtaining digital signatures and public key cryptosystems http://www.dtic.mil/get-tr-doc/pdf?AD=ADA606588

This is the original RSA paper. It introduces public key cryptography and provides an efficient implementation (the RSA algorithm). This is one of those masterwork papers that will still be read a century from now: clear and elegant, free of jargon, and requiring only basic mathematics to follow.
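The scheme itself fits in a few lines. Here's a toy sketch with textbook-sized primes (insecure, no padding), just to show the mechanics:

```python
# Toy RSA with tiny primes -- illustration only, never use in practice.
p, q = 61, 53
n = p * q                   # public modulus: 3233
phi = (p - 1) * (q - 1)     # Euler's totient: 3120
e = 17                      # public exponent, coprime to phi
d = pow(e, -1, phi)         # private exponent (modular inverse, Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m = 65
c = encrypt(m)
assert decrypt(c) == m      # round trip recovers the plaintext
```

Signing is the same operation with the exponents swapped: raise the message to `d` to sign, and check the signature by raising to `e`.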

On data banks and privacy homomorphisms.

https://pdfs.semanticscholar.org/3c87/22737ef9f37b7a1da6ab81b54224a3c64f72.pdf

This is an amazing paper from Rivest et al., written 40 years ago. It anticipates secure cloud computing and secure hardware enclaves, and introduces the first homomorphic encryption schemes ("privacy homomorphisms"). (Note these aren't fully homomorphic; that's Gentry's innovation 30 years later.)
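Raw (unpadded) RSA is itself a multiplicative privacy homomorphism in the paper's sense: multiplying ciphertexts multiplies the underlying plaintexts. A toy sketch with insecure parameters:

```python
# Unpadded RSA is multiplicatively homomorphic: enc(a)*enc(b) = enc(a*b mod n).
# Toy primes -- illustration only.
p, q = 61, 53
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 11
# Multiplying ciphertexts multiplies the hidden plaintexts (mod n):
assert dec(enc(a) * enc(b) % n) == (a * b) % n
```

This is exactly the property that makes raw RSA malleable, which is a bug for signatures but the seed of the whole homomorphic computing agenda.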

Fully Homomorphic Encryption

One of the long-standing dreams of cryptography has been to enable computation to be performed on encrypted data. The construction of the first fully homomorphic encryption scheme by Gentry in 2009 was a breakthrough that revolutionized this field; a fully homomorphic scheme is one which is capable in principle of performing an arbitrary computation on encrypted data. That said, there remain many practical difficulties and complexities to using these schemes in the real world, so adoption has been limited. However, library support for homomorphic encryption has been steadily improving, so we think this might change over the coming years.

Evaluating 2-DNF Formulas on Ciphertexts

https://crypto.stanford.edu/~eujin/papers/2dnf/index.html

This is a Dan Boneh paper on evaluating polynomials of degree two homomorphically. Old paper from 2005, pre-fully homomorphic encryption, but pretty readable. I’m a fan of reading papers leading up to a breakthrough to understand how the innovation really differed from its environment.

Can Homomorphic Encryption Be Practical?

https://dl.acm.org/citation.cfm?id=2046682

Provides the first implementation of a Ring-LWE somewhat homomorphic encryption scheme (learning with errors is the cryptographic hardness assumption behind most modern homomorphic schemes; see the complexity theory section). This line of work underlies TFHE and other HE libraries. Also implements the "relinearization" trick: in vanilla homomorphic schemes, ciphertexts grow larger after each multiplication, and relinearization re-parameterizes after each multiplication to remove the growth. Meta note: enabling private compute at scale is a deep and old problem, with something like 40 years of research tackling its different aspects.
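The relinearization idea can be seen in a noise-free toy model. A BV/BGV-style ciphertext is a pair (c0, c1) decrypting as c0 + c1·s mod q; multiplying two such pairs yields three components (a degree-2 polynomial in s), and an "evaluation key" encoding s² folds it back to two. The error terms that give real schemes their security are dropped here, so this is only the algebra, not a cryptosystem:

```python
# Noise-free toy of BV/BGV-style relinearization -- NOT secure, algebra only.
q = 2**16 + 1        # toy modulus
s = 12345            # "secret key"

def enc(m):
    # Trivial "encryption": any pair (c0, c1) with c0 + c1*s = m (mod q).
    c1 = 777
    return ((m - c1 * s) % q, c1)

def dec2(c):
    # Decrypt a 2-component ciphertext.
    return (c[0] + c[1] * s) % q

m1, m2 = 5, 9
(c0, c1), (d0, d1) = enc(m1), enc(m2)

# Multiplying ciphertexts yields 3 components (a degree-2 polynomial in s):
e0 = c0 * d0 % q
e1 = (c0 * d1 + c1 * d0) % q
e2 = c1 * d1 % q

# Relinearization: an "evaluation key" (a0, a1) encodes s^2, i.e.
# a0 + a1*s = s^2 (mod q); use it to fold e2 back into two components.
a1 = 999
a0 = (s * s - a1 * s) % q
relin = ((e0 + e2 * a0) % q, (e1 + e2 * a1) % q)
assert dec2(relin) == (m1 * m2) % q
```

In real schemes the evaluation key is itself a (noisy) encryption of s², which is why the trick doesn't leak the secret.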

Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based

https://link.springer.com/content/pdf/10.1007/978-3-642-40041-4_5.pdf

This is a paper by Craig Gentry and others that uses the LWE (learning with errors) model to construct a simple fully homomorphic scheme that avoids the complex relinearization step. Under this assumption, homomorphic encryption becomes a lot simpler conceptually: homomorphic addition and multiplication map closely onto matrix addition and multiplication. I'm not yet sure how good library support for this method is, though. The downside seems to be that the encryption of a message is a relatively large matrix, so the ciphertext expansion factor is large.
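The matrix intuition can be sketched in a noise-free toy: if the ciphertext of μ is a matrix C having the secret vector s as an eigenvector (C·s = μ·s mod q), then matrix addition and multiplication act as plaintext addition and multiplication. Real GSW adds noise and works with approximate eigenvectors; this sketch drops all of that and is in no way secure:

```python
# Noise-free "approximate eigenvector" toy of GSW-style encryption.
q = 97
s0 = 5
s = (s0, 1)     # secret vector; last entry fixed to 1 so decryption reads off mu

def mv(M, v):   # matrix-vector product mod q
    return tuple(sum(M[i][j] * v[j] for j in range(2)) % q for i in range(2))

def mm(A, B):   # matrix-matrix product mod q
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % q
                       for j in range(2)) for i in range(2))

def madd(A, B):
    return tuple(tuple((A[i][j] + B[i][j]) % q for j in range(2)) for i in range(2))

def enc(mu, t1=3, t2=8):
    # C = mu*I + R where every row of R is orthogonal to s, so C*s = mu*s.
    R = ((t1, (-t1 * s0) % q), (t2, (-t2 * s0) % q))
    return madd(((mu, 0), (0, mu)), R)

def dec(C):
    # Since s ends in 1, the last coordinate of C*s is mu itself.
    return mv(C, s)[1]

C1, C2 = enc(2), enc(3)
assert dec(madd(C1, C2)) == 5   # adding ciphertext matrices adds plaintexts
assert dec(mm(C1, C2)) == 6     # multiplying them multiplies plaintexts
```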

Efficient Fully Homomorphic Encryption from (Standard) LWE

https://ieeexplore.ieee.org/document/6108154/

Demonstrates the construction of a fully homomorphic encryption scheme from the LWE assumption.

Bootstrapping for HElib

https://eprint.iacr.org/2014/873

This is a very nice technical report walking through the implementation of bootstrapping for HElib. It talks through the math of batching homomorphic computations in depth and provides a detailed explanation of the bit extraction procedures. Nice numerical experiments too.

Algorithms in HElib

https://eprint.iacr.org/2014/106

Talks through the homomorphic SIMD concept, by which vectors of inputs can be homomorphically encrypted and processed slot-wise. Shows that homomorphic encryption is much closer to being practical than commonly suspected.
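The batching idea rests on the Chinese Remainder Theorem: several small plaintexts are packed into the "slots" of one ring element, and a single homomorphic operation then acts on every slot at once. A minimal integer-CRT sketch of the same idea (real HElib works in polynomial quotient rings, not plain integers):

```python
from math import prod

# SIMD packing via CRT: one value mod M = p1*p2*p3 carries three slots,
# and a single add/multiply mod M acts on all slots simultaneously.
moduli = (11, 13, 17)
M = prod(moduli)

def pack(slots):
    # CRT reconstruction: find x mod M with x % p_i == slots[i].
    x = 0
    for p, v in zip(moduli, slots):
        Mi = M // p
        x = (x + v * Mi * pow(Mi, -1, p)) % M
    return x

def unpack(x):
    return tuple(x % p for p in moduli)

a = pack((1, 2, 3))
b = pack((4, 5, 6))
assert unpack((a + b) % M) == (5, 7, 9)   # slot-wise addition in one op
assert unpack((a * b) % M) == (4, 10, 1)  # slot-wise multiplication (18 mod 17 = 1)
```

Note the caveat visible in the last slot: arithmetic in each slot is modular, so slot values must stay within their modulus to represent integer results exactly.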

Complexity Theory

Much of modern cryptography is intimately linked with computational complexity theory. Cryptography depends on the existence of hard problems that have hidden “trapdoors.” The identification of a new class of suitable hard problems can dramatically improve cryptosystems, so we recommend brushing up on your complexity theoretic basics!

On Lattices, Learning with Errors, Random Linear Codes, and Cryptography

http://people.csail.mit.edu/vinodv/6892-Fall2013/regev.pdf

This is a fundamental CS paper. It introduces the learning with errors (LWE) model: the conjecture that solving noisy systems of linear equations modulo q is hard. The paper proves that an efficient algorithm for such systems would give a quantum algorithm for a class of worst-case lattice problems. Those lattice problems are widely believed to be intractable, so it follows that the noisy linear systems are a hard problem too. Most fully homomorphic schemes are now built on top of LWE, with security backed by these lattice reductions.
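An LWE sample is just a random vector a together with a noisy inner product b = ⟨a, s⟩ + e (mod q). A toy generator with insecurely small parameters, to make the object concrete:

```python
import random

# Toy LWE samples: (a, b) with b = <a, s> + e (mod q) for a secret s and a
# small error e. Recovering s from many samples is the LWE problem. These
# parameters are far too small to be secure.
q, n = 97, 4
random.seed(0)
s = [random.randrange(q) for _ in range(n)]

def lwe_sample():
    a = [random.randrange(q) for _ in range(n)]
    e = random.choice([-1, 0, 1])   # small error term
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b

a, b = lwe_sample()
# Without the error b - <a, s> would be 0 mod q; with it, the residue is tiny:
assert (b - sum(ai * si for ai, si in zip(a, s))) % q in (0, 1, q - 1)
```

Without the error term the system is trivially solvable by Gaussian elimination; the small noise is what makes the problem (conjecturally) hard.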

Non-Interactive Zero Knowledge http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.207.8702&rep=rep1&type=pdf

Zero knowledge proofs were initially conceived of as interactive protocols. This paper gives the first (I think) construction in which the interaction can be eliminated, provided the prover and verifier share a random string up front. The paper is old and a little hard to read. It's written in complexity-theory style and leans on many facts about quadratic residues to make its proofs work. A meta note: a lot of complexity theory papers are hard to read because algorithms are presented in a style amenable to theorem proving but not to practical implementation.
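The flavor of those quadratic-residue arguments can be seen in one round of the classic interactive zero-knowledge proof of quadratic residuosity: the prover knows w with w² = x (mod n) and convinces the verifier without revealing w. This toy simulates both parties in one process with an insecure modulus:

```python
import random

# One round of the interactive ZK proof of quadratic residuosity.
# Toy modulus -- illustration only.
n = 3233                  # 61 * 53
w = 123                   # prover's secret square root
x = w * w % n             # public quadratic residue

random.seed(1)

def round_ok():
    r = random.randrange(1, n)
    y = r * r % n                     # prover's commitment
    b = random.randrange(2)           # verifier's challenge bit
    z = r * pow(w, b, n) % n          # prover's response: r or r*w
    # Verifier checks z^2 = y * x^b (mod n); a cheating prover who
    # lacks w fails one of the two challenges with probability 1/2.
    return z * z % n == y * pow(x, b, n) % n

assert all(round_ok() for _ in range(20))
```

Repeating the round k times drives a cheater's success probability down to 2^-k, which is why these protocols are iterated.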

Quadratic Span Programs and Succinct NIZKs without PCPs

https://eprint.iacr.org/2012/215.pdf

Provides a new characterization of NP as the class of languages that can be specified by "quadratic span programs." This characterization yields more succinct proofs for NP statements: so-called SNARKs (succinct non-interactive arguments of knowledge). Quadratic span programs and their cousins, quadratic arithmetic programs, are the technology underlying the zk-SNARKs used by ZCash and similar projects.

Zero Knowledge

Zero knowledge systems enable participants to provably perform computations on private data. Such systems have recently found wide use in private cryptocurrencies such as ZCash. However, it's worth emphasizing that zero knowledge is a far deeper concept than a simple anonymity scheme for cryptocurrencies. Modern succinct proof systems spring from the PCP theorem in complexity theory, which provided a novel characterization of NP as the class of problems with proofs that can be checked efficiently and probabilistically. Derandomizing these proofs and shrinking them until they are succinct yields SNARKs (succinct non-interactive arguments of knowledge); adding a zero-knowledge hiding layer yields zk-SNARKs.

Statistical Zero Knowledge Protocols to Prove Modular Polynomial Relations.

https://www.researchgate.net/publication/221355462_Statistical_Zero_Knowledge_Protocols_to_Prove_Modular_Polynomial_Relations

A random reference I pulled. Constructs a special-purpose zero knowledge proof for polynomial solutions mod n using RSA. Mostly useful for understanding how the mechanics of zero knowledge arise from basic number theory. Uses the notion of a bit commitment scheme, but retains some interactivity.
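A bit commitment lets a party fix a bit now and reveal it later without being able to change it. Here's a minimal hash-based sketch of the generic construction (not the paper's number-theoretic scheme):

```python
import hashlib
import secrets

# Hash-based bit commitment: commit = SHA-256(bit || nonce).
# Hiding: the random nonce blinds the bit.
# Binding: changing the bit would require a SHA-256 collision.
def commit(bit):
    nonce = secrets.token_bytes(16)
    digest = hashlib.sha256(bytes([bit]) + nonce).hexdigest()
    return digest, nonce   # publish digest now; reveal (bit, nonce) later

def verify(digest, bit, nonce):
    return hashlib.sha256(bytes([bit]) + nonce).hexdigest() == digest

digest, nonce = commit(1)
assert verify(digest, 1, nonce)        # honest opening checks out
assert not verify(digest, 0, nonce)    # the committer cannot flip the bit
```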

Pinocchio: Nearly practical verifiable computation.

https://eprint.iacr.org/2013/279.pdf

Seminal paper that constructed the zero knowledge verifiable computation system used by ZCash and many other zero knowledge systems, based on the technology of quadratic arithmetic programs. It's interesting to note that SNARKs are very new cryptography; the term originates around 2010-2012. The roots go back to the PCP theorem from the 90s, which introduced the idea that complex proofs could be verified rapidly by a random set of queries. Follow-up work derandomized the query set (idea: use a pseudorandom generator), then made repeated efficiency tweaks to make the non-interactive proofs succinct.
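The "verify by random queries" idea shows up already in polynomial identity testing: by the Schwartz-Zippel lemma, two distinct polynomials of degree d over a field agree on at most d points, so evaluating both at a random field element catches a mismatch with overwhelming probability. A minimal sketch:

```python
import random

# Polynomial identity testing: compare polynomials by one random probe.
p = 2**31 - 1                      # a prime field

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):     # Horner's rule, mod p
        acc = (acc * x + c) % p
    return acc

f = [1, 2, 3]                      # 1 + 2x + 3x^2
g = [1, 2, 3]                      # identical to f
h = [1, 2, 4]                      # differs from f in one coefficient

random.seed(0)
r = random.randrange(1, p)         # random nonzero probe point
assert poly_eval(f, r) == poly_eval(g, r)   # identical polynomials agree
assert poly_eval(f, r) != poly_eval(h, r)   # mismatch caught at the probe
```

SNARK constructions push this much further: an entire computation is encoded as a polynomial identity, and the verifier checks it at a single secret point.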

Succinct Non-Interactive Zero Knowledge for a von Neumann Architecture

https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-ben-sasson.pdf

This is a beautiful paper that constructs a "universal circuit" for zk-SNARKs: given an arbitrary program and its inputs, it constructs a zero knowledge proof of correct execution. There are lots of ideas here. The first is that they execute the program and generate a program trace; the zero knowledge circuit then checks that each step of the computation proceeds correctly (this is similar to TrueBit's mechanism). They also make a number of optimizations. It turns out that most zk-SNARKs depend on a bilinear pairing between suitably defined elliptic curve subgroups, and verifying the zk-SNARK requires evaluating this pairing repeatedly. By batching evaluations and precomputing parts of the proofs, they achieve major speedups.
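The program-trace idea can be sketched without any cryptography: record the state after each step, then check that every adjacent pair of states is consistent with the transition rules. In the real system that consistency check is what the zk-SNARK circuit enforces; here it's just plain Python over a made-up toy program:

```python
# Toy program-trace verification: run a program, log its trace, then
# independently check every transition. (Hypothetical "double" program.)
def run(x):
    trace = [("init", x)]
    for _ in range(3):
        x = x * 2
        trace.append(("double", x))
    return trace

def valid_step(prev, step):
    # The transition rule the verifier enforces between adjacent states.
    op, value = step
    return op == "double" and value == prev[1] * 2

trace = run(5)
assert trace[-1][1] == 40   # 5 -> 10 -> 20 -> 40
assert all(valid_step(trace[i], trace[i + 1]) for i in range(len(trace) - 1))
```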

Zerocash: Decentralized Anonymous Payments from Bitcoin (extended version)

http://zerocash-project.org/media/pdf/zerocash-extended-20140518.pdf

This is the extended Zerocash paper. zk-SNARKs are unfortunately often explained in black-magic fashion, but the basic mathematical construction is pretty pedestrian once you get into the details. The extended paper does a great job of fleshing those details out and, most valuably, constructs the precise circuit for which the zk-SNARK is built.

Scalable, transparent, and post-quantum secure computational integrity

https://eprint.iacr.org/2018/046

Better known as the zk-STARK paper. This is a deeply technical, unpolished paper, but it's only a first arXiv version, so that's alright. The central insight is that the PCP theorem shows it's always possible to devise protocols with public randomness in place of private randomness, which means it's possible in principle to get rid of the trusted randomness setup in zk-SNARKs. This paper takes a first step in that direction by constructing an IOP (interactive oracle proof) for computation verification. It uses Reed-Solomon codewords for program verification, though I don't quite grok all the details yet.
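The role of Reed-Solomon codewords can be sketched over a small prime field: a message defines a low-degree polynomial, the codeword is its evaluations on a larger domain, and the redundant evaluations expose tampering. (Toy parameters and a made-up message; the paper's actual low-degree testing is far more involved.)

```python
# Reed-Solomon in miniature over GF(97): a degree-2 polynomial evaluated
# at 7 points; any 3 points determine it, so the rest are checkable.
p = 97

def interpolate_at(points, x):
    # Lagrange interpolation over GF(p), evaluated at a single point x.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

# Message = coefficients of 3 + x + 4x^2; codeword = evaluations at x=0..6.
codeword = [(x, (3 + x + 4 * x * x) % p) for x in range(7)]

# Any 3 symbols recover a redundant one:
assert interpolate_at(codeword[:3], 5) == codeword[5][1]

# A corrupted symbol is inconsistent with the interpolated polynomial:
bad = codeword[5][1] ^ 1
assert interpolate_at(codeword[:3], 5) != bad
```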

Multiparty Computation

Multiparty computation seeks to enable groups of participants who mutually distrust one another to perform useful computations together.

Tasty: A compiler for secure multiparty computation

https://dl.acm.org/citation.cfm?id=1866358

Provides a high-level programming language for small multiparty computations. Mixes garbled circuits, homomorphic encryption, and oblivious transfer to construct protocols.

Multiparty Computation from Somewhat Homomorphic Encryption

https://eprint.iacr.org/2011/535

The SPDZ protocol. Uses somewhat homomorphic encryption alongside a multiparty computation step. SPDZ protocols are currently the leading contender for multiparty machine learning, with a few implementations out there.
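The core multiplication trick in SPDZ-style protocols is the Beaver triple: a precomputed secret-shared triple (a, b, c = a·b) lets parties multiply secret-shared values using only local arithmetic plus opening two masked values, which leak nothing about the inputs. A toy simulation of two parties in one process:

```python
import random

# Beaver-triple multiplication on additive secret shares mod p.
p = 2**31 - 1
random.seed(0)

def share(v):
    r = random.randrange(p)
    return (r, (v - r) % p)          # two additive shares summing to v

def reconstruct(sh):
    return sum(sh) % p

x, y = 6, 7
a, b = random.randrange(p), random.randrange(p)
c = a * b % p                        # the precomputed triple: c = a*b
xs, ys = share(x), share(y)
as_, bs, cs = share(a), share(b), share(c)

# The parties open the masked values d = x - a and e = y - b
# (simulated here by reconstructing; in the protocol each party
# broadcasts its share of d and e):
d = (reconstruct(xs) - a) % p
e = (reconstruct(ys) - b) % p

# Each party i computes its share of x*y locally:
zs = [(cs[i] + d * bs[i] + e * as_[i]) % p for i in range(2)]
zs[0] = (zs[0] + d * e) % p          # one party adds the public d*e term

# Check: c + d*b + e*a + d*e = x*y (expand to verify the algebra).
assert reconstruct(zs) == x * y % p
```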

Private Machine Learning

Now that machine learning is all the rage, enabling machine learning on encrypted data has taken on significant importance. However, technologies for efficient learning on encrypted data remain in their infancy.

CryptoDL: Deep Neural Networks over Encrypted Data

https://arxiv.org/abs/1711.05189

Performs inference with a deep network on homomorphically encrypted data. Note, however, that the model is trained on plaintext data beforehand. The authors have to make some tweaks to the model architecture to achieve performance, and get reasonable accuracy: 91.5% on CIFAR-10, for example.
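The architecture tweak alluded to is replacing non-polynomial activations with low-degree polynomials, since homomorphic schemes support only addition and multiplication. A toy forward pass with a square activation (made-up weights, and plaintext arithmetic standing in for the encrypted computation):

```python
# Toy HE-friendly network: square activation instead of ReLU, so the whole
# forward pass is additions and multiplications only. Weights are made up.
def square(x):
    return x * x

def forward(x, w1, w2):
    # One hidden layer with square activation, then a linear output.
    hidden = [square(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return sum(wi * hi for wi, hi in zip(w2, hidden))

x = [1.0, 2.0]
w1 = [[0.5, -0.5], [1.0, 1.0]]
w2 = [2.0, 0.5]
out = forward(x, w1, w2)
# hidden = [(-0.5)^2, 3^2] = [0.25, 9.0]; out = 2*0.25 + 0.5*9 = 5.0
assert abs(out - 5.0) < 1e-9
```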

Privacy-Preserving Deep Learning via Additively Homomorphic Encryption

https://eprint.iacr.org/2017/715.pdf

This is a really interesting recent paper on how to do secure deep learning. Basically, participants train locally, compute gradients, then send homomorphically encrypted gradients to a parameter server, which aggregates them; all participants then receive back the trained model. It's really clever: direct information leakage to the server is cut off, but the homomorphic encryption involved is really simple (no homomorphic backprop), so the scheme stays efficient.
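The aggregation step can be sketched with Paillier encryption, a classic additively homomorphic scheme: multiplying ciphertexts adds the underlying plaintexts, so the server can sum encrypted gradients it cannot read. (Tiny primes and integer-encoded gradients; the paper's actual setup differs in details.)

```python
import math
import random

# Toy Paillier: additively homomorphic. Tiny primes -- illustration only.
p1, p2 = 61, 53
n = p1 * p2
n2 = n * n
g = n + 1
lam = math.lcm(p1 - 1, p2 - 1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return L(pow(c, lam, n2)) * mu % n

# Two participants encrypt their integer-encoded gradients:
g1, g2 = 42, 17
c1, c2 = enc(g1), enc(g2)

# The server aggregates by multiplying ciphertexts -- it never sees g1 or g2:
assert dec(c1 * c2 % n2) == (g1 + g2) % n
```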

Encrypted statistical machine learning: new privacy preserving methods

https://arxiv.org/pdf/1508.06845.pdf

Another really interesting paper. This one trains Naive Bayes classifiers and random forests on homomorphically encrypted data. Although learning on homomorphically encrypted data is still hard, I suspect it's going to become a lot easier going forward.

Trusted Execution Environments

A line of research suggests that the best practical security should be rooted in hardware. The idea is that hardware manufacturers create "hardware enclaves" (trusted execution environments) where sensitive computations can be executed. The leading contender in this space is Intel's SGX platform. The weakness, of course, is that Intel must be trusted to have engineered the enclave correctly, and furthermore, that no third party who purchases an SGX chip has succeeded in cracking its security guarantees. These are high bars, and adoption of hardware enclaves has mostly remained limited to academia for the time being, but a few interesting projects are certainly beginning to take flight.

Ekiden: A Platform for Confidentiality-Preserving, Trustworthy, and Performant Smart Contract Execution

https://arxiv.org/abs/1804.05141

Merges blockchains and trusted execution environments by shifting the execution of sensitive smart contracts into a hardware enclave. This has the interesting property that heavy-duty smart contracts, including machine learning contracts, can now be executed in the enclave.


Co-founder and CTO at @ComputableLabs. Prev: Creator of https://DeepChem.io . Author @OReillyMedia. Stanford CS PhD on deep drug discovery