Challenges & Solutions of Onchain FHE: Unlocking the Holy Grail

Arnav Pagidyala
HashKey Capital Insights
Dec 5, 2023

Article is co-written by: Jeffrey Hu and Arnav Pagidyala

Takeaways:

  • FHE promises to be the “Holy Grail of Cryptography”; however, performance, developer experience, and security concerns limit its adoption today
  • As shown in the graphic above, FHE will need to be used alongside ZKPs and MPC to build a truly confidential and secure, shared state system
  • FHE is rapidly evolving, with new compilers, libraries, hardware, and more in development. It also benefits immensely from the R&D of Web2 companies (Intel, Google, DARPA, etc.)
  • As FHE and its surrounding ecosystem matures, we expect to see “verifiable FHE” become the standard. FHE applications may opt to outsource computation and verification to FHE coprocessors
  • A foundational limitation of onchain FHE is “who holds the decryption key?” Threshold decryption and MPC offer solutions, however they are generally bound by a performance <> security tradeoff

Introduction:

The transparent nature of blockchains is a double-edged sword; while there is beauty in its openness and observability, it is a fundamental obstacle for enterprise adoption.

Onchain privacy has been one of the most challenging problems in crypto for nearly a decade. Although many teams are building ZK-based systems to achieve onchain privacy, they cannot support shared, encrypted state. Why? The short answer is that these programs are a function of a series of ZKPs, and as such it’s impossible for anyone to apply arbitrary logic to the current state. What does this mean? Basically, we cannot build confidential shared-state applications (think private Uniswap) using just ZKPs.

However, recent breakthroughs have shown that combining ZKPs with Fully Homomorphic Encryption (FHE) could achieve fully generalizable, confidential DeFi. How? FHE is a burgeoning field of cryptography which enables arbitrary computation over encrypted data (sounds crazy right?!) As shown in the graphic above, ZKPs can prove the integrity of user inputs and computation, and FHE can process the computation itself.

While FHE promises to be the “Holy Grail of Cryptography,” we attempt to provide an objective analysis of the field and its various challenges and possible solutions. We cover the following onchain FHE topics:

  • FHE Schemes, Libraries and Compilers
  • Secure Threshold Decryption
  • ZKPs for User Inputs + Computing Party
  • Programmable, Scalable DA Layer
  • FHE Hardware

FHE Schemes, Libraries and Compilers

Challenge: Nascent FHE Schemes, Libraries and Compilers

The foundational bottleneck of onchain FHE lies in its lagging developer tooling and infrastructure. Unlike ZKPs or MPC, FHE has only been around since 2009 and has therefore had a much shorter timeline to mature.

The primary limitations in FHE DevEx are:

1. An easy-to-use front-end language for developers to code without much knowledge of the backend cryptography

2. A fully functional FHE compiler that can handle all the dirty work. (Parameter selection, SIMD optimization for BGV/BFV and parallel optimization)

3. Existing FHE schemes (particularly TFHE) are around 1000x slower compared to plain computation

To truly understand the intricacies of integrating FHE, let’s walk through the developer journey:

The first step to integrate FHE into your application is to choose an FHE scheme to use. The following chart explains the primary schemes:

As shown in the table above, boolean circuits like FHEW & TFHE have the fastest bootstrapping and can avoid a rather complex parameter selection procedure, and they can be used in arbitrary/generic computations, but are relatively slow. BGV/BFV are a fit for general DeFi as they are more efficient at high-precision arithmetic computations, but developers have to know the depth of the circuit in advance to set up all the parameters of the scheme. On the other end of the spectrum, CKKS supports homomorphic multiplication where precision errors are allowed, making it optimal for non-precise use cases such as ML.

As a developer, you need to choose an FHE scheme extremely carefully as it will impact all other design decisions and performance down the line.

After selecting an FHE scheme, a developer needs to set up parameters. Picking the right parameters will have a drastic impact on the performance of your FHE scheme. Unfortunately, it requires an understanding of abstract algebra, of FHE-specific operations such as relinearization and modulus switching, and of working with arithmetic or binary circuits. Lastly, to choose parameters effectively, one needs to conceptually understand how they affect the scheme.

Moving on to libraries, the features and capabilities of the popular FHE libraries can be seen from the table below:

But the libraries also have different relationships with schemes and compilers as shown below:

At this point, a developer would ask:

How large does my plaintext space need to be? What magnitude of ciphertexts are acceptable? Where can I parallelize computation? And many more…

In addition, although FHE can support arbitrary computations, developers need to change their mindset when writing FHE programs. For example, one cannot write a branch (if-else) that depends on an encrypted variable, because the program cannot directly compare variables the way it can plain data. Instead, developers need to turn branches into a computation that incorporates the conditions of all branches. Similarly, this prevents developers from writing data-dependent loops in their code.
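A minimal sketch of how such a branch rewrite works, in plain Python for illustration. The assumption is that the condition is encrypted as a 0/1 value and that `*` and `+` stand in for homomorphic multiplication and addition:

```python
# Hypothetical sketch of how an FHE compiler might rewrite a branch.
# In plain code:
#   if cond: result = a
#   else:    result = b
# Over ciphertexts, BOTH arms are always computed and blended
# arithmetically, with cond encrypted as 0 or 1.

def branchless_select(cond, a, b):
    """Multiplexer: returns a when cond == 1, b when cond == 0.
    In a real FHE program, * and + would be homomorphic operations
    and cond, a, b would all be ciphertexts."""
    return cond * a + (1 - cond) * b

# Plain-data demonstration of the identity:
assert branchless_select(1, 10, 20) == 10
assert branchless_select(0, 10, 20) == 20
```

Because the condition stays encrypted, the evaluator never learns which branch was "taken" — both are paid for, which is part of FHE's overhead.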

TL;DR For a developer uninitiated in FHE, it’s nearly impossible to integrate FHE into their application. It will require significant dev tooling/infra to abstract away the complexities presented by FHE.

Solutions:

  1. Web3 Specific FHE Compilers

While there already exist FHE compilers built by the likes of Google and Microsoft, they are:

  • Not designed with performance in mind, adding 1000x overhead vs writing circuits directly
  • Optimized for CKKS (aka ML) and not as beneficial for BFV/BGV needed for DeFi
  • Not built for Web3; they do not support compatibility with ZKP schemes, program chaining, etc.

That is, until the Sunscreen FHE compiler. It is a Web3-native compiler that offers some of the best performance for arithmetic operations (think DeFi) without hardware accelerators. As discussed above, parameter selection is arguably the most difficult part of implementing an FHE scheme; Sunscreen automates not only parameter selection but also data encoding and key selection, implements relinearization and modulus switching, sets up the circuits, and implements SIMD-style operations.

As we move into the future, we expect to see further optimizations not only in the Sunscreen compiler, but other teams building their own implementations that support other high level languages.

2. New FHE Libraries

While researchers keep exploring new powerful schemes, FHE libraries can enable more developers to integrate FHE.

Let’s take FHE smart contracts as an example. Although it can be difficult to choose from different FHE libraries, the choice becomes easier when we talk about onchain FHE because there are only a few dominant programming languages across Web3.

For example, Zama’s fhEVM integrates Zama’s open-source library TFHE-rs into a typical EVM, exposing homomorphic operations as precompiled contracts. This effectively enables developers to use encrypted data in their contracts without any changes to compilation tools.

One of the core reasons for the lack of overall FHE infra is that we do not have an efficient way to construct FHE-RAM. A possible direction to consider is an FHE-EVM without certain opcodes.

3. Programmable Bootstrapping

As mentioned earlier, TFHE is about 1000x slower than computing on plain data. One area of open research to speed up FHE is called programmable bootstrapping (PBS). Let’s first cover standard bootstrapping:

Purpose: Bootstrapping is a technique used to reduce the noise that accumulates during the computation on encrypted data. In FHE, every operation on ciphertext adds a bit of noise. If this noise grows too large, it can make the encrypted data undecipherable when finally decrypted.

How it Works: Bootstrapping involves decrypting and immediately re-encrypting the data, but this is done without ever fully decrypting the data to plaintext. This process effectively “resets” the noise level, enabling further computations on the data.

Challenge: Traditional bootstrapping is computationally intensive and can be a major bottleneck in the practical application of FHE, making it slower and less efficient.
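To make the noise mechanic concrete, here is a toy model of a "noise budget." All numbers are invented for illustration — no real scheme behaves exactly like this — but the failure mode and the reset are the real ones:

```python
# Illustrative model only: each homomorphic multiplication consumes
# noise budget; bootstrapping resets it. The constants are made up.

INITIAL_BUDGET = 60   # bits of budget after fresh encryption (assumed)
MULT_COST = 10        # budget consumed per multiplication (assumed)

def run_circuit(num_mults, bootstrap_every=None):
    """Simulate a chain of multiplications; raise if noise overflows."""
    budget = INITIAL_BUDGET
    for i in range(1, num_mults + 1):
        budget -= MULT_COST
        if budget <= 0:
            raise RuntimeError(
                f"noise overflow at mult {i}: ciphertext undecryptable")
        if bootstrap_every and i % bootstrap_every == 0:
            budget = INITIAL_BUDGET   # bootstrapping "resets" the noise
    return budget

# Without bootstrapping, a depth-10 circuit overflows the budget:
try:
    run_circuit(10)
except RuntimeError as err:
    print(err)   # noise overflow at mult 6: ciphertext undecryptable

# Bootstrapping every 4 multiplications keeps the ciphertext alive:
print(run_circuit(10, bootstrap_every=4))   # remaining budget: 40
```

The cost, of course, is that each bootstrap is itself an expensive homomorphic computation — which is exactly what programmable bootstrapping tries to amortize by folding useful work into the reset.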

Programmable Bootstrapping:

Enhanced Flexibility: Programmable bootstrapping is an advanced form of bootstrapping that not only reduces the noise but also allows certain transformations on the data as part of the bootstrapping process. This can include operations like modulus switching or more complex functions.

Efficiency: This approach can be more efficient than traditional bootstrapping because it combines noise reduction with useful computation, potentially reducing the overall number of steps needed for a given computation.

Application-Specific: Programmable bootstrapping can be tailored to specific types of computations or applications, making it more versatile and potentially more efficient for certain tasks.

Noise Flooding:

As covered, bootstrapping is the primary workaround to reduce the noise that accumulates when computing over ciphertexts. However, if adversarial parties know the noise of a given ciphertext, they may be able to infer what computation was performed. Hence, we need to use “noise flooding” to effectively mask what computation was performed on the plain data. Zama has pioneered some incredible work in this area.

Secure Threshold Decryption

Challenge: Insecure/Centralized Threshold Decryption

Every confidential, shared-state system is predicated upon an encryption/decryption key. No single party can be trusted; therefore, the decryption key is sharded among network participants.

One of the most challenging aspects of onchain FHE (Threshold FHE) is building a secure and performant threshold decryption protocol. There are two primary bottlenecks: (1) FHE-based computation introduces significant overhead, consequently requiring high-performance nodes, which inherently reduces the potential size of the validator set; (2) as the number of nodes participating in the decryption protocol increases, the latency increases.

At least for now, FHE protocols rely on an honest majority among validators, however as stated above, Threshold FHE implies a smaller validator set and therefore, a higher probability of collusion and malicious behavior.

What if the threshold nodes collude? They would be able to bypass the protocol and decrypt basically anything, from user inputs to any onchain data. In addition, it’s important to note that decryption can happen undetectably in current systems (aka a “silent attack”).
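For intuition on how a key can be sharded so that only a threshold of validators can recover it, here is a minimal Shamir secret sharing sketch with toy parameters. Note the hedge: a real threshold-FHE system performs *distributed* decryption with the shares and never reconstructs the key in one place, which is exactly why collusion, not reconstruction, is the threat:

```python
import random

# Toy Shamir secret sharing over a prime field. Any t of n shares
# reconstruct the secret; fewer than t reveal nothing about it.
P = 2**61 - 1   # a Mersenne prime; real systems use scheme-specific moduli

def split(secret, n, t):
    """Evaluate a random degree-(t-1) polynomial with constant term
    `secret` at x = 1..n; each point is one validator's share."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = 123456789
shares = split(key, n=5, t=3)
assert reconstruct(shares[:3]) == key    # any 3 of 5 shares suffice
assert reconstruct(shares[1:4]) == key
```

The tradeoff discussed above falls out directly: raising t makes collusion harder but means more parties (and more rounds) in every decryption.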

Solution: Improved Threshold Decryption or Dynamic MPC

There are a few ways to possibly address the shortcomings of threshold decryption: (1) enable an n/2 threshold, which would make collusion more difficult; (2) run the threshold decryption protocol inside of an HSM; (3) use a TEE-based threshold decryption network controlled by a decentralized chain for auth, allowing for dynamic key management.

Rather than leverage threshold decryption, it’s possible to use MPC instead. An example of an MPC protocol that could be used in onchain FHE includes Odsy’s new 2PC-MPC, the first noncollusive and massively decentralized MPC algorithm.

A different approach could be to only use the user’s public key to encrypt the data. The validators then process the homomorphic operations, and the users themselves can decrypt the result if needed. While feasible only for niche use cases, we could avoid the threshold assumption altogether.

To summarize, onchain FHE needs an efficient MPC implementation that: (1) Works even when there are malicious actors (2) Introduces minimal latency (3) Enables permissionless/flexible entry of nodes.

However, it’s important to note MPC algorithms are generally bound by a security <> latency tradeoff. To elaborate:

  • Secure MPC usually requires more complex cryptographic operations, leading to increased computation time. For instance, more rounds of communication or more computationally intensive cryptographic primitives mean increased latency
  • The number of participants in the MPC protocol can also affect both security and latency. Adding more parties can increase security by distributing trust, but it also increases the communication complexity and thus the latency. Some protocols scale better than others with the number of parties, but there is often a trade-off between scalability, security, and speed.
  • Trade-offs in protocol design can also involve other factors like fault tolerance, resistance to different types of adversaries, and the ability to handle different computation models etc.

TL;DR: There are numerous design decisions when considering which MPC algorithm/threshold decryption scheme is suitable for a given FHE use case. It’s important to note we cannot eliminate the threshold trust assumption in either MPC or threshold decryption.

ZKPs for User Inputs + Computing Party

Challenge: Verifiability of User Inputs + Computation:

In web2, when we ask for a computational task to be performed, we completely trust a given company will run the task exactly as they’ve promised behind the scenes. In web3, the model is completely flipped; rather than rely on the reputation of a company and simply trust they will act honestly, we’re now operating in a trustless environment, meaning users shouldn’t have to trust anyone.

While FHE enables the processing of encrypted data, it cannot verify correctness of either user inputs or computation. Without the ability to check for either, FHE is hardly useful in the context of blockchain.

Solution: ZKPs for User Inputs + Computing Party:

While FHE enables anyone to perform arbitrary computation over encrypted data, ZKPs allow one to prove something is true without revealing the underlying information itself. So how do they relate?

There are three key ways FHE and ZKPs fit together:

  1. The user needs to submit a proof that their input ciphertext is well formed, meaning the ciphertext meets the requirements of the encryption scheme and isn’t just an arbitrary string of data.
  2. The user needs to submit a proof that their input plaintext satisfies a given application’s conditions, e.g. amount_balance > amount_transfer
  3. The validator node (aka computing party) needs to prove they’ve correctly executed the FHE computation. This is referred to as “verifiable FHE” and can be seen as analogous to ZKPs needed for rollups.

To further explore each application of ZKPs:

  1. ZKP of Ciphertext

FHE is built on lattice-based cryptography, meaning the construction of the cryptographic primitive involves lattices, which are PQ (post-quantum) secure. In contrast, popular ZKPs such as SNARKs, STARKs and Bulletproofs do not rely on lattice-based cryptography.

In order to prove a user’s FHE ciphertext is well formed, we’d need to show it satisfies a certain matrix-vector equation with entries from rings, and that these entries are small. In essence, we need a proof system designed for handling lattice-based relations that is quick to generate and cost-efficient to verify onchain (a tradeoff between proof generation time <> proof verification time). Today, using the RISC Zero Bonsai API, it takes 2+ hours to prove TFHE bootstrapping for a 3-bit PBS.
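For intuition on what "well formed" means here, below is a toy LWE-style relation check with invented parameters. A real system proves this relation in zero knowledge, without revealing the witnesses; here we simply check it directly with the secret and error vectors in hand:

```python
import random

# Toy LWE-style relation: a ciphertext (A, b) is "well formed" iff
# b = A·s + e (mod q) for some SMALL error vector e. Parameters are
# illustrative only, far below any real security level.

q = 3329        # example modulus
n = 8           # dimension
E_BOUND = 3     # error entries must lie in [-E_BOUND, E_BOUND]

def encrypt(s):
    """Sample A and a small error e, output b = A·s + e mod q."""
    A = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
    e = [random.randint(-E_BOUND, E_BOUND) for _ in range(n)]
    b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q
         for i in range(n)]
    return A, b, e

def well_formed(A, b, s, e):
    # (1) the matrix-vector relation holds mod q...
    relation = all(
        (sum(A[i][j] * s[j] for j in range(n)) + e[i] - b[i]) % q == 0
        for i in range(n))
    # (2) ...and the error entries are small
    small = all(abs(ei) <= E_BOUND for ei in e)
    return relation and small

s = [random.randrange(q) for _ in range(n)]
A, b, e = encrypt(s)
assert well_formed(A, b, s, e)
tampered = [(bi + 1) % q for bi in b]       # corrupt the ciphertext
assert not well_formed(A, tampered, s, e)
```

The hard part, as the text notes, is producing a *succinct* proof of both conditions — the linear relation and the smallness bound — without revealing s or e.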

2. ZKP of Plaintext Input

In addition to proving a well-formed ciphertext, the user needs to prove their input plaintext satisfies an application’s requirements. Luckily, unlike the prior step, we can utilize general SNARKs to prove validity around user inputs, therefore enabling developers to make use of the existing ZKP schemes, libraries and infra.

However, the challenging part is we need to “connect” the two proof systems. Connect as in we need to ensure the user used the same input for both proof systems; otherwise a malicious user could cheat the protocol. Once again, this is an incredibly difficult cryptographic feat and an open area of early research.

Sunscreen has also laid the groundwork with their ZKP compiler that offers support for Bulletproofs as it most easily connects with SDLP. Future development is being done to “link” the FHE and ZKP compiler.

Mind Network has been pioneering the integration of ZKPs to ensure zero-trust inputs and outputs alongside leveraging FHE for secure computation.

3. ZKP of Correct Computation

FHE run on existing hardware is extremely inefficient and expensive. As we’ve seen the dynamic of the scalability trilemma play out before, networks with higher node resource requirements cannot scale to a sufficient level of decentralization. A possible solution could be a process termed “Verifiable FHE,” a process where the computing party (validator) submits a ZKP to prove honest execution of transactions. As such, only a single party is needed for computation while other nodes and light clients can verify the ZKP.

An early prototype of verifiable FHE can be seen in a project called SherLOCKED. Essentially, the FHE computation is offloaded to RISC Zero’s Bonsai, which processes the computation over the encrypted data offchain, returns the result with a ZKP, and updates the state onchain.
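Structurally, the verifiable-FHE flow looks like the sketch below. Everything here is illustrative: `fhe_eval`, `make_proof`, and `check_proof` are stand-ins (a real backend such as a RISC Zero receipt provides succinct verification, never a re-execution), but the division of labor — one party computes, everyone else only verifies — is the point:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Receipt:
    output_ct: bytes   # resulting ciphertext
    proof: bytes       # attestation of correct execution

def fhe_eval(circuit_id: bytes, input_ct: bytes) -> bytes:
    # placeholder for the actual homomorphic evaluation
    return hashlib.sha256(circuit_id + input_ct).digest()

def make_proof(circuit_id: bytes, input_ct: bytes, output_ct: bytes) -> bytes:
    # stand-in: a commitment to (circuit, input, output); a real prover
    # emits a succinct ZKP of the evaluation trace instead
    return hashlib.sha256(b"proof" + circuit_id + input_ct + output_ct).digest()

def compute_party(circuit_id: bytes, input_ct: bytes) -> Receipt:
    """Single node does the expensive work and attaches a proof."""
    out = fhe_eval(circuit_id, input_ct)
    return Receipt(out, make_proof(circuit_id, input_ct, out))

def light_client_verify(circuit_id: bytes, input_ct: bytes, r: Receipt) -> bool:
    """Stand-in check; a real verifier never re-runs fhe_eval."""
    return r.proof == make_proof(circuit_id, input_ct, r.output_ct)

receipt = compute_party(b"auction", b"ciphertext-bytes")
assert light_client_verify(b"auction", b"ciphertext-bytes", receipt)
```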

Outsource FHE Computation + ZKP Generation with FHE Coprocessors

A recent example of leveraging an FHE coprocessor can be seen with Aztec’s onchain auction demo. As we covered earlier, existing FHE chains encrypt the entire state with a threshold key, meaning the system is only as strong as its threshold committee. Conversely, in Aztec, users own their own data and are therefore not subject to a threshold security assumption.

Lastly, it’s important to touch on where a developer can build an FHE application in the first place. Devs can build their applications on an FHE-powered L1, on an FHE rollup, or on any EVM chain by leveraging an FHE coprocessor. Each design comes with its own set of tradeoffs; however, we are more inclined towards well-designed FHE rollups (pioneered by Fhenix) or FHE coprocessors, as they inherit security from Ethereum among other benefits.

Until recently, achieving fraud proofs on FHE-encrypted data was complex, but the team at Fhenix.io has demonstrated how to achieve fraud proofs using the Arbitrum Nitro stack paired with FHE logic compiled to WebAssembly.

DA Layer for FHE

Challenge: Lack of Customizability and Throughput

While storage has been commoditized in web2, the same doesn’t hold true in Web3. Given FHE maintains a large data blow-up of 10,000x+, we need to figure out where to store large FHE ciphertexts. If every node operator in a given DA layer were to download and store every FHE chain’s data, only institutional operators would be able to keep up with the demand, risking centralization.

Regarding throughput, Celestia tops out at 6.7 MB/s; if we want to run any form of FHEML, we would easily need GBs+/sec. As per recent data shared by 1k(x), existing DA layers cannot support FHE due to design decisions that (intentionally) limit throughput.

Solution: Horizontal Scaling + Customizability

We need a DA layer that can support horizontal scalability. Existing DA architectures propagate all data to every node in the network which is a major constraint for scalability. Conversely, with horizontal scalability, as more DA nodes enter the system, each node is storing 1/nth of the data, improving performance and cost as more blockspace is made available.

Separately, given the substantial data size associated with FHE, it would make sense to offer developers a higher level of customizability around erasure-coding thresholds — i.e., how much of the DA system you are comfortable relying on for a guarantee, whether via stake-based weighting or decentralization-based weighting.
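To illustrate the 1/n storage idea, here is a minimal erasure-coding sketch using a single XOR parity chunk. Real DA layers use Reed–Solomon codes with tunable thresholds; plain XOR tolerates only one lost chunk, but it keeps the mechanics visible:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blob: bytes, k: int):
    """Split blob into k equal data chunks plus one XOR parity chunk;
    each of the k+1 nodes stores only one chunk (~1/n of the data)."""
    size = -(-len(blob) // k)   # ceil division
    chunks = [blob[i*size:(i+1)*size].ljust(size, b"\0") for i in range(k)]
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return chunks + [parity]

def recover(chunks, missing: int) -> bytes:
    """Rebuild the chunk at index `missing` by XOR-folding the rest."""
    acc = None
    for i, c in enumerate(chunks):
        if i == missing:
            continue
        acc = c if acc is None else xor_bytes(acc, c)
    return acc

blob = b"large FHE ciphertext payload"
stored = encode(blob, k=4)          # 5 nodes, one chunk each
lost = 2                            # node 2 goes offline
assert recover(stored, lost) == stored[lost]
```

The scaling property the text describes follows: adding nodes shrinks each node's share instead of replicating the full blob everywhere, at the cost of a recovery threshold you must choose deliberately.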

The architecture of EigenDA serves as the basis of what a DA layer for FHE could look like. Its properties of horizontal scaling, structural cost reduction, decoupling of DA and consensus, etc. all give way to a level of scalability that could one day support FHE.

Lastly, a high level idea could be to build optimized storage slots for storing FHE ciphertexts since ciphertexts follow a particular encoding scheme, so having optimized storage slots might help with efficient use of storage and faster retrieval.

FHE Hardware

Challenge: Low Performance FHE Hardware

One of the primary concerns around the adoption of onchain FHE is the significant latency caused by the computational overhead, making it impractical to run FHE on any standard processing hardware. The limitation stems from the massive interaction with the memory, due to the immense data size the processor needs to deal with. Besides the memory interaction, non-trivial overhead is incurred from the complex polynomial computations and time consuming ciphertext maintenance operations.

To further understand the state of FHE accelerators, we need to uncover the design space: Application-specific FHE accelerators vs. generalizable FHE accelerators.

The crux of computation complexity and hardware design for FHE is always related to the number of multiplications needed for a given application, referred to as the “depth of homomorphic operation.” In the application-specific case, the depth is known, so the system parameters and the related computation are fixed. Therefore, hardware design for this application-specific case will be easier and usually can be optimized for better performance than the generalizable accelerator design. In the general case, where FHE is required to support an arbitrary number of multiplications, bootstrapping is needed to remove part of the noise accumulated in the homomorphic operations. Bootstrapping is expensive and requires considerable hardware resources including on-chip memory, memory bandwidth and computation, which means the hardware design will be very different from the app-specific design. Hence, while the work done by major players in the space, including the likes of Intel, Duality, SRI, DARPA and many others are no doubt pushing the limits with their GPU and ASIC based designs, they may not be 1:1 applicable to support blockchain use cases.

Regarding development cost: generalizable hardware requires more capital to design and manufacture than application-specific hardware, but its flexibility allows it to be used in a broader application scope. On the other hand, with an app-specific design, if an application’s needs change and require support for deeper homomorphic computation, the app-specific hardware would need to be combined with software techniques, such as MPC, to achieve the same end as generalizable hardware.

It’s important to note, onchain FHE is far more difficult to accelerate than application-specific use cases (e.g. FHEML) because it requires the ability to process general computations versus a more niche set. To illustrate, the fhEVM devnet is constrained to single-digit TPS at the moment.

Given we need an FHE accelerator tailored to blockchain, another consideration is how transferable is the work from ZKP hardware to FHE hardware?

There are some common modules between ZKP and FHE, such as the number theoretic transform (NTT) and the underlying polynomial operations. However, the bit width of the NTT used in these two cases is different. In ZKP, the hardware needs to support a wide range of bit widths for NTT, such as 64-bit and 256-bit, while in FHE, the operands for NTT are shorter due to being represented in the residue number system. Secondly, the degree of the NTT handled in ZKP is in general higher than that of the FHE case. For these two reasons, it is not trivial to design an NTT module that achieves satisfying performance for both ZKP and FHE. Other than NTT, there are some other computing bottlenecks in FHE, such as automorphism, which are not present in ZKPs. A naive way of transferring ZKP hardware to the FHE setting is to offload the NTT computing task to the ZKP hardware and use the CPU or other hardware to handle the rest of the computation in FHE.
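Since the NTT is the shared kernel in question, here is a minimal radix-2 NTT sketch over Z_q with q = 998244353 (a commonly used NTT-friendly prime, 119·2^23 + 1, primitive root 3 — toy-scale compared to the RNS limbs of real FHE), verified against a naive cyclic convolution:

```python
Q = 998244353   # NTT-friendly prime: Q - 1 divisible by a large power of 2
ROOT = 3        # primitive root mod Q

def ntt(a, invert=False):
    """Iterative Cooley–Tukey NTT in place over Z_Q. len(a) must be a
    power of two dividing 2^23."""
    a = list(a)
    n = len(a)
    # bit-reversal permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(ROOT, (Q - 1) // length, Q)
        if invert:
            w = pow(w, Q - 2, Q)      # inverse twiddle factor
        for start in range(0, n, length):
            wn = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * wn % Q
                a[start + k] = (u + v) % Q
                a[start + k + length // 2] = (u - v) % Q
                wn = wn * w % Q
        length <<= 1
    if invert:
        n_inv = pow(n, Q - 2, Q)
        a = [x * n_inv % Q for x in a]
    return a

def cyclic_convolution(a, b):
    """Multiply polynomials mod (x^n - 1): transform, pointwise, invert."""
    fa, fb = ntt(a), ntt(b)
    return ntt([x * y % Q for x, y in zip(fa, fb)], invert=True)

# Check against the naive O(n^2) cyclic convolution:
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
naive = [sum(a[j] * b[(i - j) % 4] for j in range(4)) % Q for i in range(4)]
assert cyclic_convolution(a, b) == naive
```

The hardware tension described above shows up even here: the transform's structure is identical across ZKP and FHE, but the modulus width, the transform degree, and the surrounding operations (automorphisms, keyswitching) are what differ.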

To round off the challenges, computing on encrypted data with FHE used to be 100,000 times slower than on plaintext data. However, thanks to the newer encryption schemes and recent developments in FHE hardware, the current performance has dramatically improved to around 100x slower than plaintext data.

Solutions:

  1. Improvement in FHE Hardware

There are many teams actively building FHE accelerators, however, they are not working on FHE accelerators that are specific to generalizable blockchain compute (i.e. TFHE). An example of a blockchain specific hardware accelerator is FPT, a Fixed-Point FPGA accelerator. FPT is the first hardware accelerator to heavily exploit the inherent noise present in FHE calculations by implementing TFHE bootstrapping entirely with approximate fixed-point arithmetic. Another project that could be useful for FHE is BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud.

As noted earlier, one of the primary bottlenecks in FHE hardware design is the massive interaction with memory. To circumvent this barrier, a potential solution is to reduce the interaction with memory as much as possible. Processing-in-memory (PIM) or near-memory computation may be helpful in this scenario. PIM is a promising solution to the “memory wall” challenge for future computer systems, allowing memory to serve both computation and storage functions and promising a radical renovation of the relationship between the two. In the past decade, tremendous progress has been made in designing non-volatile memories for this purpose, such as resistive RAM, spin-transfer torque magnetic RAM and phase-change memory. Using these special memories, there has been work showing significant computation improvements in machine learning and lattice-based public-key encryption.

2. Optimized Software/Hardware

In recent developments, software has been acknowledged as a critical component in the hardware acceleration process. For instance, prominent FHE accelerators such as F1 and CraterLake use compilers for a hybrid software/hardware co-design approach.

As the space advances, we can expect fully functional compilers to be co-designed with blockchain-specific FHE compilers. FHE compilers can optimize FHE programs based on the cost model of the respective FHE scheme. These FHE compilers could be integrated with FHE accelerator compilers to improve end-to-end performance by incorporating a cost model from a hardware-level aspect.

3. Networking FHE Accelerators: The Shift from Scale-up to Scale-out

Existing FHE hardware acceleration efforts largely focus on “scale-up,” which means focusing on the improvement of a single accelerator vertically. On the other hand, “scale-out” places focus on connecting multiple FHE accelerators by networking horizontally, which could drastically improve performance, akin to parallel proof generation with ZKPs.

One of the primary difficulties with FHE acceleration is the data inflation problem: the significant increase in data size that occurs during encryption and computation, which poses challenges for efficient data movement between on-chip and off-chip memory.

Data inflation poses a significant challenge to connecting multiple FHE accelerators via networking horizontally. Therefore, FHE hardware and network co-design will be a promising direction of future research. An example could include an adaptive network routing for FHE accelerators: Implementing a routing mechanism that dynamically adjusts data paths between FHE accelerators based on real-time computational load and network traffic. This would ensure optimal data transfer rates and efficient resource utilization.

Closing Thoughts

FHE will fundamentally reimagine the way we secure data across platforms, paving the way for a new era of unprecedented privacy. To build such a system will require significant advancements in not only FHE, but in ZKPs and MPC as well.

As we venture into this new frontier, collaborative efforts among cryptographers, software engineers, and hardware specialists will be imperative. Not to mention lawmakers, politicians etc. as regulatory compliance is the only path to mainstream adoption.

Ultimately, FHE will stand at the forefront of a transformative wave of digital sovereignty, heralding a future where data privacy and security are no longer mutually exclusive but inextricably unified.

Special Thanks

Many thanks to Mason Song (Mind Network), Guy Itzhaki (Fhenix), Leo Fan (Cysic), Kurt Pan, Xiang Xie (PADO), Nitanshu Lokhande (BananaHQ) for their review. We encourage you to check out the incredible work these folks have contributed to the space!
