Homomorphic Encryption and Data Sovereignty in the AI Era (feat. CKKS Scheme, NEAR Protocol)

Published in

Blockchain at Yonsei

11 min read2 days ago

Introduction

In the forthcoming AI era, data is often likened to oil due to its immense value. However, the current reality is that companies indiscriminately collect, store, and use personal information. Consequently, data sovereignty and privacy issues are naturally emerging as significant concerns. In this context, homomorphic encryption technology is garnering attention.

Homomorphic encryption technology allows operations to be performed on encrypted data. By using homomorphic encryption, end-to-end encryption can be achieved, maintaining the encrypted state throughout the entire process of transferring, processing, and using the data, thereby ensuring the confidentiality of the data.

This article first discusses the principles of the CKKS scheme, a type of homomorphic encryption scheme, and its application in machine learning. Furthermore, it explores how the combination of homomorphic encryption technology with the NEAR protocol and blockchain technology can secure data sovereignty in the AI era.

Homomorphic Encryption

The term “homomorphic” in homomorphic encryption is originally a concept from algebra, where a homomorphism is a function that preserves the algebraic structure.

By using an encryption function that satisfies the above property, homomorphic encryption supports operations on encrypted data. This can be expressed mathematically as follows:

Enc(pt1) + Enc(pt2) = Enc(pt1 + pt2) (Preserves additive operation)
Enc(pt1) * Enc(pt2) = Enc(pt1 * pt2) (Preserves multiplication operation) (Here, Enc is the encryption function, and pt1 and pt2 are plaintexts.)

Note: SHE and FHE
Homomorphic encryption schemes that support unlimited addition, but limited multiplication are called somewhat homomorphic encryption (SHE) schemes, whereas those that support arbitrary numbers of additions and multiplications are called fully homomorphic encryption (FHE) schemes.

RLWE-based Homomorphic Encryption

Many homomorphic encryption schemes nowadays (such as BGV, BFV, CKKS, TFHE) are based on the LWE problem and its variant, the RLWE problem. The security of these schemes is also based on the hardness of the LWE and RLWE problems. Therefore, understanding the CKKS scheme requires an understanding of how to construct homomorphic encryption schemes based on the RLWE problem.

1. THE LWE and RLWE Problem
The LWE problem was introduced by Regev in 2005 and is described as follows:

Let n and q be positive integers.
Let Zq be the ring of integers modulo q. (For example, if q = 5, then Zq = {0, 1, 2, 3, 4})
Let χ be an error distribution over Zq. Let s ∈ Zq^n be the ‘secret’.

The Search-LWE problem is defined as follows:
(1) Choose a uniformly random vector a from Zq^n and an error term e from the error distribution χ. (In other words, a = (a1, a2, …, an) where each ai is chosen uniformly from Zq.)
(2) Compute b = <a, s> + e (mod q), where <a, s> denotes the dot product between a and s.
(3) The Search-LWE problem is to find s given (a, b) ∈ Zq^n × Zq.

The RLWE problem is a variant of the LWE problem that uses polynomial rings, described as follows:

Let R be the polynomial ring Z[X]/(X^n + 1), where n is a power of 2. Here, Z[X] is the polynomial ring with integer coefficients, and Z[X]/(X^n + 1) is the ring of polynomials in Z[x] modulo (X^n + 1).
Let q be a prime, and let Rq be the ring formed by taking coefficients of R modulo q, i.e., Rq = Zq[X]/(X^n + 1).
Let χ be an error distribution over Rq.
Let s ∈ Rq be the ‘secret’.

The RLWE problem is defined as follows:
(1) Choose a uniformly random polynomial a from Rq and an error polynomial e from the error distribution χ.
(2) Compute b = a*s + e. (‘*’ denotes polynomial multiplication in ring Rq)
(3) The Search-RLWE problem is to find s given (a, b) ∈ Rq × Rq.

The key difference between the LWE and RLWE problems lies in whether a is chosen from integer ring Zq^n or from the polynomial ring Rq. RLWE-based encryption schemes have the following efficiency advantages over LWE-based schemes:

Storing (a, b) ∈ Rq × Rq requires less memory than storing (a, b) ∈ Zq^n × Zq, which gives an advantage in key size.
Polynomial multiplication in Rq can be performed efficiently in O(nlogn) time using the Number Theoretic Transform (NTT).

Note: LWE and RLWE Assumptions
The LWE and RLWE assumptions are about the hardness of solving the described problems above. It has been mathematically proven that LWE and RLWE these problems can be reduced to certain problems on ideal lattices, and no polynomial-time algorithms (including quantum algorithms) have been found to solve them efficiently. Thus, LWE and RLWE-based encryption schemes are also considered post-quantum cryptography (PQC) schemes resistant to quantum attacks.

2. RLWE-based Homomorphic Encryption Scheme
To grasp the concept of RLWE-based homomorphic encryption schemes, let’s design a simple symmetric key homomorphic encryption scheme based on RLWE. Note that practical schemes use asymmetric keys and are much more complex, but the principal remains the same.

Let pt (∈ Rq) be the plaintext and s(∈ Rq) be the secret key.

Let’s define Enc, Dec, Add, and Mul as follows:

Enc(pt) = (b, a) = (-as + pt + e, a) (∈ Rq × Rq)
Dec(ct) = Dec((b,a)) = b + as (∈ Rq)
Add(ct1, ct2) = (b1 + b2, a1 + a2) (∈ Rq × Rq)
Mul(ct1, ct2) = (b1b2, a1b2 + a2b1, a1a2)(∈ Rq × Rq × Rq) with secret key (1, s, s²)

Now let’s examine a few simple propositions based on the above scheme. Doing the calculations yourself with a pencil and paper will give a much better grasp.

Proposition 1) Dec(Enc(pt)) = pt
(sketch of pf)
Dec(Enc(pt)) = Dec((-as + pt + e, a)) = -as + pt + e + as = pt + e

Here, we get pt + e instead of the desired plaintext pt. In BGV scheme, this noise is discarded during decryption given that it is smaller than a certain threshold. If the noise exceeds the threshold, bootstrapping operations are performed to reset the noise.

Proposition 2) Dec(Add(ct1, ct2)) = pt1 + pt2
(sketch of pf)
Dec(Add(ct1, ct2))
= Dec((b1 + a1, b2 + a2))
= pt1 + pt2 + e1+ e2 = pt1 + pt2 + e’ (e’ = e1+ e2)

Proposition 3) Dec(Mul(ct1, ct2)) = pt1 * pt2
(sketch of pf)
Dec(Mul(ct1, ct2))
= Dec((b1b2, a1b2 + a2b1, a1a2))
= b1b2 + (a1b2 + a2b1)* s + a1a2 * s²
= pt1pt2 + pt1e2 + pt2e1 + e1e2
= pt1 * pt2 + e’’ (e’’ = pt1e2 + e1pt2 + e1*e2)

There are two issues with Mul Algorithm: firstly, the resulting error e’’ is much larger than the original error e or the error e’ from Add; secondly, the key size increases exponentially with repeated multiplications. To address the first issue, FHE schemes perform bootstrapping after a certain number of multiplications to reset the noise. To address the second issue, relinearization is used to reduce the key from (1, s, s²) back to (1, s).

Based on the above symmetric key homomorphic encryption scheme, asymmetric key homomorphic encryption schemes can be designed. BGV, BFV, and CKKS schemes all operate on similar principles.

CKKS Scheme

The CKKS scheme, introduced in the 2017 paper “Homomorphic Encryption for Arithmetic of Approximate Numbers” by Cheong, Kim, Kim, and Song, supports real and complex number operations. The CKKS scheme is particularly well-suited for machine learning due to two attributes:
1) It supporst real and complex number operations
2) We can pack multiple data elements into a single ciphertext, enabling parallel calcuations of large data.

Encoding: From Complex Numbers to Polynomials

In the CKKS scheme, message and plaintext are distinguished. The real (or complex) data to be encrypted is called a message, which is first converted into a polynomial plaintext through a process called encoding.

To preserve the operational properties of homomorphic encryption, the transformations between plaintext and message must also preserve additive and multiplicative operations.

Discrete Fourier Transform is a ring isomorphism(which is also a homomorphism) between the polynomial ring R[x]/(X^N + 1) and the complex ring C^(N/2). The CKKS scheme uses this isomorphism, to transform between the complex number space and the polynomial space. Encoding involves the following two steps:

(1) Use the Inverse Discrete Fourier Transform to convert a real or complex vector(in C^(N/2)) to a polynomial with real coefficients(in R[x]/(X^N + 1)).

(2) Multiply the real coefficient polynomial by a scaling factor and approximate it to an integer coefficient polynomial (i.e., in Z[x]/(X^N + 1)). (for example, 12.3451 can be expressed as 12345 * 10^(-3))

The CKKS scheme allows packing N/2 data elements into a single polynomial, enabling large-scale parallel processing.

Note: Error in the CKKS Scheme
The CKKS scheme does not completely eliminate noise during decryption. Thus, small errors occur in the encryption, decryption, and operation processes, which is why it is called as an Approximate Homomorphic Encryption.

HE-Operations In the CKKS scheme

The following basic operations can be performed on encrypted data.

(1) HE-Add
Add N/2 data elements in parallel in a slot-wise manner on encrypted data.

(2) HE-Mul
Multiply N/2 data elements in parallel in a slot-wise manner on encrypted data.

(3) LeftRot(k)
Rotate the slots in the ciphertext to the left by k positions.

The principles of HE-Add and HE-Mul are similar to those in the above ‘simple’ RLWE-based homomorphic encryption schemes. LeftRot(k) is performed by transforming the polynomial p(x) into p(x^{kg}) (i.e., substituting x with x^{kg}).

Application in Machine Learning

Homomorphic encryption can be applied in machine learning in the following ways:

(1) Encrypting user data

Encrypt user data to protect sensitive personal information such as medical data while allowing AI model inference and training. (*For example, research is actively conducted on using homomorphic encryption to infer an individual’s genotype.)

(2) Encrypting AI model weights

Encrypt AI model weights to ensure model ownership while outsourcing computing or providing services.

(3) Encrypting both user data and AI model

Encrypt both user data and model weights to protect user privacy and ensure AI model ownership.

When applying the CKKS scheme to ML, matrix operations, ReLU, and other necessary computations for ML must be implemented using the basic operations(HE-Add, HE-Mul, LeftRot). Optimization methods for these computations are crucial for achieving efficient performance, which is why they are an important research topic.

Exampel 1) Matrix — Vector Multiplication

Operations In AI model inference, when performing Ab (A is an n x n matrix, b is an n x 1 vector) operations, encoding the diagonal vectors of the matrix into individual polynomials allows for faster computation.

Given that A_i is the i-th diagonal vector of A, the following holds:
Ab = A_0*b + A_1 * LeftRot(b, 1) + … A_(n-1) * LeftRot(b, n-1)

Thus, Ab can be computed on encrypted data using O(n) complexity with HE-Add, HE-Mul, and LeftRot operations.

Example 2) ReLU Operation

The CKKS scheme supports addition and multiplication of real (complex) numbers on encrypted data, allowing the implementation of polynomials. Thus non-polynomials like ReLU can be approximated to polynomials using approximation techniques such as minimax algorithms or chebyshev polynomials.

Blockchain x AI Application: NEAR AI NEAR Protocol’s Vision for User-Owned AI

The NEAR protocol aims to revolutionize the current AI industry through blockchain technology. Unlike the current scenario where a few companies dominate AI, NEAR protocol’s goal is to create a fully sovereign operating system that allows users to protect their personal information while leveraging powerful AI capabilities.

Self-Sovereignty Is NEAR: A Vision for Our Ecosystem, by Illia Polosukhin

NEAR Protocol’s vision for user-owned AI includes:

Customized AI assistants: Optimized services tailored to user needs.
Privacy protection: Ensuring AI assistants operate without exposing personal data or assets.
P2P AI interaction: Enabling interaction and transactions with other users’ AIs or community AIs

Potential Role of Homomorphic Encryption Technology

Building an AI stack on a public blockchain while protecting privacy poses significant challenges. The public nature of blockchain means that all data is exposed, and multiple nodes participate in validation and computation. Homomorphic encryption can provide a key technical solution for realizing NEAR Protocol’s vision of user-owned AI by enabling data confidentiality while performing AI computations in a distributed environment.

Specifically, homomorphic encryption can be utilized in NEAR Protocol in the following ways:

Processing encrypted data: Store users’ personal data encrypted on the blockchain and process it using homomorphic encryption without decrypting it, allowing AI model training and inference while protecting privacy.
Establishing AI model ownership: Maintain encrypted AI model weights during inference, ensuring ownership and preventing unauthorized use.
Collaborative AI model training: Training AI models using multiple users’ data while keeping each user’s data encrypted.

The application of homomorphic encryption can play a potentially crucial role in realizing NEAR Protocol’s vision of user-owned AI, enabling privacy protection, secure AI model operation, and safe interaction among ecosystem participants. This can overcome the limitations of the current centralized AI industry and build a new user-centric AI ecosystem.

Conclusion

This article explored the principles of homomorphic encryption, focusing on the CKKS scheme, and discussed its application in AI. While FHE ML currently has limitations in computation speed compared to traditional machine learning, it remains a hot research topic, with organizations like the US Defense Advanced Research Projects Agency (DARPA) investing billions annually in homomorphic encryption research.

Furthermore, the convergence of homomorphic encryption technology with blockchain technologies like the NEAR Protocol is expected to open new futures for users to secure sovereignty over their data and AI utilization.

Reference

Oded Regev. 2009. On lattices, learning with errors, random linear codes, and cryptography. J. ACM 56, 6, Article 34 (September 2009), 40 pages. [1]

Lyubashevsky, V., Peikert, C., and Regev, O. 2013. On ideal lattices and learning with errors over rings. J. ACM 60, 6, Article 43 (November 2013), 35 pages. [2]

Gentry, C., Halevi, S., Smart, N.P. (2012). Homomorphic Evaluation of the AES Circuit. In: Safavi-Naini, R., Canetti, R. (eds) Advances in Cryptology — CRYPTO 2012. CRYPTO 2012. Lecture Notes in Computer Science, vol 7417. Springer, Berlin, Heidelberg. [3]

Cheon, J.H., Kim, A., Kim, M., Song, Y. (2017). Homomorphic Encryption for Arithmetic of Approximate Numbers. In: Takagi, T., Peyrin, T. (eds) Advances in Cryptology — ASIACRYPT 2017. ASIACRYPT 2017. Lecture Notes in Computer Science(), vol 10624. [4]

Halevi, S., Shoup, V. (2014). Algorithms in HElib. In: Garay, J.A., Gennaro, R. (eds) Advances in Cryptology — CRYPTO 2014. CRYPTO 2014. Lecture Notes in Computer Science, vol 8616. Springer, Berlin, Heidelberg. [5]

Cheon, J.H., Kim, D., Kim, D., Lee, H.H., Lee, K. (2019). Numerical Method for Comparison on Homomorphically Encrypted Numbers. In: Galbraith, S., Moriai, S. (eds) Advances in Cryptology — ASIACRYPT 2019. ASIACRYPT 2019. Lecture Notes in Computer Science(), vol 11922. [6]

SNU FHE-School 2024 lecutre notes[7]

https://near.org/blog/self-sovereignty-is-near-a-vision-for-our-ecosystem [8]

https://medium.com/@NearKoreaHub/near-%ED%81%AC%EB%A6%BD%ED%86%A0-x-ai-%EC%84%A0%EB%91%90%EC%97%90-%EC%9C%84%EC%B9%98%ED%95%9C-%EB%8B%88%EC%96%B4-%EB%B9%84%EC%A0%84%EA%B3%BC-%ED%96%89%EB%B3%B4-941200535322 [9]