Everything you need to know about Homomorphic Encryption
The evolution of encryption and decryption of data to make them secure has seen so much growth in the past 100 years. It has come so far that now we can perform mathematical operations on encrypted data itself! Pretty crazy right?
Let’s see a quick example in Python. In the following code, we encrypted the number 8 and multiplied the encrypted number with 2.
import tenseal from ts
context = ts.context(
ts.SCHEME_TYPE.CKKS,
poly_modulus_degree=8192,
coeff_mod_bit_sizes=[60, 40, 40, 60]
)
context.generate_galois_keys()
context.global_scale = 2**40
v1 = [4]
enc_v1 = ts.ckks_vector(context, v1)
print(enc_v1)
# <tenseal.tensors.ckksvector.CKKSVector object at 0x7aae102c3460>
result = enc_v1 * 2
result.decrypt()
# [8.000001077148633]
See! Imagine the amount of things you can do with this. I’ll be discussing about this further in this article. But before we reach there, I want you to explore your imaginations and try to think of as many use cases as I did and maybe even more! (I bet you can’t ;) )
Alright, let’s start the theory class. I promise I won’t make it boring.
Basic terminologies you must know before you read this article: -
- Public & Private Keys: This is a pair of keys that will be used to encrypt and decrypt ciphertext.
- Plaintext: This is simply the message you want to encrypt.
- Ciphertext: This is the message after encryption
Introduction
Homomorphic Encryption is another form of encryption similar to the famous ones like RSA, AES, RC4, etc. It allows computations on top of encrypted data which results in an encrypted output that can only be decrypted by the user who encrypted it in the first place.
It’s not like this concept magically appeared out of nowhere. This idea existed since the late 1970’s and with the contribution of many researchers, scholars, etc. many versions and schemes were invented to make holomorphic encryption more accurate and faster. Some of them are Partial HE, Gentry, DGHV, LTV, BGV, BFV, Fully HE, CKKS, etc. Each of them has a different use case and also works quite differently.
When Partial HE was first introduced, it could only perform one operation at a time. Either multiplication or addition. But only after Dr. Craig Gentry released his Gentry scheme in 2009 it could start performing both multiplication and addition at the same time.
Let’s learn about the 3 most important types of Homomorphic Encryption methods:-
Partial Homomorphic Encryption
As we’ve discussed above, you can perform either addition or multiplication, but an infinite number of times.
Somewhat Homomorphic Encryption
In this, we can do both addition and multiplication simultaneously, but only for a limited number of times. So what happens is, that each time you create an operation on encrypted data, the larger the noise becomes. This “noise” reduces the accuracy of the data when decrypted. We will learn more about what this “noise” is later in this article.
Fully Homomorphic Encryption (FHE)
It’s the better version of Somewhat Homomorphic Encryption. You can perform both addition & multiplication an infinite number of times.
How does it work? (in simple terms)
Let’s refer to the above image for a better understanding. And also, let's take an example of a scenario. A patient who wants to send their vital records to software for it to analyze and return predicted results. Normally, we would send our raw data for the AI Model to analyze and give predicted results. But now with HE, we can perform these predictions on top of encrypted data.
So now, the user will generate a private key which will have to be kept safely. This private key will be used to decrypt the final encrypted results and it’s also used to generate the public key which can be sent to any cloud services which has to perform the computations & operations.
The user will encrypt their medical vital records with the public key and send it to the cloud service. This cloud service will use the public key to perform any kind of operations on it and will return encrypted results to the user. The user will then have to decrypt the results using their private key. Tada! You’ve successfully got your predicted results in a completely secure manner without ever releasing your data to anyone.
Timeline
Next, let’s see a timeline of how we reached where we are today. The progress is always fascinating for a few people. I wish they had taught this instead in history classes (jk).
1. First Generation FHE
In this First Generation, we could either perform addition or multiplication of plaintext, which was simply binary text where a,b ∈ {0,1}.
Here addition was done by performing XOR or enc(a) and enc(b). Multiplication was done by AND’ing enc(a) and enc(b).
Now there was a problem during this phase. Before I explain the problem, let me describe what “noise” means. Every encryption must have randomness which makes it difficult to decrypt. So we have a limit to this randomness. If it’s too random it becomes difficult to decrypt and get the original data. If it’s not so random, cryptanalysts can easily breach the data. So it must be within a threshold.
While adding 2 encrypted data, the noise just simply gets added, which is not a problem. But while multiplying, the noise gets multiplied, it gets doubled! This leads to noise exceeding the threshold value and making it impossible to decrypt, which is not good! This is why Gentry came up with the idea of “Bootstrapping”.
Bootstrapping means, encrypting the encrypted value to adjust the noise level. Suppose you encrypt X which had a noise level a bit below the threshold. And after performing certain operations, it became equal to the threshold. Now you encrypt this encrypted value so that the noise level gets a bit below again, which makes it close to the original noise level.
2. Second Generation FHE
This generation introduced the leveled scheme. With this, the noise level increases linearly and not exponentially. So with this, you don’t have to implement bootstrapping. But if you’re desperate to increase the number of computations, you can do it and it’ll perform even better.
The way this works is, suppose you have to number A and B and you’re trying to add both of them. The noise level of the final encrypted output is the max(level(a), level(b)). And after multiplication, the noise level of the output will be [max(level(a), level(b)) + 1] I hope that makes sense.
3. Third Generation FHE
During this time, GSW (Craig Gentry, Amit Sahai & Brent Waters) conceptually simplified the process of FHE using eigenvectors, however, it was still very slow. Refer to this for more info.
A year later (2014), a GSW-like scheme called FHEW was also introduced which could do one boolean operation in almost less than a second which was day and night compared to what could be done before this. Refer to this for more info.
4. Fourth Generation FHE
This is where we are now and TFHE (Torus FHE) is the one leading it. It’s a ring variant of the GSW Scheme. In simple words, it applies to bootstrap to all operations + it’s very fast (a few tens of milliseconds), supports all gates + MUX, plaintexts are just bits but extendable and it supports any security strength (eg: 128 bits). Refer to this for more details.
CKKS Scheme was also introduced during the same period as TFHE, in 2016. Unlike the other schemes which supported only integers, CKKS allowed the encryption & computation on real numbers too, which supports tasks like financial analysis, statistical calculations, etc., and also uses a new rescaling mechanism. Refer to this for more details.
Open Source Libraries
1. SEAL
Developed by the research team at Microsoft, is one of the most popularly used FHE libraries, which is developed in C++.
2. TenSeal
This is the Python implementation of SEAL. Since Python has a huge audience, this library has been proven to be useful for many including me.
2. HElib
Also, another popular library developed by the team from IBM in C++. It supports CKKS, BGV, and other schemes.
3. Zama
This is my favorite one and is a relatively new open-source company that provides a variety of libraries such as Concrete (FHE Compiler), Concrete-ML (privacy-preserving ML tools), TFHE-rs (Rust API for TFHE), fhEVM (FHE in smart-contracts).
To Conclude…
The most important sector to which this could be applied would be the medical sector. By implementing this, patients could get their desired results without ever revealing their personal information like age, weight, vitals, etc.
Now, with the availability of so many resources and libraries online, what is stopping you from learning more about this? If this seemed interesting, I’d advise you to keep researching more about this and implement them in your code using libraries like TenSeal/Concrete/etc. There’s still more left to this concept and the possibilities are endless.
If you’ve made it till here, thanks for reading, and have a nice day! :)