The Mathematics of Cryptography with Examples
The math behind Diffie-Hellman, ElGamal, RSA, digital signatures
Encoding and Symmetric Encryption
An encoding scheme is a conversion of text to numbers. For example, ASCII is an encoding scheme where A = 65, B = 66, C = 67, etc. With an encoding scheme, every text message has a unique corresponding number, where mathematical functions can be applied.
A symmetric encryption consists of
- an encryption function that takes a secret key k and a message m, and returns c, a ciphered message.
- A decryption function that is the inverse of the encryption function for a fixed secret key k.
Note that because the encryption function and decryption function use the same secret key, being able to encrypt messages implies being able to decrypt ciphered messages.
An example of a symmetric encryption is the Caesar Cipher. Encryption that is not symmetric is called asymmetric.
P vs NP and Computational Complexity
P is the set of problems that can be computationally solved easily. NP is set of problems that can be computationally verified easily. Here, “easily” is defined as polynomial time or better.
P vs NP asks: “Can every problem whose solution can be verified quickly with a computer, can also be solved quickly with a computer?”. P = NP means the answer is yes. P ≠ NP means the answer is no. If you can provide a proof to answer P vs NP, you get a million dollars. Even though no proof exists, many mathematicians believe P ≠ NP.
A one-way function is an invertible function that is easy to compute, but whose inverse is difficult to compute. Note that the existence of one-way functions would imply P ≠ NP.
A trapdoor is a piece of information that allows a one-way function’s inverse to be easily computed.
Public Key Encryption System
A Public Key Encryption System (PKES) is an asymmetric encryption, defined as:
- A public key and a private key such that the public key can be computed from the private key by applying a key-creation algorithm. Note this key-creation algorithm must be a one-way function. As the names imply, the public key is available to the public while the private key is privileged information.
- For every pair of public key and private key, there is an encryption algorithm that is a one-way function. There is a corresponding inverse decryption algorithm where the private key is the trapdoor.
Note that unlike symmetric encryption, encrypting a message does not imply the ability to also decrypt it.
Discrete Logarithm Problem
Let g and h be non-zero integers and p be a prime. The discrete logarithm problem asks for the value of x such that g^x = h (mod p). Note that when p and g are sufficiently large, the discrete logarithm problem is NP but not P, ie. it’s easy to verify, but difficult to solve.
An example of the discrete logarithm problem would be asking for the value of x such that 5^x = 19 (mod 23). One possible answer is 15.
Diffie–Hellman key exchange (DHKE)
Given public numbers p and g, where p is prime and g is an integer, and private numbers a and b, both integers. The Diffie-Hellman key exchange is to compute g^(ab) given the values of g^a (mod p) and g^b (mod p).
Example of using DHKE to generate a shared secret over an insecure channel
- Steve and Anthony want to generate a secret number only they know. Beforehand, they agreed on the numbers p = 941 and g = 627, which they do not need to keep private.
- Steve picks a private key a = 347, which he needs to keep private. He computes 627 ^ 347 (mod 941) = 390 and sends 390 to Anthony over an insecure channel.
- Anthony picks a private key b = 781, which he needs to keep private. He computes 627 ^ 781 (mod 941) = 690 and sends 690 to Steve over an insecure channel.
- Together they can compute 627 ^ (347 * 781) (mod 941) = 470, their shared secret.
If hackers knew only the public numbers p = 941, g = 627, 390, and 690 then solving for Steve and Anthony’s shared secret of 470 is equivalent to solving for the discrete logarithm problem, a P but not NP problem when p and g are sufficiently large.
It is important to note that the DHKE generates a random shared secret. In order to share a given secret, we turn to the ElGamal Public Key Encryption (EPKE).
ElGamal Public Key Encryption (EPKE)
Given public numbers large prime p and integer g. Let a be a private key, with corresponding public key A = g^a (mod p). Let k be a random ephemeral key to be private and meant to be used only once. Let m be the message to encrypt, encoded as a number. Note m must be less than p . Encryption involves computing c1 and c1, where c1 = g^k (mod p) and c2 = m(A^k) (mod p). Decryption involves computing the multiplicative inverse of c1^a in mod p, and multiplying this inverse to c2.
Example of using EPKE to share a secret over insecure channels
- Steve wants to send Anthony, his bank routing number 123 over an insecure channel. Beforehand, they both agree to use the numbers
p = 467 and g = 2, which they do not need to keep private.
- Anthony selects his private key a = 153, which he needs to keep private. He then computes the corresponding public key A = g^a (mod p) = 2¹⁵³ (mod 467) = 224. Anthony makes this public key available for all to see.
- Steve selects a random ephemeral key k = 197 to be used only once. Using this ephemeral key and Anthony’s public key, he computes
c1 = g^k (mod p) = 2¹⁹⁷(mod 467) = 87 and c2 = m(A^k) (mod p) = 123(224¹⁹⁷) (mod 467)= 309. These two numbers c1 = 87 and
c2 = 309, together comprised the ciphered bank routing number, which Steve sends to Anthony over an insecure channel.
- Anthony, using his secret key a, computes c1^a = 87¹⁵³ (mod 467) = 367. He then computes the multiplicative inverse of 367 in mod 467, which is 14.
- Anthony multiplies 14 to c2 mod 467, which is 14 (309) (mod 467) = 123, Steve’s bank routing number.
Suppose hackers intercepted c1 and c2. In order to decrypt the message, they need the private key a to compute the multiplicative inverse of c1^a (mod p). They can try to compute private key a from public key A, but that’s equivalent to solving for the discrete logarithm problem, a P but not NP problem when p and g are sufficiently large.
RSA Public Key Encryption (RSA)
Let p and q be secret keys, both large primes. Let N = pq and integer e with the property that gcd(e, (p-1)(q-1)) = 1. Both N and e are made public. Let m be the message to encrypt, such that m < N. Encryption involves computing c = m^e (mod N) where c is the ciphered message. Decryption involves computing d, the multiplicative inverse of e in mod((p-1)(q-1)) and computing m = c^d (mod N).
The mathematical proof for why the above algorithm works is probably outside the scope of these notes, but know that it relies on Fermat’s Little Theorem.
Example of using RSA to share a secret over insecure channels
- Steve wants to send Anthony his bank routing number 123 over insecure channels.
- Anthony selects two secret keys, both prime, p = 1223 and q = 1987. He then computes two public keys N = pq = 2430101 and e = 948047. Note e satisfies the requirement that gcd(948047, (1223–1)(1987–1)) = 1. Anthony makes N and e publicly available.
- Steve takes Anthony’s public keys, N and e, to encrypt his bank routing number: c = m^e(mod N) = 123⁹⁴⁸⁰⁴⁷(mod 2430101) = 1140689 and sends this to Anthony over an insecure channel.
- Anthony knows the private key e, so he can compute its multiplicative inverse in mod (p-1)(q-1) which is d = 1051235. He takes d and computes c^d (mod N) = 1140689¹⁰⁵¹²³⁵(mod 2430101) = 123, Steve’s bank routing number.
If a hacker gained access to c, she needs to know p and q in order to calculate the multiplicative inverse d, in order to decrypt the message. She can brute-force p and q by factoring N, but factoring large primes is an NP but not P problem.
ElGamal vs RSA
Given that both ElGamal and RSA are public key encryption systems, a natural question to ask is which one to use. Here are some points:
- ElGamal relies on the difficulty in solving the discrete log problem. RSA relies on the difficulty of factoring large primes. As such, RSA encryption is faster than ElGamal but RSA decryption is slower than ElGamal.
- RSA was patented while ElGamal was not. However, the patent expired in 2000, so that isn’t an issue anymore.
A digital signature system is a computational method to verify that a party approves of a digital document. In this sense, it is analogous to pen-and-paper signatures. Similar to public key encryption systems (PKES), a digital signature system involves private and public keys. Formally, a digital signature system is defined as:
- A private key known only the party that owns it and a corresponding public key that can be made public.
- A signing algorithm which takes as input, the digital document and private key and returns a digital signature.
- A verification algorithm which takes as input the digital document, the public key and the digital signature, and returns a true or false on whether or not the signature is authentic.
- A required property is that knowledge of the public key does not reveal the private key.
RSA Digital Signature
When RSA was created, it also included a digital signature scheme. It is defined as:
Let p and q be secret keys, both large primes. Let N = pq which can be made public. Let s and v be the signing key and validation key, respectively. They must satisfy the equation that sv=1(mod (p-1)(q-1)). In other words s and v are multiplicative inverses modulo (p-1)(q-1). Note s must be kept private, while v is public.
Let D be the hash of the digital document encoded as an integer, such that D < N. The digital signature S is computed as S = D^s (mod N). The public can verify the authenticity of this signature by computing S^v (mod N) and verifying it is equal to D(mod N).
Example of using the RSA digital signature scheme
- Steve picks primes p = 1223 and q = 1987 as secret keys. He then computes N = pq = 2430101 and makes N public. He picks s = 1051235 and v = 948047 as his signing and validation keys, respectively. Note that sv = 1051235*948047 =1(mod (1223–1)(1987–1)) as required for signing and validation keys. He keeps s secret and makes v public.
- Steve wants to sign the digital document “YO”. The document encoded ASCII is D = 8979. He computes the corresponding digital signature S as S = D¹⁰⁵¹²³⁵(mod 2430101) = 11788.
- The public wants to verify that Steve signed the document “YO”. They can compute S^v (mod N)=11788 ⁹⁴⁸⁰⁴⁷(mod 2430101) = 8979, the original document D.
Hackers wishing to forge Steve’s signature, they must factor N into two primes, to solve for his signing key s, a computationally difficult problem when p and q are sufficiently large.
The astute reader may observe that if the digital document is huge, possibly millions of digits long, computing the digital signature from the document directly via S = D^s (mod N) is computationally expensive. Furthermore, in the extreme case where a digital document may even be larger than the product of the two largest prime numbers we know, we can’t even compute the signature.
To get around this, we actually don’t apply the signing algorithm to the digital document, but instead we apply the signing algorithm a hash of the digital document.
The hash of a digital document is the output of applying a hash function on that document. A hash function takes as input an arbitrarily long document D and returns a short bit string H, ie. Hash(D) = H, where:
- Computation of the hash function on D should run on linear time or faster.
- Inversion of the hash function should be hard. More specifically, given H, it should be computationally hard to find D such that H = Hash(D). Here “computationally hard” means exponential time or slower.
- The hash function should be collision resistant. More specifically, it should be hard to find two documents D1 and D2 such that
Hash(D1) = Hash(D2).
Example of Hashing a Digital Document
Here’s an example of computing a variant of the SHA (Secure Hash Algorithms) hashing algorithm:
Suppose our digital document says
Teemo is god. All Hail Teemo. All Teemo haterz are losers.
We first convert this to ASCII binary:
We divide these into 14 chunks of 32 bits (and we pad with zeroes if needed)
We then apply the
XOR operation to all these bit-strings sequentially to get:
00000110011001010100110001111111 as binary, or
06654c7f as hexadecimal, which is the common base for hashes. We say that
06654c7f is the hash of our digital document.
Note with this hashing function, it did not matter that our digital document was 224 bits long or a billion bits long. The hash of it will still be 32 bits, which will make generating the digital signature much easier.