Introduction to RSA

c0D3M
10 min readSep 27, 2019

--

RSA is a public key asymmetric cryptographic algorithm used to encrypt/decrypt data.Historical details of the algorithm can be found on wikipedia. So let’s start with internal working of the system.

  1. Choose two distinct large prime numbers p & q.
  2. Compute n = p .q , key length. This is same when you hear 2048 bit RSA key.
  3. Compute φ(n) = (p − 1)(q − 1) (will explain in detail later)
  4. Choose a number e such that 1 < e < φ (n) & gcd(e, φ (n)) =1
  5. de ≡ 1 (mod φ(n)) (d is chosen to be modulo multiplicative inverse of e), (will explain in detail later)
  6. public key is (n, e)
  7. private key is (d,n)

Typically e is chosen to be small, like for these days we choose e=65537.e is used for encrypt and verification while d is used for decryption and signing.d is typically of order n.

Encryption c = m mod n

Decryption m= c mod n

Let’s deep dive into mathematics behind RSA.To start with very basic.

Prime Number: Divisible by itself or 1, for example 2,3,5, 17 etc.

Composite Number:Numbers which are made of product of 2 or more primes for example 6 which is product of 2 , 3, in-fact there is Fundamental Theorem of Arithmetic which says all number > 1 can be represented as product of prime and this combination is unique. For example

1200 = 2⁴ * 3 * 5²

Congruence (≡)

a & b are said congruent modulo n , when a and b have same remainder when divided by n.

a ≡ b (mod n)

Let’s see with example 37 ≡ 57 (mod 10) , if you notice 37%10 = 57%10.In-fact 47, 67 are all congruent to mod 10, so all these number belong to one equivalence class of mod 10.Formally we can write as a has an infinite range of possible values of the form a = b + k*n. where k can be any number positive or negative, let’s see with example if k = -7, n =10, b = 57 we get a = -13 and we get same remainder(7) when divide by 10. And that is one advantage of congruence over just equate(=).

Alternate definition :a-b is divisible by n.

Another interesting about modulo is if a congruent to n and a < n , then a-n is also congruent to n. Lets see with example -1 11 mod 12 i.e. -1 and 11 are in same equivalence class of mod 12.

GCD

You must have already read in your school how to compute this using Euclidean algorithm. For sake of completeness, here it is

GCD of two number(26, 11)

GCD(26.11) = 1

Inverse

An inverse of a number is denoted by x⁻¹ is a number when multiplied with x yield identity i.e. 1. For example inverse of 2 is 1/2.

Modulo Multiplicative Inverse: 11.x = 1 mod 26 , solve for x

Extended Euclidean Algorithm: Idea is to represent the GCD in the linear form of the input. For example
1 = 26 * x + 11 * y [where x and y are Integers] Let see with above example

26 = 11*2 + 4 …(1)

11 = 4* 2 +3 …(2)

4 = 3* 1 +1 …(3)

3 = 1*3 + 0

We start with 3rd equation (the one before we get the answer)

1 = 4–(3*1) [rearranging the remainder]

From eq 2 you get 3 = 11–(4*2) and put in above equation 1= 4*3–11.

From eq 1 you get 4= 26–(11*2) and put in above equation 1 = 26*3–11*7

Coefficient of 11 is -7 and as I explained in congruence section that negative number and a-n are congruent , so this is same as 26 -7= 19 i.e. x =19.

You can test that 11. 19 = 1 mod 26

I am explaining this because remember RSA equation d. e = 1 mod φ(n) , here e and n are public value so if someone compute φ(n) [which is known as Integer Factorization problem ore on this later] then he can solve for d using the above method.

Euler Totient Function φ (phi)

Counts number between 1 to n-1 that are co-prime to it. By co prime we mean their Greatest Common Divisor(GCD) is 1.Let’s see with an example.

Example n =9.

GCD (1 , 9)= GCD(2,9)= GCD(4, 9)= GCD(5, 9)=GCD(7, 9)=GCD(8,9)=1 and they are 6 of them i.e. {1, 2, 4, 5, 7, 9} and hence φ(9) =6.

How about some big number? It seems there is formula to calculate directly.Here p all the unique prime factor of the number. For example prime factor of 9 = 3² and if you put in below formula you get 9 . (1- 1/3) = 6.

Euler Totient Formula.

Now how φ of a prime number, since we know that every number between 1 to p-1 will be co-prime to p hence we quickly write φ(p) =p-1.Similarly φ(p.q) = (p-1)(q-1)

Why RSA equation works

Encryption c = m mod n

Decryption m= c mod n

Before explaining why decryption works, let’s see some theorem

Fermat’s Theorem

For a given prime p

aᵖ⁻¹ ≡ 1 mod p , p is prime number

Quiz : suppose we have to calculate 7¹⁰⁰⁰⁰⁰³ mod 11

1000003 = (11–1)*(11–1)*(11–1)*(11–1)*(11–1)*(11–1) + 3

we know a¹⁰ = 1 mod 11, so all the power will become 1 and all we left is 7³ mod 11 which is easy to calculate and answer will be 2

Euler Theorem

Fermat’s theorem is applicable only when p is prime, but Euler given a theorem for composite number and as it can be see it is generalization of Fermat’s theorem.

a^{Φ(n)} ≡1 (mod n); n can be composite

Notice that if n is a prime Euler theorem will be same as Fermat’s Theorem because Φ(p) = p-1

Quiz: 2²⁴⁵ mod 35 ? So lets break 245 in terms of ϕ(35) because we know then those power terms will be 1 as per Euler theorem

ϕ(35) = 4 * 6 [ why because ϕ(35) = ϕ(5*7) = ϕ(5) * ϕ(7) = 4 *6 =24

245 = 24*10 + 5 and hence 2²⁴⁵ mod 35 = 2²⁴ ..10 times * 2⁵ mod 35 = 32

Proof of RSA system

de ≡ 1 (mod φ(n)) as explained in congruence section it can also be written as d.e = k. φ(n) + 1 …..(1)

A message M sent encrypted as follows C = Mᵉ mod n. On receiver side this encrypted text is raised to power of private key d.

Cᵈ mod n can also be written as Mᵉᵈ mod n after replacing value of C. Also if substitute value of d.e from equation 1 we get

RSA decryption using Euler Theorem

Integer Factorization Problem

So RSA equation is unbreakable for small n because in order to solve de ≡ 1 (mod φ(n))One need to find out φ(n) , where n is a public value but in order to find φ(n), one has to factor n into p & q only then one can find φ(n), which is (p-1)*(q-1). And then one can solve for d using Extended Euclidean Algorithm. This is Integer Factorization Problem , given a product of 2 large prime number factorization is pretty difficult task to find its factor. Difficulty of problem is directly proportional how large is n, these days 2048 bit or more key size is considered safe.

Toy Example

I am explaining this a small example but in general it is more complicated than shown here.

•Public: (23, 143) M=5 ->5²³ mod 143 = 125

•Private:(47, 143) 12547 mod 143 = 5

In RSA system d is or order n, so raising a cipher text to a large exponent is computationally expensive. In practise this is achieved using Garner’ Formula. You can read more about the decryption technique here.

There is another way RSA is used i.e. Signature & Verification. In this scenario a party which has access to private key d perform Signature = Mᵈ mod n and send the (M,Sig) , receiver who has access to public key also raises this Sigᵉ mod n and verifies if M == Sigᵉ mod n ? If yes then message is authentic and the sender cannot deny sending this message as only sender can sign this message with his private key and no one has access to it. In practise M is first padded and hashed using SHA-1/SHA-2 and then signed.

Malleable Property of RSA

In mathematics we know 5² * 4² can be written as (5.4)². In RSA as well cipher text C₁ = (M₁)ᵉ mod n and C₂ = (M₂)ᵉ mod n , if we multiply this two cipher text c₁ * c₂ , on receiver size this can be properly decrypted and resultant plaintext would be M₁ * M₂. This is malleable property of RSA and later we will see it is useful as well exploited in some vulnerability.

An example to understand this is suppose Alice doesn't know how to calculate area of a rectangle(w=7, h=3 ), all she has is : private (47, 143) & Public: (23, 143). Alice encrypt input and send to Bob.

C_w= 7²³mod 143 = 2

C_y =3²³ mod 143 = 126

Bob returns return 2 *126 =252 mod 143 = 109

Alice decrypt with private key i.e. 109⁴⁷ mod 143 = 21

Here I have show a very very simple example but in reality a more complex equation can be solved using combination of other cryptographic algorithms which provide other primitive like addition etc.This is useful where we want a cloud service to execute some operation on our data and without revealing data. This is a research area and more can be read here http://homomorphicencryption.org/introduction/

Padding scheme for RSA

In practise encryption is not done just on input data and there are valid reasons for it. For example suppose you have send a data 1 raising this to any public key 1ᵉ mod n will always give us 1 and an attacker can detect. There are other reasons like if e << N. See this

We have to pad this data and make it of same size as N. There are various padding scheme propose PKCS #1 v1.5 , OAEP.

PKCS #1 v1.5 is a popular and we will see how it works and later we see some vulnerability because of this.

PKCS #1 v 1.5 padding for Encryption in TLS <1.2

0002 : Defines what type of operation we are going to do with this data, 0002 means its a public key operation i.e. encryption using the e while 0001 we are doing to perform signature operation i.e. padded data will be raised to power of d.

Random are all padding bytes and it must not contain 00 byte. number_of_random bits = (size_of_RSA_key — 2 byte of marker (green_bytes) — 1 byte of end of padding (purple) — payload_len(green)

For Signature there is a small difference that all padding bytes contains 0xFF.Reason is we want encryption of same plain text to be different every time and hence padding is random in encryption while for signature this isn’t the case.

PKCS #1 v1.5 padding for RSA Signature operation

In RSA Signature, for TLS 1.2 onwards after the end marker i.e. 0x00,their is also a identifier which defines what type of hash is being used in payload.

How RSA used in TLS 1.2

To understand this a bit of TLS protocol background is required. Basically when a client want to connect to a TLS Server it send a message ClientHello which list what all cipher suites* it can support. Server choose the strongest common cipher suite and send its public key (n, e). Clients create a Pre-Master Secret which is a 48 byte random number(2 bytes of client protocol version + 46 random bytes) and pad as per scheme explained above , raise this padded message to public key e and then send this message. Only server which has access to private key d can decrypt this packet. Once the server receive this packet both client and server has common secret to start deriving master secret and then more session keys. Generally this is more involved than what I explained here like verifying Certificate Signature, generating keys using PRF function and this topic needs a seperate explanation on it own. May be I will write an article one for TLS.

Attacks on RSA

This classical paper listed all the attacks. I will discuss about FREAK & Bleichenbacher’s attack in upcoming articles.

How to generates public private key

--

--