# Here’s How Quadratic Sieve Factorization Works

--

Finding the product of two or more numbers is a straightforward mathematical operation, but the inverse operation of decomposing a number into its factors is not a walk in the park. This difficulty in factorization is the basis of the RSA cryptosystem.

Factorization has always been of great interest, and even more so with the rise of modern cryptography.

Given a number **N**, a positive rational integer, what are the computational operations we can carry out on N to obtain its factors? The basic, nay crude approach is trial division; divide N by lesser numbers(from 2 up to the square root of N) and see which ones leave no remainders.

Another elementary approach is Fermat’s factorization method, which is based on the *difference of two squares* identity : **x² - y² = (x-y)(x+ y)**.

To factorize **N** using this method, we have to look for two whole numbers **x** and **y** such that **x²-y² = N** or, what is the same thing, x**²-N** **= y²**. Then the factors of **N** will be (**x-y**) and (**x+y**) according to the stated identity. Algorithmically, we choose an initial value for **x** and increase it in steps of 1, computing x**²-N **in each step, and stopping when it(x**²-N**) is a perfect square(**y²**). The starting value of **x** would be the least positive integer that makes x**²-N > 0**, that is,** x² > N**. So, the starting value of **x** would be the least positive integer greater than the square root of **N**, assuming N is not a perfect square. Note that just two factors which may not all be prime are produced by this method, if we are interested in prime factors we may need to test the output for primality or factorize them again, except we know beforehand that **N** is made up of just two prime factors like in the case of RSA moduli.

The Quadratic Sieve factorization algorithm and the more advanced Number Field Sieve algorithm are based on the basic idea of Fermat factorization method, so, we take an example to see how it works. **Example** : Factorize **N** = 125,513.

The square root of 125,513 is 354.2781…

We take our starting value to be **x** = 355.

Now, compute x**² - N **successively, and check if it is a perfect square:

355² - 125,513 = 126,025 - 125,513 =512 (not a perfect square).

356**² **- 125,513 = 126,737 - 125,513 = 1,223 (not a perfect square).

357**² **- 125,513 = 127,449 - 125,513 = 1,936 = 44**² **(a perfect square).

We got a perfect square at the third step; 357**² **-** **125,513 =44**²**, so

125,513 = 357² - 44² = (357 – 44)(357 + 44) =** 127 * 181**.

The example was purposely chosen to take just a few steps, in contrast, factorizing 80,723 this way will take 214 steps to get to a perfect square.

## Towards Quadratic Sieve Factorization

The two factorization methods described in the introduction are useful for factorizing small numbers, say, with less than twenty digits. For bigger numbers such as the ones used in cryptography, there is need for more efficient algorithms.

The quadratic sieve algorithm has at its foundation, a modification of Fermat’s method suggested by the Belgian mathematician, Maurice Kraitchik. Here, we seek for integers **x** and **y** that satisfy the congruence **x**² ≡ **y**²(mod N), or what is the same thing, the expression **x**² - **y**² = **kN**, where **k** is some positive integer(not equal to 1, otherwise, it becomes Fermat’s method). This is potentially a quicker method of achieving our goal of factoring **N**, if approached the way we are going to describe. The theory behind it is: if **x**² - **y**² = **kN**, then the greatest common divisor of **x - y** and **N**, **gcd**(**x - y**, **N**), is a proper factor of **N**, so is **gcd**(**x + y**, **N**), provided **N** divides neither of **x - y **and **x + y**.

Accordingly, to obtain the factors of **N**, search for integers **x** and **y** that satisfy the congruence **x**² ≡ **y**²(mod **N**). Then compute **gcd**(**x - y**, **N**) and **gcd**(**x + y**, **N**). The **gcd** can be computed efficiently with Euclidean algorithm. Let’s take a toy example to see how it works.

**Example**: factorize 227,179.

We want to search for integers **x** and **y** that satisfy **x**² ≡ **y**² (mod 227179), or what is the same thing, **x**² mod 227179 =**y**². Like in the case of Fermat’s method, we start with **x** equals the ceiling of the square root of 227179, and compute **x² mod** **227179** successively to obtain a series of *relations*.**x² mod 227179** has the same value as **x² - 227179** in the range(*sieving interval*) we are going to work in, so we compute that instead.

The square root of 227,179 is 476.6329…, we start with 477.

477² - 227179** **= **350** (not a perfect square) = 2 × 5² × 7

478² - 227179** **= **1305** (not a perfect square) = 3² × 5 × 29

479² - 227179** **= **2262** (not a perfect square) = 2 × 3 × 13 × 29

480² - 227179** **= **3221** (not a perfect square) = 3221

481² - 227179** **= **4182** (not a perfect square) = 2 × 3 × 17 × 41

482² - 227179** **= **5145** (not a perfect square) = 3 × 5 × 7³

483² - 227179** **= **6110** (not a perfect square) = 2 × 5 × 13 × 47

484² - 227179** **= **7077** (not a perfect square) = 3 × 7 × 337

485² - 227179** **= **8046** (not a perfect square) = 2 × 3³ × 149

486² - 227179** **= **9017** (not a perfect square) = 71 × 127

487² - 227179** **= **9990** (not a perfect square) = 2 × 3³ × 5 × 37

488² - 227179** **= **10965** (not a perfect square) =3 × 5 × 17 × 43

489² - 227179** **= **11942** (not a perfect square) = 2 × 7 × 853

490² - 227179** **= **12921** (not a perfect square) = 3 × 59 × 73

491² - 227179** **= **13902** (not a perfect square) = 2 × 3 × 7 × 331

492² - 227179** **= **14885** (not a perfect square) = 5 × 13 × 229

493² - 227179** **= **15870** (not a perfect square) = 2 × 3 × 5 × 23²

494² - 227179** **= **16857** (not a perfect square) = 3² × 1873

495² - 227179** **= **17846** (not a perfect square) = 2 × 8923

496² - 227179** **= **18837 **(not a perfect square) = 3² × 7 × 13 × 23

After twenty steps, we still haven’t got a perfect square. Well, the idea of the algorithm is really not to get a perfect square this way, the idea is to get a perfect square by multiplying some of the above relations together. By inspection, we see that we can multiply relations 1, 6, and 17 to get a perfect square:

477² ≡ 2 × 5² × 7 (mod 227179)

482² ≡ 3 × 5 × 7³ (mod 227179)

493² ≡ 2 × 3 × 5 × 23² (mod 227179)

We multiply terms on corresponding sides of the congruences to get the relation we want;

477² × 482² × 493² ≡ (2 × 5² × 7) × (3 × 5 × 7³) × (2 × 3 × 5 × 23²)

(477× 482 × 493)² ≡ (2² × 3² × 5⁴ × 7⁴ × 23²) (mod 227179)

(113,347,602)² ≡ (2 × 3 × 5² × 7² × 23)² (mod 227179)**(113,347,602)² ≡ (169,050)² (mod 227179)***113,347,602 is greater than 227179, we have to reduce it mod 227179 to have:***(212,460)² ≡ (169,050)² (mod 227179)**.

So, one factor of 227179 is **gcd**(212460 - 169050, 227179)

= **gcd**(43410, 227179) = **1447**.We obtain the other factor by computing

**gcd**(212460 +169050, 227179) or dividing 227179 by 1447.

Checking manually to see which relations multiply to give a perfect square is obviously impracticable when we have a large set of relations, and, there is a need to *algorithmize* the process so a computer program can handle it; a computer does not have a mind of its own. When inspecting, we looked for a set of relations where the exponents of each prime factor on the right hand side add up to an even number, we can *algorithmize *this “inspection” with Linear Algebra; first, we set up a matrix of the exponents of the prime factors on the right hand side. The biggest prime factor in the set is 8923. A matrix of exponents of primes from 2(the least in the set) up to 8923 is going to be way too big for the small number we are trying to factorize, so we choose a (smaller) bound for the primes we want to work with, called the *smoothness bound*, B. If we choose B to be 50 for example, we can ignore those relations with prime factors greater than 50 and work only with those with prime factors up to 50 called the *smooth relations, *or specifically,* 50-smooth *relations. Relations 1, 2, 3, 5, 6, 7, 11, 12, 17, and 20 above qualify;

2 × 5² × 7

3²× 5 × 29

2 × 3 × 13 × 29

2 × 3 × 17 × 41

3 × 5 × 7³

2 × 5 × 13 × 47

2 × 3³ × 5 × 37

3 × 5 × 17 × 43

2 × 3 × 5 × 23²

3² × 7 × 13 × 23

We form a matrix with the exponents of the prime factors above. There are fifteen primes up to 50(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47), so our matrix will have fifteen columns. 0 is placed to represent the exponent of any missing prime P between 2 and 47(makes sense, since 1 = P⁰). The matrix is as follows:

The matrix can be simplified to make computation faster by recording just the parities of the entries; 0 for even numbers, 1 for odd numbers:

Now, for this matrix, we are interested in the rows that will add up to an even parity i.e. [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]. These can be found by computing the *left null space* of the matrix using standard linear algebraic procedures - which can be coded in a computer program. If **M** represents the matrix above and **S**, it’s left null space, then, we have:

**S** × **M** = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Solving the matrix equation(by Gaussian Elimination and working with only the parities of the entries) gives **S** = [1 0 0 0 1 0 0 0 1 0 ]. The 1st, 5th, and 9th entries of **S **have the value 1, this indicates rows 1, 5, and 9 in our exponents matrix **M**(corresponding to relations 1, 6, 17) are the rows that will add up to an even parity - and the corresponding relations will combine to give a perfect square.

**Important Note** : We are just lucky to get a factorization with just ten smooth relations(ten matrix rows), normally, we would need more rows than columns(more smooth relations than prime factors) to guarantee linear dependence of the matrix rows and for the left null space to be nontrivial — when writing an implementation, extend the sieving interval so as to have more smooth relations than prime factors, you don’t want to leave things to chance in an algorithm.

## Quadratic Sieving

We’ve not done any sieving yet. This is the heart of the algorithm, an idea introduced by Carl Pomerance to make the procedures we’ve discussed so far more efficient, but before we go into this, let’s introduce the notion of *factor base*. Recall that at the beginning, we computed **x² - 227179 **for various values of **x** to obtain a sequence of numbers;

350, 1305, 2262, 3221,…,17846, 18837.

We then factorized each of these numbers and used the exponents of the *prime factors* to form a matrix. It is possible to determine these *prime factors *in advance* *without factorizing numbers in the sequence using the fact that **227179 **or whatever number to be factored is a *quadratic residue *mod each of these prime numbers. A number *N* is a quadratic residue mod *p* if it is congruent to a square, i.e., there exists an integer ** T** such that

**≡**

*T*²*N*(mod

*p*).

Now, to get the prime numbers for which

*N*(227179, in this case) is a quadratic residue, we use Euler’s criterion.

If we have chosen the smoothness bound B already, the next task is to apply Euler’s criterion to all primes up to B; the ones that pass the test make up what is called the *factor base*. These primes are going to be used to sieve out the *non-smooth* numbers from the sequence of numbers in the sieving interval. A *non-smooth* number is a number that has a prime factor greater than the smoothness bound. We want to work with only smooth numbers so we can have a manageable exponents matrix as explained before.

To illustrate these new ideas, let’s revisit the factorization of 227179. We choose B to be 50. Primes up to B, *B* = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.

Applying Euler’s criterion to each prime in *B* with 227179 as the base we find that 227179 is a quadratic residue modulo the following {2, 3, 5, 7, 13, 17, 23, 29, 37, 41, 43, 47}. Every integer is a quadratic residue modulo 2, there might be no need to test with 2.

So, our factor base is **{2, 3, 5, 7, 13, 17, 23, 29, 37, 41, 43, 47}**.

Now, on to the sieving. The benefit of sieving is that it helps to identify the smooth numbers without having to factorize them first, like we did, saving us some time and effort.

How it is done? Recall, once again, that the numbers in the sieving interval, i.e., 350, 1305, 2262, 3221,…,17846, 18837 were obtained by computing (477 + **n**)² **- **227179 for **n** = 0, 1, 2,….

The challenge now, is, among these numbers how do we identify those that factor completely over the factor base without doing the factorizations — those would be the smooth numbers. This is equivalent to asking what values of (477 + **n**)² **-** 227179 are divisible by primes in the factor base and only the primes in the factor base. If we can solve the quadratic congruence

(477 + **n**)² **-** 227179 ≡ 0(mod **p**) where **p** is a member of the factor base, then we have our answer. Fortunately, there is an efficient algorithm for solving such congruences; the Tonelli–Shanks** **algorithm.

So, we solve (477 + **n**)² **-** 227179 ≡ 0(mod **p**) for each **p** from the factor base and divide the corresponding values of (477 + **n**)² **-** 227179 by **p**.

With p = 2, solving (477 + **n**)² **- **227179 ≡ 0(mod 2) gives **n ≡ 0 (mod 2)**, or **n = 2k**, for k = 0, 1, 2, 3,…, so **n** = 0, 2, 4, 6,…This tells us

(477 + **n**)² **-** 227179 is divisible by p= 2 when n = 0, 2, 4, 6,…,so we divide those values by 2.

Here are the numbers computed in our sieving interval in tabular form, the top row contains the **n** values, and the bottom row contains the corresponding (477 + **n**)² **-** 227179 values.

Dividing entries in the bottom row by 2 where **n** = 0, 2, 4, 6,…, 16, 18 gives:

For every p > 2, the congruence will have two solutions. With p = 3, the two solutions are **n ≡ 1(mod 3) **or **n =** **3k + 1**, and **n ≡ 2(mod 3) **or **n = 3k + 2**.

So, **n** = 1, 4, 7, 10,…, and **n** = 2, 5, 8, 11,….

Dividing the entries in the table corresponding to these values of **n** by 3, we have:

With p = 5, the two solutions are **n ≡ 0(mod 5) **or **n =** 5**k**, and **n ≡ 1(mod 5) **or **n = 5k + 1**. So, **n** = 0, 5, 10, 15,…, and **n** = 1, 6, 11, 16,….

Dividing the entries in the table corresponding to these values of **n** by 5, we have:

Carrying out the same procedures for the remaining primes in the factor base, we finally obtain :

The entries that were reduced to 1 are smooth numbers, and we’ve already

obtained their factorizations during the process, so we can build a matrix with that and proceed as before.

In the interval, we were able to capture only four smooth numbers, partly because we did not take divison by powers of the factor base primes into consideration, these are not enough to guarantee a factorization of N, to get more smooth numbers/smooth relations we either increase the smoothness bound or sieve with prime powers(**pᵐ**) as well. The solutions of the congruence (477 + **n**)² — 227179 ≡ 0(mod **pᵐ**), where **m** > 1, can be easily obtained from the solutions of (477 + **n**)² — 227179 ≡ 0(mod **p**) obtained with the Tonelli-Shanks algorithm.

## Choosing the Smothness Bound

Choosing a smoothness bound, B, is a crucial first step in the quadratic sieve algorithm. A very small bound may fail to yield enough relations that combine to an even parity and a quite large bound will result in an unwieldy matrix. So, how do we choose an optimal value for the smoothness bound? There is an heuristic formular that can act as a guide:

Where **N** is the number to be factored, **log** is natural logarithm, **exp**(x) means **eˣ**, e = base of natural logarithm(approximately 2.71828). o(1) is a value approaching 0, you may roughly set it to 0.

## Summary/End Notes

In summary here are the steps involved in the algorithm, to factor N:

1. Choose a smoothness bound B.

2. Get all the prime numbers up to B. Sieve of Eratosthenes algorithm can help with that.

3. Use Euler’s criterion to determine whether N is a quadratic residue modulo each of the prime in step 2. Save the primes that pass the test in the factor base array. This step can be combined with step 2, so that everything is done in a single loop; once a prime is detected, apply Euler’s criterion to it.

4. Compute a sequence of numbers Qₙ = (x +n)² — N for n = 0, 1, 2, 3…, where x is the ceiling of the square root of N. How long should this sequence be? We need at least one more smooth relation than the number of primes in the factor base. Just a fraction of the Qₙ’s are going to be B-smooth, so we can make a fairly generous allowance of the Qₙ’s being at least six times

the size of the number of primes in the factor base.

5. Solve for **n** in the congruence (x + **n**)² — N ≡ 0(mod **p**) for each **p** in the factor base using Tonelli-Shanks algorithm. Sieve the **Qₙ**’s for B-smooth values by carrying out divisions **Qₙ **/ **p** based on the solutions of the congruence, note down the factors of each Qₙ.

6. Form a matrix with exponents of the prime factors of the smooth Qₙ’s obtained in step 5. Find the (left) null space of the matrix — use that to determine the set of relations that will combine to give an even parity.

For large matrices, simple Gaussian elimination does not fit the bill of computing null space, a more efficient algorithm like the *block Lanczos* or** ***block Wiedemann* algorithm is usually used.

7. Combining relations in step 6 will produce a single relation of the form

**a**² ≡ **b**²(mod **N**). From this we get the much sought-after factors of **N** by computing **gcd**(**a — b**, **N**) and **gcd**(**a + b**, **N**) using the Euclidean algorithm.

That is the basic idea of quadratic sieve factorization, there are several improved variants of the algorithm.