# Here’s How Quadratic Sieve Factorization Works

Finding the product of two or more numbers is a straightforward mathematical operation, but the inverse operation of decomposing a number into its factors is not a walk in the park. This difficulty in factorization is the basis of the RSA cryptosystem.

Factorization has always been of great interest, and even more so with the rise of modern cryptography.

Given a number N, a positive rational integer, what are the computational operations we can carry out on N to obtain its factors? The basic, nay crude approach is trial division; divide N by lesser numbers(from 2 up to the square root of N) and see which ones leave no remainders.

Another elementary approach is Fermat’s factorization method, which is based on the difference of two squares identity : x² - y² = (x-y)(x+ y).
To factorize N using this method, we have to look for two whole numbers x and y such that x²-y² = N or, what is the same thing, x²-N = y². Then the factors of N will be (x-y) and (x+y) according to the stated identity. Algorithmically, we choose an initial value for x and increase it in steps of 1, computing x²-N in each step, and stopping when it(x²-N) is a perfect square(). The starting value of x would be the least positive integer that makes x²-N > 0, that is, x² > N. So, the starting value of x would be the least positive integer greater than the square root of N, assuming N is not a perfect square. Note that just two factors which may not all be prime are produced by this method, if we are interested in prime factors we may need to test the output for primality or factorize them again, except we know beforehand that N is made up of just two prime factors like in the case of RSA moduli.

The Quadratic Sieve factorization algorithm and the more advanced Number Field Sieve algorithm are based on the basic idea of Fermat factorization method, so, we take an example to see how it works.
Example : Factorize N = 125,513.
The square root of 125,513 is 354.2781…
We take our starting value to be x = 355.
Now, compute x² - N successively, and check if it is a perfect square:

355² - 125,513 = 126,025 - 125,513 =512 (not a perfect square).
356² - 125,513 = 126,737 - 125,513 = 1,223 (not a perfect square).
357² - 125,513 = 127,449 - 125,513 = 1,936 = 44² (a perfect square).

We got a perfect square at the third step; 357² - 125,513 =44², so
125,513 = 357² - 44² = (357 – 44)(357 + 44) = 127 * 181.

The example was purposely chosen to take just a few steps, in contrast, factorizing 80,723 this way will take 214 steps to get to a perfect square.

The two factorization methods described in the introduction are useful for factorizing small numbers, say, with less than twenty digits. For bigger numbers such as the ones used in cryptography, there is need for more efficient algorithms.

The quadratic sieve algorithm has at its foundation, a modification of Fermat’s method suggested by the Belgian mathematician, Maurice Kraitchik. Here, we seek for integers x and y that satisfy the congruence x² ≡ y²(mod N), or what is the same thing, the expression x² - y² = kN, where k is some positive integer(not equal to 1, otherwise, it becomes Fermat’s method). This is potentially a quicker method of achieving our goal of factoring N, if approached the way we are going to describe. The theory behind it is: if
x² - y² = kN, then the greatest common divisor of x - y and N, gcd(x - y, N), is a proper factor of N, so is gcd(x + y, N), provided N divides neither of x - y and x + y.

Accordingly, to obtain the factors of N, search for integers x and y that satisfy the congruence x² ≡ y²(mod N). Then compute gcd(x - y, N) and gcd(x + y, N). The gcd can be computed efficiently with Euclidean algorithm. Let’s take a toy example to see how it works.

Example: factorize 227,179.
We want to search for integers x and y that satisfy x² ≡ y² (mod 227179), or what is the same thing, x² mod 227179 =y². Like in the case of Fermat’s method, we start with x equals the ceiling of the square root of 227179, and compute x² mod 227179 successively to obtain a series of relations.
x² mod 227179 has the same value as x² - 227179 in the range(sieving interval) we are going to work in, so we compute that instead.

477² - 227179 = 350 (not a perfect square) = 2 × 5² × 7
478² - 227179 = 1305 (not a perfect square) = 3² × 5 × 29
479² - 227179 = 2262 (not a perfect square) = 2 × 3 × 13 × 29
480² - 227179 = 3221 (not a perfect square) = 3221
481² - 227179 = 4182 (not a perfect square) = 2 × 3 × 17 × 41
482² - 227179 = 5145 (not a perfect square) = 3 × 5 × 7³
483² - 227179 = 6110 (not a perfect square) = 2 × 5 × 13 × 47
484² - 227179 = 7077 (not a perfect square) = 3 × 7 × 337
485² - 227179 = 8046 (not a perfect square) = 2 × 3³ × 149
486² - 227179 = 9017 (not a perfect square) = 71 × 127
487² - 227179 = 9990 (not a perfect square) = 2 × 3³ × 5 × 37
488² - 227179 = 10965 (not a perfect square) =3 × 5 × 17 × 43
489² - 227179 = 11942 (not a perfect square) = 2 × 7 × 853
490² - 227179 = 12921 (not a perfect square) = 3 × 59 × 73
491² - 227179 = 13902 (not a perfect square) = 2 × 3 × 7 × 331
492² - 227179 = 14885 (not a perfect square) = 5 × 13 × 229
493² - 227179 = 15870 (not a perfect square) = 2 × 3 × 5 × 23²
494² - 227179 = 16857 (not a perfect square) = 3² × 1873
495² - 227179 = 17846 (not a perfect square) = 2 × 8923
496² - 227179 = 18837 (not a perfect square) = 3² × 7 × 13 × 23

After twenty steps, we still haven’t got a perfect square. Well, the idea of the algorithm is really not to get a perfect square this way, the idea is to get a perfect square by multiplying some of the above relations together. By inspection, we see that we can multiply relations 1, 6, and 17 to get a perfect square:

477² ≡ 2 × 5² × 7 (mod 227179)
482² ≡ 3 × 5 × 7³ (mod 227179)
493² ≡ 2 × 3 × 5 × 23² (mod 227179)

We multiply terms on corresponding sides of the congruences to get the relation we want;

477² × 482² × 493² ≡ (2 × 5² × 7) × (3 × 5 × 7³) × (2 × 3 × 5 × 23²)
(477× 482 × 493)² ≡ (2² × 3² × 5⁴ × 7⁴ × 23²) (mod 227179)
(113,347,602)² ≡ (2 × 3 × 5² × 7² × 23)² (mod 227179)
(113,347,602)² ≡ (169,050)² (mod 227179)
113,347,602 is greater than 227179, we have to reduce it mod 227179 to have:(212,460)² ≡ (169,050)² (mod 227179).
So, one factor of 227179 is gcd(212460 - 169050, 227179)
= gcd(43410, 227179) = 1447.
We obtain the other factor by computing gcd(212460 +169050, 227179) or dividing 227179 by 1447.

Checking manually to see which relations multiply to give a perfect square is obviously impracticable when we have a large set of relations, and, there is a need to algorithmize the process so a computer program can handle it; a computer does not have a mind of its own. When inspecting, we looked for a set of relations where the exponents of each prime factor on the right hand side add up to an even number, we can algorithmize this “inspection” with Linear Algebra; first, we set up a matrix of the exponents of the prime factors on the right hand side. The biggest prime factor in the set is 8923. A matrix of exponents of primes from 2(the least in the set) up to 8923 is going to be way too big for the small number we are trying to factorize, so we choose a (smaller) bound for the primes we want to work with, called the smoothness bound, B. If we choose B to be 50 for example, we can ignore those relations with prime factors greater than 50 and work only with those with prime factors up to 50 called the smooth relations, or specifically, 50-smooth relations. Relations 1, 2, 3, 5, 6, 7, 11, 12, 17, and 20 above qualify;

2 × 5² × 7
3²× 5 × 29
2 × 3 × 13 × 29
2 × 3 × 17 × 41
3 × 5 × 7³
2 × 5 × 13 × 47
2 × 3³ × 5 × 37
3 × 5 × 17 × 43
2 × 3 × 5 × 23²
3² × 7 × 13 × 23

We form a matrix with the exponents of the prime factors above. There are fifteen primes up to 50(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47), so our matrix will have fifteen columns. 0 is placed to represent the exponent of any missing prime P between 2 and 47(makes sense, since 1 = P⁰). The matrix is as follows:

The matrix can be simplified to make computation faster by recording just the parities of the entries; 0 for even numbers, 1 for odd numbers:

Now, for this matrix, we are interested in the rows that will add up to an even parity i.e. [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]. These can be found by computing the left null space of the matrix using standard linear algebraic procedures - which can be coded in a computer program. If M represents the matrix above and S, it’s left null space, then, we have:

S × M = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Solving the matrix equation(by Gaussian Elimination and working with only the parities of the entries) gives S = [1 0 0 0 1 0 0 0 1 0 ]. The 1st, 5th, and 9th entries of S have the value 1, this indicates rows 1, 5, and 9 in our exponents matrix M(corresponding to relations 1, 6, 17) are the rows that will add up to an even parity - and the corresponding relations will combine to give a perfect square.

Important Note : We are just lucky to get a factorization with just ten smooth relations(ten matrix rows), normally, we would need more rows than columns(more smooth relations than prime factors) to guarantee linear dependence of the matrix rows and for the left null space to be nontrivial — when writing an implementation, extend the sieving interval so as to have more smooth relations than prime factors, you don’t want to leave things to chance in an algorithm.

We’ve not done any sieving yet. This is the heart of the algorithm, an idea introduced by Carl Pomerance to make the procedures we’ve discussed so far more efficient, but before we go into this, let’s introduce the notion of factor base. Recall that at the beginning, we computed x² - 227179 for various values of x to obtain a sequence of numbers;
350, 1305, 2262, 3221,…,17846, 18837.
We then factorized each of these numbers and used the exponents of the prime factors to form a matrix. It is possible to determine these prime factors in advance without factorizing numbers in the sequence using the fact that 227179 or whatever number to be factored is a quadratic residue mod each of these prime numbers. A number N is a quadratic residue mod p if it is congruent to a square, i.e., there exists an integer T such that T² N(mod p).
Now, to get the prime numbers for which N(227179, in this case) is a quadratic residue, we use Euler’s criterion.

If we have chosen the smoothness bound B already, the next task is to apply Euler’s criterion to all primes up to B; the ones that pass the test make up what is called the factor base. These primes are going to be used to sieve out the non-smooth numbers from the sequence of numbers in the sieving interval. A non-smooth number is a number that has a prime factor greater than the smoothness bound. We want to work with only smooth numbers so we can have a manageable exponents matrix as explained before.

To illustrate these new ideas, let’s revisit the factorization of 227179. We choose B to be 50. Primes up to B, B = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.

Applying Euler’s criterion to each prime in B with 227179 as the base we find that 227179 is a quadratic residue modulo the following {2, 3, 5, 7, 13, 17, 23, 29, 37, 41, 43, 47}. Every integer is a quadratic residue modulo 2, there might be no need to test with 2.
So, our factor base is {2, 3, 5, 7, 13, 17, 23, 29, 37, 41, 43, 47}.

Now, on to the sieving. The benefit of sieving is that it helps to identify the smooth numbers without having to factorize them first, like we did, saving us some time and effort.

How it is done? Recall, once again, that the numbers in the sieving interval, i.e., 350, 1305, 2262, 3221,…,17846, 18837 were obtained by computing (477 + n- 227179 for n = 0, 1, 2,….

The challenge now, is, among these numbers how do we identify those that factor completely over the factor base without doing the factorizations — those would be the smooth numbers. This is equivalent to asking what values of (477 + n- 227179 are divisible by primes in the factor base and only the primes in the factor base. If we can solve the quadratic congruence
(477 + n- 227179 ≡ 0(mod p) where p is a member of the factor base, then we have our answer. Fortunately, there is an efficient algorithm for solving such congruences; the Tonelli–Shanks algorithm.
So, we solve (477 + n- 227179 ≡ 0(mod p) for each p from the factor base and divide the corresponding values of (477 + n- 227179 by p.

With p = 2, solving (477 + n- 227179 ≡ 0(mod 2) gives n ≡ 0 (mod 2), or
n = 2k, for k = 0, 1, 2, 3,…, so n = 0, 2, 4, 6,…This tells us
(477 + n- 227179 is divisible by p= 2 when n = 0, 2, 4, 6,…,so we divide those values by 2.

Here are the numbers computed in our sieving interval in tabular form, the top row contains the n values, and the bottom row contains the corresponding (477 + n- 227179 values.

Dividing entries in the bottom row by 2 where n = 0, 2, 4, 6,…, 16, 18 gives:

For every p > 2, the congruence will have two solutions. With p = 3, the two solutions are n ≡ 1(mod 3) or n = 3k + 1, and n ≡ 2(mod 3) or n = 3k + 2.
So, n = 1, 4, 7, 10,…, and n = 2, 5, 8, 11,….
Dividing the entries in the table corresponding to these values of n by 3, we have:

With p = 5, the two solutions are n ≡ 0(mod 5) or n = 5k, and n ≡ 1(mod 5) or n = 5k + 1. So, n = 0, 5, 10, 15,…, and n = 1, 6, 11, 16,….
Dividing the entries in the table corresponding to these values of n by 5, we have:

Carrying out the same procedures for the remaining primes in the factor base, we finally obtain :

The entries that were reduced to 1 are smooth numbers, and we’ve already
obtained their factorizations during the process, so we can build a matrix with that and proceed as before.

In the interval, we were able to capture only four smooth numbers, partly because we did not take divison by powers of the factor base primes into consideration, these are not enough to guarantee a factorization of N, to get more smooth numbers/smooth relations we either increase the smoothness bound or sieve with prime powers(pᵐ) as well. The solutions of the congruence (477 + n)² — 227179 ≡ 0(mod pᵐ), where m > 1, can be easily obtained from the solutions of (477 + n)² — 227179 ≡ 0(mod p) obtained with the Tonelli-Shanks algorithm.

## Choosing the Smothness Bound

Choosing a smoothness bound, B, is a crucial first step in the quadratic sieve algorithm. A very small bound may fail to yield enough relations that combine to an even parity and a quite large bound will result in an unwieldy matrix. So, how do we choose an optimal value for the smoothness bound? There is an heuristic formular that can act as a guide:

Where N is the number to be factored, log is natural logarithm, exp(x) means , e = base of natural logarithm(approximately 2.71828). o(1) is a value approaching 0, you may roughly set it to 0.

## Summary/End Notes

In summary here are the steps involved in the algorithm, to factor N:

1. Choose a smoothness bound B.

2. Get all the prime numbers up to B. Sieve of Eratosthenes algorithm can help with that.

3. Use Euler’s criterion to determine whether N is a quadratic residue modulo each of the prime in step 2. Save the primes that pass the test in the factor base array. This step can be combined with step 2, so that everything is done in a single loop; once a prime is detected, apply Euler’s criterion to it.

4. Compute a sequence of numbers Qₙ = (x +n)² — N for n = 0, 1, 2, 3…, where x is the ceiling of the square root of N. How long should this sequence be? We need at least one more smooth relation than the number of primes in the factor base. Just a fraction of the Qₙ’s are going to be B-smooth, so we can make a fairly generous allowance of the Qₙ’s being at least six times
the size of the number of primes in the factor base.

5. Solve for n in the congruence (x + n)² — N ≡ 0(mod p) for each p in the factor base using Tonelli-Shanks algorithm. Sieve the Qₙ’s for B-smooth values by carrying out divisions Qₙ / p based on the solutions of the congruence, note down the factors of each Qₙ.

6. Form a matrix with exponents of the prime factors of the smooth Qₙ’s obtained in step 5. Find the (left) null space of the matrix — use that to determine the set of relations that will combine to give an even parity.
For large matrices, simple Gaussian elimination does not fit the bill of computing null space, a more efficient algorithm like the block Lanczos or block Wiedemann algorithm is usually used.

7. Combining relations in step 6 will produce a single relation of the form
a² ≡ b²(mod N). From this we get the much sought-after factors of N by computing gcd(a — b, N) and gcd(a + b, N) using the Euclidean algorithm.

That is the basic idea of quadratic sieve factorization, there are several improved variants of the algorithm.

--

--

Programmer. Mathematician. Thinker. Entrepreneur, among other things. Give me a place to sit and I will move the world.