Binary Quadratic Forms and Primes of the Form x² + ny²
Here is an interesting observation about numbers: Some prime numbers can be written as sums of two squares(x² + y²), like:
5 = 1² + 2²
13 = 2² + 3²
17 = 1² + 4²
29 = 2² + 5²
You’ll notice that such primes are of the form 4k + 1, where k is some whole number. The french mathematician, Fermat, knew about this and elevated it to a theorem:
Every prime number which surpasses by one a multiple of four is composed of two squares.
Similarly, some prime numbers can be written in the form x² + 2y², like:
3 = 1 + 2(1²)
11 = 3 + 2(2²)
17 = 9 + 2(2²)
19 = 1 + 2(3²) = 11 + 2(2²)
These are prime numbers of the form 8k + 1 and 8k + 3.
Also, some prime numbers can be written in the form x² + 3y², like:
7 = 4 + 3(1²)
13 = 1 + 3(2²)
19 = 7 + 3(2²)
31 = 4 + 3(3²) = 19 + 3(2²)
These are prime numbers of the form 3k + 1.
Fermat knew about these too and stated them in letters to his colleagues - without proofs, though he claimed to have solid proofs. May be the margins were too small to contain them.
Naturally, one is led to ask, in general, what prime numbers can be written in the form x² + ny² for a given integer n? Sounds like a problem in recreational mathematics with no practical value. Nonetheless, it caught the interest of many eminent mathematicians for centuries. The theory of binary quadratic forms was developed to help answer this question.
x² + y², x² + 2y², x² + 3y² are all examples of binary quadratic forms - with missing terms. A binary quadratic form in its full glory is an algebraic expression of the form:
ax² + bxy + cy²
It is binary because there are just two variables(x and y) involved and quadratic because the degrees of the variables add up to two in each term.
The coefficients a, b, c are usually integers.
A binary quadratic form is a natural mathematical object, it is bound to have interesting properties and solve some problem in the universe(talk about practical applications).
In this post, we will use the abbreviation bqf for binary quadratic form and denote a specific bqf by its coefficients as (a, b, c). Examples:
3x² - 5xy + 4y² = (3, -5, 4), 7x² + 23y² = (7, 0, 23).
A bqf is said to be primitive if its coefficients a, b, c are relatively prime, that is, their greatest common factor is 1. 3x² - 5xy + 4y² is primitive but
2x² + 6xy - 4y² is not.
Equivalent Forms and the Discriminant
By some algebraic manipulations(completing the square), a bqf can be rewritten as follows:
4a(ax² + bxy + cy²) = (2ax +by)² - (b² - 4ac)y².
The coefficient of y² on the right, that is b² - 4ac, is called the discriminant of the bqf ax² + bxy + cy². Many properties of a bqf are tied to the value of its discriminant.
If the discriminant(b² - 4ac) is negative and either a or c is positive then the bqf will always be positive for whatever values of x and y. In such case, the bqf is said to be positive definite. Similarly, if the discriminant is negative and either a or c is negative, the bqf will only produce negative values, it is negative definite.
If the discriminant is positive, the bqf can take both positive and negative values depending on the values of x and y. In this case, the bqf is said to be indefinite.
You may want to note that the expression b² - 4ac, for whatever values of a, b, c, will always yield either a multiple of 4 or a multiple of 4 plus 1, since the first term(b²) is a perfect square(and therefore is either a multiple of 4 or a multiple of 4 plus 1) and the second term(4ac) is clearly a multiple of 4. So, the discriminant of any bqf is restricted to numbers of the form 4k or 4k + 1, where k is an integer. Numbers like 2, 3, 6, 7, 10, 43, etc. can never be the discriminant of a bqf.
By simple symmetry we can tell the forms x² + 7y² and y² + 7x² are essentially the same, they represent the same set of numbers - they are equivalent. Swapping x and y does not change the structure of the form.
Though not as obvious as the previous case, the forms 3x² + 5y² and
3x² + 6xy + 8y² are also equivalent, ’cause 3x² + 6xy + 8y² =
3(x + y)² + 5y² = 3X² + 5Y², letting X = x + y, Y = y.
If you calculated the discriminants, you would find that they are equal for the forms in each case. In the first case, the discriminant of both forms is -28. In the second case, -60. The discriminant is an invariant property of equivalent forms.
We applied the transforms
(x, y) 🠆 (y, x) and
(x, y) 🠆 (x + y, y) to two bqf’s to get new bqf’s which turned out to be equivalent to the original bqf’s. In general, we can apply the following linear transform to get a new bqf:
(x, y) 🠆 (Ax + By, Cx + By) . Where A, B, C, D are integers.
Let’s transform f(x, y) = ax² + bxy + cy² to get a new bqf g(x, y).
f(x, y) 🠆 g(x, y):
ax² + bxy + cy² 🠆 a(Ax +By)² + b(Ax +By)(Cx +Dy) + c(Cx +Dy)².Expand and simplify the expression on the right of the arrow to get:g(x, y)=(aA² + bAC + cC²)x² + (2aAB + bAD + 2cCD + bBC)xy + (aB² + bBD + CD²)y².Let the discriminants of f(x, y) and g(x, y) be D(f) and D(g) respectively, then D(f) = b² - 4ac. Likewise, D(g) can be calculated from the expression for g(x, y) above, but it's a bit complicated.If you are able to work out the expression for D(g), you will find there is a relationship between D(f) and D(g) as follows:D(g) = (AD - BC)²D(f).For f(x, y) and g(x, y) to be equivalent, their discriminants have to be equal, i.e., D(g) = D(f). That means we have to set
(AD - BC)² = 1. This implies (AD - BC) = 1 or (AD - BC) = -1.
So, two bqf’s f(x, y) and g(x, y) are equivalent if(and only if) g(x, y) can be obtained from f(x, y) by a linear transform:
(x, y) 🠆 (Ax + By, Cx + By) such that (AD — BC) = 1 or (AD — BC) = -1.
When (AD — BC) = 1, f(x, y) and g(x, y) are said to be properly equivalent, otherwise, improperly equivalent.
Having the same discriminant is a neccessary but not sufficient condition for two forms to be equivalent, for example x² + 5y² and 2x² + 2xy + 3y² have the same discriminant but they are not equivalent because x² + 5y² represents 1 when x = 1, y = 0, but 2x² + 2xy + 3y² can never be equal to 1 for whatever values of x and y.
It is not always easy or even advisable to check equivalence by brute inspection. To tell if two forms(with the same discriminant) are equivalent or not, we either find a way to determine the transform
(x, y) 🠆 (Ax + By, Cx + By) that transforms one into the other and check if |(AD — BC)| = 1 or we check if their reduced forms are the same.
Reduction is the operation that transforms a bqf into a basic, primal form that is equivalent to the original bqf. Every positive definite bqf can be transformed into a unique reduced form to which it is properly equivalent. We will not discuss the reduction algorithm now, may be in a future post, we just want to point out that bqf’s with the same discriminant are not neccessarily equivalent.
A (positive definite) form (a, b, c) is said to be reduced if |b| ≤ a ≤ c and b ≥ 0 if either |b| = a or a = c, where |b| stands for the absolute value of b, that is, b without a minus sign. You can check with the definition that the following forms are reduced: (1, 0, 1), (3, 2, 3), (3, -2, 4).
There is an infinitude of bqf’s with any given discriminant D, and we can “construct” as many as we wish by choosing appropriate values for the coefficients a, b, c.
When D = -12, for example, we can have the following bqf’s:
(14, 10, 2), (1, 12, 39), (4, 22, 31), (199, 28, 1), (4, 14, 13), (14, 10, 2).
Their corresponding reduced forms are:
(2, 2, 2), (1, 0, 3), (1, 0, 3), (1, 0, 3), (1, 0, 3), (2, 2, 2).
Just the two reduced forms (2, 2, 2) and (1, 0, 3) shows up, and these are the only reduced forms for D = -12, even if we generated a billion bqf’s - they are all going to fall into either of the two equivalence classes. We say the class number of -12, h(-12) is 2. It can be proved that the class number for any D is finite.
With D = -4, there’s just one equivalence class. All bqf’s with discriminant -4 can be reduced to (1, 0, 1). The class number of -4, h(-4), is 1.
Through a series of propositions we will see how the theory of binary quadratic forms we’ve developed so far helps with the question of representing numbers, especially prime numbers. First, we will revise some relevant basic concepts.
Congruence. In modular arithmetic, two quantities, say x and y, are said to be congruent modulo a third one, say n, if they leave the same remainder when divided by n, or in other words, if you get an integer when you divide their difference by n. We use the congruence symbol(≡) to denote this:
x ≡ y (mod n) or y ≡ x (mod n). Example: 27 ≡ 17 (mod 5).
Quadratic Residue. Residue is just a fancy word for “remainder”. An integer q is called a quadratic residue modulo n if it is congruent to a perfect square modulo n; i.e., if there exists an integer x such that x² ≡ q (mod n).
Legendre Symbol. The Legendre symbol (a/p) indicates whether a is a quadratic residue modulo p or not. It is equal to 1 if a is a quadratic residue modulo p, and -1 if not. If p divides a, the symbol is set to 0.
We say an integer m is represented by a form f(x, y) if there exist some integers x₀ and y₀ such that f(x₀, y₀) = m. Moreover, if x₀ and y₀ have no common factor, then we say that m is properly represented by f(x, y).
Example: x² + 7y² properly represents 53 at x = 5, y = 2.
Proposition 1. A form f(x, y) properly represents an integer m if and only if f(x, y) is properly equivalent to the form mx² + Gxy + Hy², where G, H are arbitrary integers.
Obviously, mx² + Gxy + Hy² represents m with x = 1, y = 0, but the proposition makes two strong assertions that we are going to state and prove separately.
Assertion (i). If a form f(x, y) properly represents m, then it must be equivalent to mx² + Gxy + Hy².
Proof: If f(x, y) = ax² + bxy + cy² properly represents m, then there exist some integers, say, A and C, with no common factor, such that f(A, C) = m. And since A and C have no common factor(their greatest common divisor is 1), by the Bezout identity, we can find two integers, say, B and D, such that AD — BC = 1.
Apply the linear transform (x, y) 🠆 (Ax + By, Cx + By) to f(x, y) as detailed in the grey box above, we have:
f(Ax + By, Cx + By) = (aA² + bAC + cC²)x² + (2aAB + bAD + 2cCD + bBC)xy + (aB² + bBD + CD²)y² = mx² + Gxy + Hy², since f(A, C) = m = aA² + bAC + cC²,
and we can set G = 2aAB + bAD + 2cCD + bBC, H = aB² + bBD + CD². Therefore, f(x, y) is equivalent to mx² + Gxy + Hy².
Assertion (ii). If a form f(x, y) is equivalent to mx² + Gxy + Hy² then it properly represents m.
Proof: Since f(x, y) is equivalent to mx² + Gxy + Hy², there is a linear transformation (x, y) 🠆 (Ax + By, Cx + By) such that
f(Ax + By, Cx + By) = mx² + Gxy + Hy². With x = 1, y = 0, we have
f(A, C) = m. So f(x, y) represents m at the point (A, C). For the “properly” part, remember A, B, C, D are supposed to be integers that satisfy (AD — BC) = 1 or (AD — BC) = -1. Since AD — BC = 1, by the Bezout identity again, A and C have no common factor(their gcd = 1). Proof completed.
So, Proposition 1 is proved but it does not seem to help much with the representation problem. Actually, it is a lemma, we need it just to prove Proposition 2, which is the main theorem.
Proposition 2. Let m be an integer having no common factor with another integer d. Then m is properly represented by a primitive form of discriminant d if and only if d is a quadratic residue modulo 4m.
With proposition 2, to see if a bqf represents a number m, it seems we just have to check that it’s discriminant d, is a quadratic residue modulo 4m. But let’s prove it first before we go on.
Proof: Suppose that m is properly represented by a bqf, f(x, y), with discriminant d, then by Proposition 1, f(x, y) must be equivalent to the bqf mx² + Gxy + Hy², with discriminant, d = G² - 4mH, if we rewrite d as
-4mH + G², we can clearly see that d divided by 4m leaves a remainder of G², a perfect square, therefore d is a quadratic residue modulo 4m.
On the other hand, let d be a quadratic residue modulo 4m, that means there exists an integer, say G, such that G² ≡ d (mod 4m) which implies there is an integer, say H, such that (G²- d)/4m = H, or, d = G² - 4mH. We see that d = G² - 4mH is the discriminant of the bqf mx² + Gxy + Hy² which properly represents m. QED.
Now, let’s apply this proposition to the question of which primes are represented by the form x² + ny². The discriminant is -4n. According to the proposition, a primitive form of discriminant -4n will properly represent a prime p, where p is not a factor of n, if (and only if) -4n is a quadratic residue modulo 4p, that is if (and only if) the Legendre symbol (-4n/4p) = 1, or what is the same thing (-n/p) = 1, we just need to find a way to determine such primes p that satisfy the condition. But there is a snag - there could be other bqf’s apart from x² + ny² that have the discriminant -4n, we’ve already seen that it is possible for non-equivalent bqf’s to have the same discriminant. When n = 5, for example, we have x² + 5y² with discriminant -20. 2x² + 2xy + 3y² also has a discriminant of -20, but the two forms are not equivalent, so which of them does Proposition 2 apply to?
When n = 1, we have x² + y² with discriminant -4, and it happens to be the only bqf(in reduced form) with discriminant -4, so we can use Proposition 2 on it with certainty; primes that can be written in the form x² + y² are the primes that satisy the condition (-1/p) = 1. With the help of the law of quadratic reciprocity, we find such primes are primes that satisfy the congruence p ≡ 1 (mod 4), e.g. 5, 13, 17, 29.
With n= 2, we have x² + 2y² having discriminant -8. It is the only bqf too with that discriminant, so we look for primes p’s that satisfy (-2/p) = 1 using the law of quadratic reciprocity and get p ≡ 1 (mod 8), p ≡ 3 (mod 8), e.g 11, 17, 19, 41.
When n = 3, we have x² + 3y² with discriminant -12. 2x² + 2xy + 2y² also has discriminant -12 and it does not belong to the same equivalence class with
x² + 3y², there seems to be a snag here, not quite. Note that 2x² + 2xy + 2y² is not a primitive form, so we can ignore it since Proposition 2 is about primitive forms. Proceeding as before, we find primes of the form x² + 3y² to be p = 3, p ≡ 1 (mod 3).
So, we can use Proposition 2 (together with the law of quadratic reciprocity) to determine the primes represented by x² + ny² only when n makes the class number of the discriminant equal to 1, that is, when h(-4n) = 1. For other values of n we need to develop the theory further. Stay tuned for future posts.
In closing, I will like to say the theory of binary quadratic forms is a very beautiful theory with connections to other branches of number theory, and for those interested in “practical value”, it has applications in cryptography.
- Can you write the general expression for:
i) a ternary(3 variables) quadratic form.
ii) a binary cubic(degree 3) form?
- Can you tell if the equation 2x² - 6xy + 5y² = 43 has a solution in integers or not. Give reason for your answer.
- Can you write an algorithm that takes as an input a number(should be a multiple of 4 or a multiple of 4 plus 1) and outputs random bqf’s with the input as determinant.
Cox, David A. Primes of the Form x² + ny²: Fermat, Class Field Theory, and Complex Multiplication. John Wiley and Sons, Inc., Hoboken, NJ, Second edition, 2013.