
Projection Matrices and Gram-Schmidt

Joseph Jojoe
Published in Intuition · 7 min read · Jul 2, 2022


When working in finite-dimensional space, it is convenient to have an orthonormal basis; this simplifies calculations and guarantees that the vectors in our set are linearly independent. Today, we explore the Gram-Schmidt process, which generates an orthonormal basis from a given set of vectors.

A brief interlude about orthogonality, normalisation, and orthogonal matrices

We define an operation acting on two vectors u and v with finitely many components, known as the dot or scalar product, as follows:

The standard definition of the dot product.
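That is, for u = (u₁, u₂, …, uₙ) and v = (v₁, v₂, …, vₙ):

u · v = u₁v₁ + u₂v₂ + … + uₙvₙ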

The dot product produces a single number (a scalar, hence the alternate name) which is the sum of the corresponding elements of u and v multiplied together ― note that the vectors have to contain the same number of elements for this operation to be defined.

We sometimes write the dot product of two column vectors u and v as uᵀv. This lets us use matrix multiplication to produce a 1×1 matrix, whose single entry should be considered our ‘scalar’ result from performing this operation.

Orthogonality: two vectors u and v are orthogonal if u.v = 0.

Orthogonality generalises the notion of perpendicularity to higher dimensions, allowing us to work in spaces larger than R² and R³. Two vectors are orthogonal if their dot product is equal to zero, and we consider a set of vectors S = {v₁, v₂, …, vₙ} to be mutually orthogonal if every pair of vectors in S is orthogonal, meaning that each vector in the set is perpendicular to all of the others.

A mutually orthogonal set of vectors S in R³. Every possible pairing of the vectors gives a dot product result of 0, so we know that each vector is perpendicular to every other. This also implies that they are linearly independent and form a basis for R³ — if this is unfamiliar terminology, have a look at my article on the Four Fundamental Subspaces.
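For example (a set chosen here just for illustration):

v₁ = (1, 1, 0),  v₂ = (1, -1, 0),  v₃ = (0, 0, 1)

Here v₁ · v₂ = 1 - 1 + 0 = 0, v₁ · v₃ = 0, and v₂ · v₃ = 0, so every pairing is orthogonal.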

Square matrices with nonzero, mutually orthogonal columns are always invertible; the columns are linearly independent, so there is no non-trivial combination of the columns which equals the zero vector. (If this justification is unfamiliar, read my article about the Four Fundamental Subspaces here; it covers linear independence in much more detail.)

Norm: a function from a real (or complex) vector space to the non-negative real numbers, acting as a measure of distance from the origin.

Vector spaces can be endowed with a norm, a function that tells us how far a vector is from the origin; this allows us to deal with the notion of ‘length’ in higher dimensions. A vector space with an attached norm is, unsurprisingly, called a normed vector space. Norms merit a future article in their own right, especially for complex and more general vector spaces, so today we are going to focus on a specific norm which is commonly used to calculate length in vector spaces ― the L² norm, which extends the Pythagorean theorem into finite-dimensional vector spaces of any size.

We calculate the L² norm of a vector v, denoted as |v|, by taking the square root of the sum of each of its components squared.

The definition of the L² norm for real vectors. We square each component, add the squares together, and take the square root.
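In symbols, for v = (v₁, v₂, …, vₙ):

|v| = √(v₁² + v₂² + … + vₙ²)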

A little mathematical shuffling shows that this definition is equivalent to saying that the magnitude of a vector is the square root of its dot product with itself.

The L² norm of a vector v is equal to the square root of the dot product of v with itself.
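In other words:

|v| = √(v · v) = √(vᵀv)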

We say that a vector is normalised when the L² norm is equal to one, meaning that the vector has a length of one unit; to normalise a vector v, divide it by its magnitude |v|. Normalised vectors are useful for determining which direction a vector points in; every vector that points in the same direction will have the same normalised representation, and this allows us to easily compare different vectors.

Two vectors u and v with the same normalised form, showing that they both point in the same direction.
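For instance, take u = (1, 2, 2) and v = (2, 4, 4), one of many such pairs:

|u| = √(1 + 4 + 4) = 3,  so u/|u| = (1/3, 2/3, 2/3)
|v| = √(4 + 16 + 16) = 6,  so v/|v| = (1/3, 2/3, 2/3)

Both normalise to the same unit vector, so u and v point in the same direction.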

Orthonormal: a set of vectors is orthonormal if every vector is normalised and the vectors are mutually orthogonal.

An orthogonal matrix, usually denoted as Q, is a real square matrix whose column vectors are orthonormal. Orthogonal matrices have several interesting properties: they are always invertible, their inverses are their transposes, and the determinant of an orthogonal matrix is always either +1 or -1.

A few interesting examples of orthogonal matrices; the identity matrix, rotation matrices, and permutation matrices are all orthogonal.
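To make one of these concrete, the two-dimensional rotation matrix is

Q = ( cos θ   -sin θ )
    ( sin θ    cos θ )

Its columns have unit length and are mutually orthogonal, QᵀQ = I (so Q⁻¹ = Qᵀ), and det Q = cos²θ + sin²θ = 1.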

Projection

To introduce the concept of projection, I will borrow the opening statement from Lecture 15 of MIT's 18.06SC Linear Algebra. (This section is largely guided by the train of thought presented in that lecture, so I highly recommend you supplement this article with it.)

If we have a vector v and a line determined by a vector a, how can we find the point on the line which is closest to v?

Our problem — given a line (red) and a vector (blue), how do we find the point on the line which is as close as possible to the tip of our vector?

The point on the red line determined by a which is closest to v is where the line through the tip of v orthogonal to a meets it. We can think of the green vector p as an approximation to v, and thus of the vector e = v - p (shown in dotted green) as the error of our approximation.

As p lies on the line spanned by a, we can say that p = xa for some scalar value x. From our diagram, we also know that a is perpendicular to e (= v - p = v - xa), so the dot product of a and (v - xa) must equal zero.

With a little expanding and rearranging, we have worked out how to express p in terms of a and v. The vector p is what we get when we project v onto the line determined by a.
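Carrying out that expansion and rearrangement:

a · (v - xa) = 0   ⟹   a · v - x (a · a) = 0   ⟹   x = (aᵀv)/(aᵀa)

p = xa = a (aᵀv)/(aᵀa) = Pv,   where P = aaᵀ/(aᵀa)

The matrix P is the projection matrix referred to in the title: multiplying any vector v by P sends it to its projection onto the line through a.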

The Gram-Schmidt process

Now that we’ve covered the scalar product, orthonormality, and the projection of vectors, we have a solid foundation for understanding the Gram-Schmidt process. Given any set of vectors S which spans a particular vector space V, we can use the process to generate an orthonormal basis for V.

Before we get started, there is some notation which you will need to accustom yourself to. We will use lower-case letters {a, b, …, z} for vectors in our original (not orthonormal) set of vectors, capital letters {A, B, …, Z} for the orthogonal vectors we’ll generate in an intermediate step, and finally {q₁, q₂, …, qₙ} for our orthonormal vectors.

Let’s begin with a three-dimensional example.

Our set S of vectors; the vectors are neither orthogonal nor normalised.

We will first find orthogonal vectors which span the same space as S. Let our first orthogonal vector A = a.

To find a vector orthogonal to A, we project b onto A and let B = b - p. B is what we previously referred to as e; the error vector is orthogonal to A, so we can use it as the second vector in our orthogonal set.
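Using the projection formula from the previous section, with A in place of a and b in place of v:

B = b - (Aᵀb / AᵀA) A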

Finally, we must calculate the vector C which is orthogonal to both A and B. To do so, we take our original vector c and subtract its components in the A and B directions; we find those by projecting c onto A and B in turn.
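In symbols:

C = c - (Aᵀc / AᵀA) A - (Bᵀc / BᵀB) B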

Lastly, we divide each of A, B, and C by its length to give us our orthonormal vectors.
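That is:

q₁ = A / |A|,   q₂ = B / |B|,   q₃ = C / |C|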

This process can be extended to any number of dimensions, and thus an arbitrarily large set of vectors.
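In general, the k-th orthogonal vector is the k-th original vector with its projections onto all previously constructed directions removed, and the orthonormal vectors come from normalising:

Aₖ = aₖ - (A₁ᵀaₖ / A₁ᵀA₁) A₁ - … - (Aₖ₋₁ᵀaₖ / Aₖ₋₁ᵀAₖ₋₁) Aₖ₋₁,   qₖ = Aₖ / |Aₖ|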

For an arbitrarily large set of vectors, we can apply Gram-Schmidt to generate an orthonormal basis for the space our original set spans.
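The whole procedure is short enough to write down in code. The sketch below is not from the original article; it uses NumPy, and the function name gram_schmidt and the cut-off tol are illustrative choices. Vectors that are linear combinations of earlier ones shrink to (numerically) zero and are simply dropped, which anticipates the point made below.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal basis for the span of `vectors` (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        # Subtract the projection of w onto each vector already in the basis.
        for q in basis:
            w = w - np.dot(q, w) * q
        norm = np.linalg.norm(w)  # the L2 norm of what remains
        if norm > tol:            # dependent vectors reduce to (nearly) zero and are dropped
            basis.append(w / norm)
    return basis

# Example: c = a + b, so only two orthonormal vectors come back.
a, b, c = [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]
for q in gram_schmidt([a, b, c]):
    print(q)
```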

Through Gram-Schmidt, we effectively ‘strip away’ any dependencies which exist between the vectors ― in the extreme case where a vector in our set is a linear combination of the other vectors in the set, it is reduced all the way down to the zero vector. No independent (orthogonal) part of the vector remains, so it can be omitted from our final set; after discarding such vectors, Gram-Schmidt always gives an orthonormal basis for the space spanned by the original set of vectors.

Further exploration

Orthogonal Matrices and Gram-Schmidt from MIT OpenCourseWare.

The Wikipedia page for the Gram-Schmidt process.
