Understanding The Math Behind Dimension Reduction in Facial Recognition (2)
Read my beginner-friendly proofs to explore the application of linear algebra
In the last article of this series, our goal was to preserve the entire variance of the face space. As a trade-off, we still needed to include dimensions with relatively small variances, which increased computational difficulty. If we are willing to accept a minor loss of variance in exchange for a major gain in computational efficiency, we can further reduce the dimension of the eigenface subspace by selecting only the dimensions with the largest variances. This approach is called Principal Component Analysis (PCA).
Given a normalized matrix A, we define the first principal direction as the direction in which the data has the most variance. To avoid interfering with the already-fixed direction of maximal variance, we define the second principal direction as the direction with the most variance among all directions orthogonal to the first principal direction. Subsequent principal directions are defined in the same fashion.
If we can prove that the jᵗʰ principal direction of a matrix A is the unit eigenvector of the matrix C = αAᵗA which corresponds to the jᵗʰ largest eigenvalue of C, then we can capture the most variance of A by selecting the eigenvectors of C that correspond to large eigenvalues. That is the gist of PCA, and also what we are going to prove today.
Before delving into the proof, it is worth pausing to compare the goal of this article with that of the last one. In the last article, our goal was to find an orthogonal basis of the covariance matrix C, since an orthogonal basis eliminates covariance across dimensions and hence uses variance most efficiently. To achieve that goal, we first showed that C has an orthogonal basis that happens to be an eigenbasis. We then showed that in this eigenbasis, eigenvectors corresponding to a zero eigenvalue are removable, as there is zero variance along the dimensions they span. Following this logic, we achieved our goal in the last article without needing to prove that the maximum variance lies in the direction of the eigenvector with the largest eigenvalue. Instead, we justified the relevance of eigenvectors by showing that the orthogonal basis is composed of eigenvectors. In this article, however, we want to directly select individual dimensions with large variances. Consequently, we need to reestablish the relevance of eigenvectors by answering a more fundamental question: among all vectors, why do eigenvectors point to the principal directions? With this in mind, let us dive into the proof:
Lemma 3: The jᵗʰ principal direction of a normalized matrix A is the unit eigenvector of C = α AᵗA (where α is the inverse of the number of rows in A) corresponding to the jᵗʰ largest eigenvalue of C. The variance on the jᵗʰ principal direction is the jᵗʰ largest eigenvalue of C.
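Before the proof, here is a minimal NumPy sketch (my own illustration, not code from this series; the synthetic data matrix and random seed are made up) that builds a normalized matrix A, forms C = αAᵗA, and checks numerically that the variance along the jᵗʰ unit eigenvector equals the jᵗʰ largest eigenvalue, and that no unit direction beats the largest one:

```python
import numpy as np

# Synthetic, mean-centered data matrix A (500 samples, 5 dimensions).
# The per-column spread is chosen arbitrarily so the variances differ.
rng = np.random.default_rng(0)
A = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
A = A - A.mean(axis=0)            # "normalize": subtract the mean of each column

alpha = 1.0 / A.shape[0]          # alpha = inverse of the number of rows
C = alpha * (A.T @ A)             # covariance matrix C = alpha * A^T A

# C is symmetric, so eigh returns real eigenvalues (ascending) and orthonormal eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]              # reorder from largest to smallest
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

def variance_along(u):
    """Variance of the rows of A in the unit direction u, i.e. u^T C u."""
    return u @ C @ u

# Lemma 3: the variance along the j-th unit eigenvector is the j-th largest eigenvalue.
for j in range(C.shape[0]):
    assert np.isclose(variance_along(eigenvectors[:, j]), eigenvalues[j])

# No unit direction exceeds the largest eigenvalue.
u = rng.normal(size=C.shape[0])
u /= np.linalg.norm(u)
print(variance_along(u) <= eigenvalues[0] + 1e-12)   # True
```

Selecting the eigenvectors attached to the k largest eigenvalues then yields the k principal directions that PCA keeps.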
To prove Lemma 3, let u be any unit vector, and let v₁, … , vₙ be the row vectors of A. Let σᵤ² be the variance of v₁, … , vₙ in the direction of u. According to the fourth line of equation 11 in the last article, we have:
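$$\sigma_u^2 = \mathbf{u}^{\mathsf{T}} C\, \mathbf{u}, \qquad \text{where } C = \alpha A^{\mathsf{T}} A. \tag{13}$$

Maximizing the variance over all unit vectors u therefore amounts to maximizing the quadratic form uᵗCu.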
To prove this by induction, let P(k) be the statement that the kᵗʰ principal direction of A is the unit eigenvector of C = αAᵗA which corresponds to the kᵗʰ largest eigenvalue of C, and that the variance on the kᵗʰ principal direction is the kᵗʰ largest eigenvalue of C.
Base case: When k = 1, let λ₁ be the largest eigenvalue of C. Since C is symmetric, according to the Spectral Theorem, we can write C as:
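$$C = Q\, D\, Q^{\mathsf{T}} \tag{14}$$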
where D is a diagonal matrix and Q is an orthogonal matrix. The eigenvalues of C appear on the diagonal of D and the corresponding eigenvectors are the columns of Q.
Combining equations 13 and 14, we have:
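$$\sigma_u^2 = \mathbf{u}^{\mathsf{T}} Q D Q^{\mathsf{T}} \mathbf{u} = \left(Q^{\mathsf{T}}\mathbf{u}\right)^{\mathsf{T}} D \left(Q^{\mathsf{T}}\mathbf{u}\right) = \mathbf{y}^{\mathsf{T}} D\, \mathbf{y}, \qquad \text{where } \mathbf{y} = Q^{\mathsf{T}}\mathbf{u}. \tag{15}$$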
Since Q (and hence Qᵗ) is an orthogonal matrix and orthogonal transformations preserve length, y = Qᵗu is also a unit vector, i.e.
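$$\|\mathbf{y}\| = \|\mathbf{u}\| = 1, \qquad \text{that is,} \qquad y_1^2 + y_2^2 + \cdots + y_n^2 = 1. \tag{16}$$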
Now, to write the quadratic form yᵗDy out explicitly, let λ₁ ≥ λ₂ ≥ … ≥ λₙ be the eigenvalues of C. Since these λ values are also the diagonal entries of D, we have:
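$$\mathbf{y}^{\mathsf{T}} D\, \mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{17}$$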
Combining equations 15 and 17, we get:
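$$\sigma_u^2 = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \;\le\; \lambda_1 \left(y_1^2 + y_2^2 + \cdots + y_n^2\right) = \lambda_1,$$

where the last equality uses equation 16.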
The maximal value λ₁ is achieved when u is the unit eigenvector of C with eigenvalue λ₁: in that case y = Qᵗu has a 1 in the entry corresponding to λ₁ and zeros elsewhere, so σᵤ² = λ₁. Therefore, the base case P(1) is correct.
Inductive Hypothesis: Assume that P(1), … , P(j-1) are all correct for some positive integer j ≥ 2.
Inductive Step: We will now show that P(j) is correct. Let λ₁ ≥ … ≥ λⱼ be the j largest eigenvalues of C, with corresponding orthogonal unit eigenvectors ω₁, … , ωⱼ (their orthogonality follows from the Spectral Theorem). Since P(1), … , P(j-1) are all correct, we know that ω₁, … , ωⱼ₋₁ point to the first j-1 principal directions. Therefore, finding the jᵗʰ principal direction amounts to finding the unit vector u that maximizes the following expression:
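$$\max_{\|\mathbf{u}\| = 1} \ \mathbf{u}^{\mathsf{T}} C\, \mathbf{u} \ \ \Big|\ \ \mathbf{u}\cdot\boldsymbol{\omega}_1 = \mathbf{u}\cdot\boldsymbol{\omega}_2 = \cdots = \mathbf{u}\cdot\boldsymbol{\omega}_{j-1} = 0 \tag{Expression 1}$$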
The requirement after “|” is added since the jᵗʰ principal direction needs to be orthogonal to all the previous principal directions.
Recall that ω₁, … , ωⱼ are actually columns of Q, which makes them also rows of Qᵗ. Without loss of generality, assume ωₓ is the xᵗʰ row of Qᵗ. Since y = Qᵗu (defined in equation 15), we can infer that yₓ = u ⋅ ωₓ. Hence, Expression 1 can be rewritten as follows:
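$$\max_{\|\mathbf{y}\| = 1} \ \mathbf{y}^{\mathsf{T}} D\, \mathbf{y} \ \ \Big|\ \ y_1 = y_2 = \cdots = y_{j-1} = 0$$

(the constraint ‖y‖ = 1 comes from equation 16).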
Now that y₁ = … = yⱼ₋₁ = 0, we can modify equation 17:
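$$\mathbf{y}^{\mathsf{T}} D\, \mathbf{y} = \lambda_j y_j^2 + \lambda_{j+1} y_{j+1}^2 + \cdots + \lambda_n y_n^2 \tag{18}$$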
Combining equations 15 and 18, we have:
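$$\sigma_u^2 = \lambda_j y_j^2 + \lambda_{j+1} y_{j+1}^2 + \cdots + \lambda_n y_n^2 \;\le\; \lambda_j \left(y_j^2 + y_{j+1}^2 + \cdots + y_n^2\right) = \lambda_j,$$

where the last equality uses equation 16 together with y₁ = … = yⱼ₋₁ = 0.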
The maximal value λⱼ is achieved when u = ωⱼ, the unit eigenvector of C with eigenvalue λⱼ, which is orthogonal to ω₁, … , ωⱼ₋₁ as required: in that case yⱼ = 1 and every other entry of y is zero, so σᵤ² = λⱼ. Accordingly, the inductive step P(j) is correct. The proof is hence complete.
In this article, we have proved why eigenvectors with large eigenvalues always correspond to dimensions with large variance. This discovery lays the foundation for PCA, which comes in handy in facial recognition when we need to quickly reduce the dimensions of pictures while preserving most of their variance. In the next article, I will apply PCA to an example of facial recognition. I will also talk about how to project photos onto the eigenface subspace. Stay tuned!