Dimensionality Reduction on Face using PCA

Jagajith · Published in CodeX · 7 min read · Oct 23, 2021

Machine Learning offers a wide variety of dimensionality reduction techniques, and dimensionality reduction is one of the most important topics in the Data Science field. In this article, I will present one of the most widely used of these techniques, known as Principal Component Analysis (PCA).

But first, we need to understand what Dimensionality Reduction is and why it is so crucial.

Dimensionality Reduction

Dimensionality reduction, also known as dimension reduction, is the transformation of data from a high-dimensional space to a low-dimensional space in such a way that the low-dimensional representation retains the meaningful properties of the original data, ideally close to its intrinsic dimension.

Before Reduction
After Reduction

Why is it useful?

Working with high-dimensional spaces can be inconvenient for a variety of reasons: raw data are often sparse as a consequence of the curse of dimensionality, and processing the data is typically computationally expensive.
Dimensionality reduction is popular in domains that deal with huge numbers of observations and/or variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Principal Component Analysis

A “best-fitting” line in two, three, or higher dimensions can be defined as the one that minimizes the average squared perpendicular distance from the points to the line. The second best-fitting line can be chosen in the same way from the directions perpendicular to the first. If this seems like gibberish to you, don’t worry: we will see these things in detail in this post.

PCA Example

In the figure above, the green points are the actual points, whereas the blue points are their projections.
To begin, PCA determines the direction (a vector u⁽¹⁾ ∈ ℝⁿ) onto which to project the data so as to minimize the projection error. The projection distances are depicted as pink lines in the figure. PCA selects the line whose direction gives the smallest squared error, measured as the distance between each point and the selected line. We can also conclude from the figure that the orange line is the worst choice, since the distances between the points and that line are large. The negative direction could also be picked, but the positive and negative directions define the same line.

In general, to reduce from n dimensions to k dimensions: find k vectors u⁽¹⁾, u⁽²⁾, …, u⁽ᵏ⁾ onto which to project the data, so as to minimize the projection error.
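To make “projection error” concrete, here is a small sketch (the points X_toy and the direction u are made up purely for illustration): it measures the total squared distance from a few 2D points to their projections onto a candidate unit direction. PCA picks the direction that makes this quantity smallest.

import numpy as np

# Made-up 2D points and a candidate unit direction u
X_toy = np.array([[2.0, 1.9],
                  [1.0, 1.1],
                  [3.0, 3.2]])
u = np.array([1.0, 1.0]) / np.sqrt(2)

z = X_toy @ u                         # scalar projection of each point onto u
X_proj = np.outer(z, u)               # projected points z * u
error = np.sum((X_toy - X_proj)**2)   # total squared projection error
print(error)                          # PCA chooses the u that minimizes this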

To help you understand, we will first experiment with an example 2D dataset to get intuition on how PCA works, and then use it on a bigger dataset.

Let us load our sample dataset and visualize it super quick:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

mat3 = loadmat('ex7data1.mat')
X3 = mat3['X']
plt.scatter(X3[:, 0], X3[:, 1], alpha=0.5)
Data visualization

Before using our data, we need to normalize it. I won’t go deep into normalization in this post; to know more about normalization, click here.

def featureNormalize(X):
    # Scale each feature to zero mean and unit standard deviation
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_norm = (X - mu) / sigma

    return X_norm, mu, sigma
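As a quick sanity check (the tiny array below is made up just for illustration), the normalized columns should come out with zero mean and unit standard deviation:

# Made-up demo array, just to sanity-check featureNormalize
X_demo = np.array([[1.0, 200.0],
                   [2.0, 300.0],
                   [3.0, 400.0]])
X_demo_norm, mu_demo, sigma_demo = featureNormalize(X_demo)
print(mu_demo)                     # [  2. 300.]
print(X_demo_norm.mean(axis=0))    # approximately [0. 0.]
print(X_demo_norm.std(axis=0))     # approximately [1. 1.]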

PCA Algorithm

PCA Algorithm

1. Compute the covariance matrix

Covariance is a measure of the relationship between two variables: it quantifies how much the two variables vary together around their mean values, and so helps us understand how they are related.

The covariance matrix is computed as Σ = (1/m) ∑ᵢ₌₁ᵐ x⁽ⁱ⁾ (x⁽ⁱ⁾)ᵀ, where the capital Greek letter sigma (Σ) denotes the covariance matrix itself, not a summation. In other words, we obtain the covariance matrix by multiplying each x⁽ⁱ⁾ by its own transpose and averaging over the m examples.
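As a sketch (using the X3 loaded above and the featureNormalize helper), the covariance matrix is a one-liner in numpy and can be checked against np.cov:

# Covariance of the normalized sample data (an n x n matrix, here 2 x 2)
X_norm_demo, _, _ = featureNormalize(X3)
m = X_norm_demo.shape[0]
Sigma = (1 / m) * X_norm_demo.T @ X_norm_demo
print(np.allclose(Sigma, np.cov(X_norm_demo, rowvar=False, bias=True)))  # True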

2. Compute eigenvectors

Explaining eigenvalues and eigenvectors in depth would make this article too long, so I’ll just use numpy’s svd() method. However, for your convenience, I’ve left the procedure for calculating eigenvalues and eigenvectors at the end of the article. Simply put, the eigenvalues and eigenvectors give us the directions of the lines onto which PCA projects the data. Click here to know more about eigenvalues and eigenvectors. Here U is the collection of vectors u⁽¹⁾, u⁽²⁾, …, u⁽ⁿ⁾ onto which we can project the data.

After calling svd(), we get the matrix U shown above. The returned U matrix contains directions for all n dimensions, so to reduce from n dimensions to k dimensions we keep only its first k columns. This n × k matrix is called U_reduce.

We obtain the reduced representation Z by multiplying the transpose of U_reduce with X, which reduces each x⁽ⁱ⁾ of size (n × 1) to a z⁽ⁱ⁾ of size (k × 1).

We can approximately recover our original data (X ≈ Xₐₚₚᵣₒₓ = U_reduce ⋅ Z) by multiplying U_reduce with Z. The original and approximated data will look like the following figure:

Original vs Approximated
def PCA(X):
    # Covariance matrix of the (already normalized) data, then its singular value decomposition
    m, n = X.shape
    cov_matrix = 1/m * X.T @ X
    U, S, V = np.linalg.svd(cov_matrix)
    return U, S, V

X_norm, mu, sigma = featureNormalize(X3)
U, S, V = PCA(X_norm)

# Plot the principal component directions (scaled by their singular values) over the data
plt.scatter(X3[:, 0], X3[:, 1], alpha=0.5)
plt.plot([mu[0], (mu + 1.5 * S[0] * U[:, 0].T)[0]], [mu[1], (mu + 1.5 * S[0] * U[:, 0].T)[1]], color="black", linewidth=3)
plt.plot([mu[0], (mu + 1.5 * S[1] * U[:, 1].T)[0]], [mu[1], (mu + 1.5 * S[1] * U[:, 1].T)[1]], color="black", linewidth=3)
plt.xlim(-1, 7)
plt.ylim(2, 8)
Direction of PCA
def projectData(X, U, K):
    # Project each example onto the first K principal components (columns of U)
    m = X.shape[0]
    Z = np.zeros((m, K))
    for i in range(m):
        for j in range(K):
            projection_k = X[i, :] @ U[:, j]   # x @ U_reduce, one component at a time
            Z[i, j] = projection_k
    return Z

K = 1
Z = projectData(X_norm, U, K)
def recoverData(Z, U, K):
    # Map the projected data back to the original space: X_rec = Z @ U_reduce.T
    z = Z.shape[0]
    u = U.shape[0]
    X_rec = np.zeros((z, u))

    for i in range(z):
        for j in range(u):
            X_rec[i, j] = Z[i, :] @ U[j, :K]
    return X_rec

X_rec = recoverData(Z, U, K)

plt.scatter(X_norm[:, 0], X_norm[:, 1], marker="o", label="Original", facecolors="none", edgecolors="b", s=15)
plt.scatter(X_rec[:, 0], X_rec[:, 1], marker="o", label="Approximation", facecolors="none", edgecolors="r", s=15)
plt.title("The Normalized and Projected Data after PCA")
plt.legend()
Original vs Approximation
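As a side note, the loop-based projectData and recoverData above can be written as two matrix products; this vectorized sketch (using the same X_norm, U, K, Z and X_rec from above) should produce identical results:

# Vectorized equivalents of projectData and recoverData
Z_vec = X_norm @ U[:, :K]          # Z = X · U_reduce
X_rec_vec = Z_vec @ U[:, :K].T     # X_approx = Z · U_reduceᵀ
print(np.allclose(Z_vec, Z), np.allclose(X_rec_vec, X_rec))   # True True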

Choosing k (number of principal components)

We can choose k as the smallest value for which the retained variance

(S₁₁ + S₂₂ + … + Sₖₖ) / (S₁₁ + S₂₂ + … + Sₙₙ)

is above a chosen threshold, where Sᵢᵢ are the singular values (the diagonal of S) returned by svd(). Retaining 95 to 99% of the variance is generally considered good and acceptable.
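Since our PCA function already returns the singular values S, one way to pick k (a sketch using the S computed earlier for the 2D dataset) is to take the smallest k whose cumulative share of the singular values reaches the desired threshold:

# Smallest k that retains at least 99% of the variance
variance_retained = np.cumsum(S) / np.sum(S)
k = int(np.argmax(variance_retained >= 0.99)) + 1
print(k, variance_retained[k - 1])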

Applying PCA

  1. The mapping x⁽ⁱ⁾ → z⁽ⁱ⁾ should be defined by running PCA only on the training set; the same mapping is then applied to the cross-validation and test sets (see the sketch after this list).
  2. Do not use PCA just to reduce the number of features in order to prevent overfitting. This might work OK, but it isn’t a good way to address overfitting; use regularization instead.
  3. Before implementing PCA, first try running whatever you want to do with the raw data. Only if that doesn’t do what you want should you implement PCA.
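Here is a minimal sketch of point 1 (X_train and X_test are hypothetical arrays, not part of this post’s dataset): the normalization parameters and U are learned from the training set only, then reused as-is on new data.

# Hypothetical split: learn mu, sigma and U on the training set only
X_train_norm, mu_tr, sigma_tr = featureNormalize(X_train)
U_tr, S_tr, _ = PCA(X_train_norm)
Z_train = X_train_norm @ U_tr[:, :K]       # K = number of components kept

# Reuse the SAME mu, sigma and U_reduce for the test set
X_test_norm = (X_test - mu_tr) / sigma_tr
Z_test = X_test_norm @ U_tr[:, :K]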

That was a lot of algebra and mathematics; now let us dive into dimensionality reduction on faces using PCA.

Let us load and visualize our dataset quickly:

mat4 = loadmat('ex7faces.mat')
X4 = mat4['X']

# Display the first 100 faces in a 10 x 10 grid (each row of X4 is a 32 x 32 image)
fig, ax = plt.subplots(nrows=10, ncols=10, figsize=(8, 8))
for i in range(0, 100, 10):
    for j in range(10):
        ax[int(i/10), j].imshow(X4[i+j, :].reshape(32, 32, order="F"), cmap="gray")
        ax[int(i/10), j].axis("off")

Let’s apply PCA to this dataset and visualize the first 36 principal components:

X_norm2, mu2, sigma2 = featureNormalize(X4)
U2, S2, V2 = PCA(X_norm2)

# Visualize the first 36 principal components as 32 x 32 images
U_reduced = U2[:, :36].T
fig2, ax2 = plt.subplots(6, 6, figsize=(8, 8))
for i in range(0, 36, 6):
    for j in range(6):
        ax2[int(i/6), j].imshow(U_reduced[i+j, :].reshape(32, 32, order="F"), cmap="gray")
        ax2[int(i/6), j].axis("off")

Let’s reduce the dimension of the faces from 1024 to 100:

K2 = 100
Z2 = projectData(X_norm2, U2, K2)
print("The projected data Z has a size of:", Z2.shape)

# Recover the faces from the 100-dimensional representation and display them
X_rec2 = recoverData(Z2, U2, K2)
fig3, ax3 = plt.subplots(10, 10, figsize=(8, 8))
for i in range(0, 100, 10):
    for j in range(10):
        ax3[int(i/10), j].imshow(X_rec2[i+j, :].reshape(32, 32, order="F"), cmap="gray")
        ax3[int(i/10), j].axis("off")
Projected Data
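As a rough check on how much information the 100-dimensional representation loses (a sketch, not part of the original exercise), we can compare the recovered faces with the normalized originals:

# Mean squared reconstruction error after compressing 1024 -> 100 dimensions
mse = np.mean((X_norm2 - X_rec2) ** 2)
print("Reconstruction MSE:", mse)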

Additional notes on Eigenvalues and Eigenvectors

Procedure to find eigenvalues and eigenvectors:
1. Form the characteristic equation det(A − λI) = 0.
2. Solve the characteristic equation to get the characteristic roots, otherwise known as eigenvalues or latent roots.
3. To find the eigenvectors, solve (A − λI)X = 0 for each value of λ.
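For a concrete illustration (the 2×2 matrix A below is made up), numpy’s eig() carries out this procedure numerically; solving det(A − λI) = 0 here gives λ = 5 and λ = 2:

# Made-up 2 x 2 matrix: characteristic equation λ² - 7λ + 10 = 0, so λ = 5, 2
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                  # e.g. [5. 2.] (order may vary)
print(eigenvectors)                 # columns are the corresponding eigenvectors
# Verify A·v = λ·v for the first eigenpair
print(np.allclose(A @ eigenvectors[:, 0], eigenvalues[0] * eigenvectors[:, 0]))  # True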

Conclusion

Today, we looked under the hood of PCA and saw how it actually works. Then we built it from scratch using Python’s numpy and matplotlib. PCA is useful in many applications, helping to reduce the dimensions of the data with very little loss of information. The dataset and final code are uploaded on GitHub.

Check it out here: PCA.

If you like this post, then check out my previous posts in this series about

1. What is Machine Learning?

2. What are the Types of Machine Learning?

3. Uni-Variate Linear Regression

4. Multi-Variate Linear Regression

5. Logistic Regression

6. What are Neural Networks?

7. Digit Classifier using Neural Networks

8. Image Compression with K-means clustering

9. Detect Failing Servers on a Network using Anomaly Detection

Last Thing

If you enjoyed my article, a clap 👏 and a follow would be 🤘unifying🤘, and it helps Medium promote this article so that others can read it. I am Jagajith, and I will catch you in the next one.
