# [ Archived Post ] A Tutorial on Principal Component Analysis

Please note that this post is for my own educational purpose.

I don’t know how many times I have reviewed this topic → my passion (learning about PCA).

PCA → good first method to try for any industry. (and we are going to explore the math behind PCA). (the proofs are needed for a full understanding).

We are gathering data → from some kind of system → the data seems to be clouded and redundant → a lot of fields have this problem, not just computer science.

We capture the data with several cameras → interesting. (each camera gives a 2D scatter plot). (How can we reduce this dimension?). (in the real world there is noise and much more).

Linear algebra → change of basis → which axis contains the most variance? (we can rewrite each collected sample as a vector).

That is our starting point → we have many of those data points. (what is the orthonormal basis → can be (1, 0) and (0, 1), or (√2/2, √2/2) and (−√2/2, √2/2), and many more). (PCA → linear → simplifies the problem → easier to optimize) → Y = PX, where P is the change-of-basis matrix. (Okay, so we can change basis → but what basis should we choose?). (what features do we want Y to have?).
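To make the change of basis concrete, here is a minimal sketch in NumPy. The toy data, the 45-degree rotation, and the variable names are my own assumptions for illustration; the only thing taken from the notes is the idea that Y = PX with an orthonormal P.

```python
import numpy as np

# Hypothetical toy data: 2 measurement types (rows) x 5 samples (columns).
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [1.1, 1.9, 3.2, 3.8, 5.1]])

# An orthonormal basis rotated by 45 degrees; each ROW of P is a new basis vector.
s = 1.0 / np.sqrt(2.0)
P = np.array([[ s, s],
              [-s, s]])

# Change of basis: every column of Y is the same sample expressed in the new basis.
Y = P @ X

# Orthonormal rows mean P @ P.T is the identity, so sample lengths are preserved.
is_orthonormal = np.allclose(P @ P.T, np.eye(2))
lengths_preserved = np.allclose(np.linalg.norm(X, axis=0),
                                np.linalg.norm(Y, axis=0))
```

The data is only rotated, not stretched → nothing is lost by changing basis; the question is which rotation is the useful one.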

We want to reduce noise → while keeping the signal.

The signal → large variance → while noise → smaller variance. (we assume that the largest variance is the signal and we want to keep that).

The relationship between two axes → can be strong or weak (correlated or not). (if two axes are highly correlated → the data are redundant → we could have recorded just one variable).

Covariance → a linear relationship between two variables → LINEAR dependency. (not NON-LINEAR). (and we can generalize from a pair of vectors to a matrix → the covariance matrix).
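A small sketch of the covariance matrix, assuming the data is stored with one measurement type per row and one sample per column. The synthetic data and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Hypothetical data: 3 measurement types (rows) x 200 samples (columns).
a = rng.normal(size=200)
X = np.vstack([a,
               2.0 * a + 0.1 * rng.normal(size=200),  # nearly redundant with row 0
               rng.normal(size=200)])                  # unrelated to row 0

# Subtract the mean of each row so every measurement type is zero-mean.
Xc = X - X.mean(axis=1, keepdims=True)

# Covariance matrix: diagonal = variances, off-diagonal = covariances.
n = Xc.shape[1]
C_X = (Xc @ Xc.T) / (n - 1)
```

Large off-diagonal entries (rows 0 and 1 here) → redundancy; small ones (rows 0 and 2) → the two measurements are close to uncorrelated.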

PCA → assumes that P is an orthonormal matrix (its rows are orthonormal vectors). (now for the review and solution).

PCA assumptions → linearity → large variance carries the important structure (the data have a high signal-to-noise ratio) → a strong and sometimes incorrect assumption.

The covariance of Y should be a diagonal matrix (all off-diagonal covariances are zero → no redundancy).

EVD (eigenvector decomposition) of the covariance matrix → gives the solution right away.
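A minimal sketch of PCA through the eigendecomposition of the covariance matrix, again assuming rows are measurement types and columns are samples. The correlated toy data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary choice

# Hypothetical correlated 2D data: rows = measurement types, columns = samples.
t = rng.normal(size=500)
X = np.vstack([t + 0.1 * rng.normal(size=500),
               t + 0.1 * rng.normal(size=500)])
Xc = X - X.mean(axis=1, keepdims=True)

C_X = (Xc @ Xc.T) / (Xc.shape[1] - 1)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C_X)
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rows of P are the principal components (eigenvectors of C_X).
P = eigvecs.T
Y = P @ Xc

# In the new basis the covariance is diagonal: off-diagonal terms vanish.
C_Y = (Y @ Y.T) / (Y.shape[1] - 1)
```

The diagonal of `C_Y` holds the eigenvalues → the variance along each principal component.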

Now SVD → does not need to compute the covariance matrix.
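The same result can be sketched with SVD applied directly to the centered data. The toy data is hypothetical; the covariance matrix appears only in the final cross-check, not in the PCA itself.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, arbitrary choice

# Hypothetical correlated 2D data: rows = measurement types, columns = samples.
t = rng.normal(size=500)
X = np.vstack([t + 0.1 * rng.normal(size=500),
               0.5 * t + 0.1 * rng.normal(size=500)])
Xc = X - X.mean(axis=1, keepdims=True)
n = Xc.shape[1]

# SVD of the centered data itself; no covariance matrix is formed.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Columns of U are the principal directions; singular values give the variances.
P = U.T
variances = S**2 / (n - 1)
Y = P @ Xc

# Cross-check against the eigenvalues of the covariance matrix (descending order).
eigvals = np.linalg.eigvalsh(np.cov(X))[::-1]
```

With this layout (samples as columns) the principal directions are the left singular vectors; if samples were stored as rows, they would be the right singular vectors instead.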

The SVD → visualized.

SVD → we can understand this as a change of basis from the right singular vectors (input space) to the left singular vectors (output space).

We can perform PCA → using SVD → keep the largest singular values and cut off the rest.
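A sketch of the cut-off step, assuming we keep the top `k` components. The value of `k` and the toy data (three measurements driven by one hidden signal) are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, arbitrary choice

# Hypothetical data: 3 measurement types driven by a single hidden signal plus noise.
t = rng.normal(size=300)
X = np.vstack([t, 2.0 * t, -t]) + 0.05 * rng.normal(size=(3, 300))
Xc = X - X.mean(axis=1, keepdims=True)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 1                      # number of components to keep (arbitrary for this toy data)
P_k = U[:, :k].T           # top-k principal directions as rows
Y_k = P_k @ Xc             # reduced data, shape (k, n_samples)

# Map back to the original space to see how much was lost.
X_approx = P_k.T @ Y_k

# Fraction of the total variance captured by the kept components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
```

The squared reconstruction error equals the sum of the squared singular values that were dropped → small singular values mean little is lost.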

Non-linear methods → might be a better choice → this depends on the data. (dimensionality reduction → is a very powerful method).

Limits of PCA → the orthogonality constraint → might limit the power of PCA. (removing only second-order dependencies might not be enough). (kernel methods exist).
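A toy illustration of the limit, under the assumption that the data lies on a circle (my own example, chosen because the structure is clearly non-linear): one parameter, the angle, describes every point, yet PCA finds no direction worth dropping.

```python
import numpy as np

# Points evenly spaced on a unit circle: one underlying parameter (the angle).
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
X = np.vstack([np.cos(theta), np.sin(theta)])
Xc = X - X.mean(axis=1, keepdims=True)

# Variance along each principal direction.
S = np.linalg.svd(Xc, compute_uv=False)
variances = S**2 / (Xc.shape[1] - 1)

# Both directions carry the same variance, so PCA has nothing to discard,
# while a non-linear description (the angle) needs one number per point.
angle = np.arctan2(X[1], X[0])
```

The two variances come out equal → a linear, orthogonal change of basis cannot compress this data, even though it is one-dimensional in a non-linear sense.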

Reference