I don’t know how many times I have reviewed this topic → it’s a passion of mine. (learning about PCA).
PCA → a good first method to try in almost any field. (and we are going to explore the math behind PCA). (the proofs are needed for a full understanding).
We gather data → from some kind of system → the data often looks clouded and redundant → many fields have this problem, not just computer science.
We capture the data → like above → interesting. (each camera has its own 2D scatter plot). (How can we reduce this dimension?). (in the real world there is noise and much more).
Linear algebra → change of basis → new axes → which axis captures the most variance? (we can rewrite the collected data like below).
That is our starting point → we have many such data points. (what is the orthonormal basis? → it is not unique: the standard basis (1, 0) and (0, 1) works, and so does a rotated one like (1/√2)(1, 1) and (1/√2)(1, −1), and many more). (PCA → linear → simplifies the problem → and is easier to optimize) → Y = PX, with change-of-basis matrix P. (Okay, so we can change basis → but what basis should we choose?). (what features do we want Y to have?).
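A minimal sketch of the change of basis, with hypothetical 2-D data (the matrix P below is one arbitrary orthonormal choice, a 45° rotation, not the PCA solution yet):

```python
import numpy as np

# Hypothetical data: each column is one sample (2 variables, 100 samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))

# One orthonormal basis among many: rows are the unit vectors
# (1/sqrt(2))(1, 1) and (1/sqrt(2))(1, -1).
P = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

# Change of basis: Y = P X re-expresses every sample in the new basis.
Y = P @ X

# P is orthonormal, so P P^T = I and the transform is invertible.
assert np.allclose(P @ P.T, np.eye(2))
assert np.allclose(P.T @ Y, X)
```

Any orthonormal P is a valid re-expression of the data; the question PCA answers is which one to pick.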
We want to reduce noise → while keeping the signal.
The signal → large variance → while noise → smaller variance. (we assume the directions of largest variance carry the signal, and we want to keep those).
The relationship between axes → can be high or low. (axes can be correlated or not). (if two axes are highly correlated, one is redundant → we could have recorded just one variable).
Covariance → a linear relationship between two variables → LINEAR dependency. (not NON-LINEAR). (And we can generalize from pairs of variables to the whole dataset) → the covariance matrix.
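A small sketch of the covariance matrix on synthetic data (the variables `a`, `b`, `c` are hypothetical): two variables are nearly redundant, one is independent, and the off-diagonal entries expose exactly that linear dependency.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
a = rng.normal(size=n)
b = 0.9 * a + 0.1 * rng.normal(size=n)   # nearly redundant with a
c = rng.normal(size=n)                    # uncorrelated with both
X = np.vstack([a, b, c])                  # rows = variables, columns = samples

# Center each variable, then C = X_c X_c^T / (n - 1).
Xc = X - X.mean(axis=1, keepdims=True)
C = Xc @ Xc.T / (n - 1)

# Diagonal entries are variances; off-diagonals measure linear dependency.
# C[0, 1] is large (a and b are redundant), C[0, 2] is near zero.
assert np.allclose(C, np.cov(X))          # matches NumPy's built-in
```

Covariance only sees linear structure: a perfect non-linear dependency (e.g. b = a²) can still have near-zero covariance.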
PCA → assumes that P is an orthonormal matrix (its rows are orthonormal vectors). (now for the review and solution).
PCA → assumes linearity, and that large variance means signal → i.e. the data have a high signal-to-noise ratio → a strong assumption that is not always correct.
The covariance of Y should be a diagonal matrix → the transformed axes are decorrelated (all off-diagonal covariances are zero).
EVD (eigenvalue decomposition) → gives the solution right away: the rows of P are the eigenvectors of the covariance matrix of X, and the eigenvalues are the variances along the new axes.
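A sketch of the EVD route on hypothetical correlated data: eigendecompose the covariance, take the eigenvectors as the rows of P, and verify that the covariance of Y comes out diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 2 correlated variables, 1000 samples.
X = rng.normal(size=(2, 1000))
X[1] += 0.8 * X[0]

Xc = X - X.mean(axis=1, keepdims=True)
C = Xc @ Xc.T / (X.shape[1] - 1)

# Eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rows of P are the eigenvectors (the principal directions).
P = eigvecs.T
Y = P @ Xc

# The covariance of Y is diagonal: the new axes are decorrelated.
CY = Y @ Y.T / (Y.shape[1] - 1)
assert np.allclose(CY, np.diag(eigvals))
```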
Now SVD → does not even need the covariance matrix → we decompose the centered data directly.
The SVD → visualized.
SVD → we can understand X = UΣVᵀ as a change of basis: rotate into the right singular vectors (V), scale by Σ, rotate out into the left singular vectors (U).
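A quick numeric check of that reading, on a hypothetical matrix: the SVD A = UΣVᵀ sends each right singular vector vᵢ to σᵢ uᵢ.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 3))               # hypothetical 4x3 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A maps each right singular vector v_i to sigma_i * u_i:
# rotate into the V basis, scale by sigma_i, land on the U basis.
for i in range(len(s)):
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])
```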
We can perform PCA → using SVD → just truncate, keeping only the directions with the largest singular values (the eigenvalues of the covariance are the squared singular values divided by n − 1).
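A sketch of SVD-based PCA with truncation, on hypothetical 5-variable data (the choice k = 2 is arbitrary), plus a check of the SVD/EVD equivalence:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 200))             # hypothetical: 5 vars, 200 samples
Xc = X - X.mean(axis=1, keepdims=True)

# SVD of the centered data itself -- no covariance matrix needed.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Truncate: keep only the top-k singular directions.
k = 2
Y = U[:, :k].T @ Xc                       # k x n reduced representation

# Equivalence with EVD: s_i^2 / (n - 1) are the covariance eigenvalues.
C = Xc @ Xc.T / (Xc.shape[1] - 1)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
assert np.allclose(s**2 / (Xc.shape[1] - 1), eigvals)
```

Working on the data matrix directly is also better numerically: forming XXᵀ squares the condition number.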
Non-linear methods → might be a better choice → depends on the data. (dimensionality reduction → is a very powerful method).
Limits of PCA → the orthogonality constraint → might limit its power. (second-order statistics (covariance) capture only linear dependency, which might not be the right criterion). (kernel methods exist for the non-linear case).
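A minimal kernel-PCA sketch, assuming an RBF kernel and made-up parameters (`gamma`, `k`, and the concentric-circles data are all hypothetical, chosen to show a case plain PCA cannot untangle):

```python
import numpy as np

def rbf_kernel_pca(X, gamma, k):
    """Kernel PCA sketch: X is n_samples x n_features; gamma, k are
    hypothetical parameters, not from the notes."""
    # Pairwise squared distances -> RBF kernel matrix.
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center the kernel matrix (centering in the implicit feature space).
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Top-k eigenvectors of the centered kernel give the projections.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Two concentric circles: non-linear structure, zero use for linear PCA.
rng = np.random.default_rng(5)
t = rng.uniform(0, 2 * np.pi, 200)
r = np.repeat([1.0, 3.0], 100)
X = np.column_stack([r * np.cos(t), r * np.sin(t)])
Z = rbf_kernel_pca(X, gamma=2.0, k=2)    # n_samples x k projections
```

Same second-order machinery, but applied in a kernel-induced feature space, so non-linear structure in the original space can become linearly separable.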