# [ Archived Post ] Principal Component Analysis from Maximization Point of View

*Please note that this post is for my self-learning process.*

**Simple Review About PCA**

For a given matrix X with dimension (d*n), PCA aims to find a projection vector U that have dimension (d,1). When the original data is projected to a linear subspace the variance is maximized. We then can write a function f(u) (since U is an unknown variable) to formulate our objective function as seen above. (And since dot(U^T, X) is a vector we can safely say that we are aiming to maximize the variance of the projected vector.)

Taking a simple side note for calculating the variance, for a given vector V with dimension (n) we can calculate the standard deviation from the formula below. (Hence the variance would be standard deviation ^ 2).

In the case where V is a matrix with dimension (n*m) we can obtain the co-variance matrix as the equation shown below.

One thing to note above is the fact that they are dividing by n, but I am going to get the sample co-variance matrix hence divide by (n-1). Additionally, we also can create a co-variance matrix **respect to n dimension, where V = dot(X,X^T)/(m-1), and this is what I am going to calculate. (this method is similar of creating a gram matrix.)**

Now lets think about the case when we scale the original vector or matrix and how that effects the variance. When we scale the vector by some value A the overall variance gets scaled up by factor of A².

Lets first view the variance of the original data, as well as the co-variance matrix, and now we are going to choose a scaling factor A. (Below I have picked 3).

In the case where we have chosen 3 we can directly see that the variance have increased by 9 as well as every element in the co-variance matrix. Now lets think about the case where we wish to perform a dot product from our original vector and matrix. Since dot product of two vectors becomes a scalar value there is no point of capturing the variance, but since dot product between vector and matrix is another vector we can calculate the variance.

And as seen above, we can see that when we perform dot product to a matrix, we can see that the change of the variance is equivalent when we have directly calculated performed the equation

*dot(scaled_vec^T, dot(Sigma,scaled_vec))*

*dot(scaled_vec^T, dot(Sigma,scaled_vec))*

Where Sigma is the co-variance matrix respect to the original data, but please do take note that the dimension that the co-variance matrix is respected to. Now lets get back to our PCA.

Going back we have formulated our problem as the function seen in equation 1. So it simply is an optimization problem, however the problem is that the equation seen above have no bound, we can select any vector U and multiply it by 1000 to get a larger vector U that maximize the function even more. Hence we introduce another constraint to the function and make this Lagrange Dual problem. Where we make constraint the length of the vector U to be one. (equation 2). (The red indicates the dual variable as well as the constraint.).

Now everything is done, we set the derivative to be zero and take the derivative respect to vector U, (equation 3). And finally we arrive the famous eigenvalue decomposition equation. And substituting the values we got to the original equation we arrive at equation 4, hence finding out that the solution to the PCA is the largest eigenvalues.

**Final Words**

I have linked two of the best PCA lectures that I saw, please feel free to share them with other people as well.

If any errors are found, please email me at jae.duk.seo@gmail.com, if you wish to see the list of all of my writing please view my website here.

**Reference**

- Covariance Matrix . (2018). Stattrek.com. Retrieved 30 October 2018, from https://stattrek.com/matrix-algebra/covariance-matrix.aspx
- Standard Deviation and Variance. (2018). Mathsisfun.com. Retrieved 30 October 2018, from https://www.mathsisfun.com/data/standard-deviation.html
- implemented?, H. (2018). How numpy.cov() function is implemented?. Stack Overflow. Retrieved 30 October 2018, from https://stackoverflow.com/questions/27448352/how-numpy-cov-function-is-implemented
- numpy.var — NumPy v1.13 Manual. (2018). Docs.scipy.org. Retrieved 30 October 2018, from https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.var.html
- Principal Component Analysis. (2018). YouTube. Retrieved 30 October 2018, from https://www.youtube.com/watch?v=glc3qs5Ym7g&t=606s
- Ali Ghodsi, Lec 1: Principal Component Analysis. (2018). YouTube. Retrieved 30 October 2018, from https://www.youtube.com/watch?v=L-pQtGm3VS8&list=PLehuLRPyt1HzQoXEhtNuYTmd0aNQvtyAK