HowTo-AzureML-Use R’s PCA in Azure

Why would one want to do PCA in the first place? Given a large number of independent variables, PCA helps us find out which directions in the data (combinations of the variables) account for most of the variance. In the process one can drop some features/variables without affecting the results too much. A good reference for building intuition is http://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf

PCA is generally done in R using princomp(), prcomp() or a pca() function from an add-on package, mostly for exploratory analysis. PCA can be computed either by eigenvalue decomposition of the data covariance matrix (the approach princomp() takes) or by singular value decomposition of the data matrix (the approach prcomp() takes). We will use prcomp(), although princomp() does provide a lot more information.
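To make the two routes concrete, here is a minimal sketch showing that eigendecomposition of the covariance matrix and SVD of the centered data give the same component variances, and that prcomp() agrees with both. The data is synthetic and purely illustrative (it is not the dataset used in this post), and the variable names are my own.

```r
# Synthetic data: 100 rows, 3 columns (hypothetical, not the post's dataset)
set.seed(42)
x <- matrix(rnorm(300), ncol = 3)
x[, 3] <- x[, 1] + 0.5 * x[, 2] + rnorm(100, sd = 0.1)  # make column 3 correlated

# Route 1: eigenvalue decomposition of the covariance matrix
eig <- eigen(cov(x))

# Route 2: singular value decomposition of the centered data matrix
xc <- scale(x, center = TRUE, scale = FALSE)
sv <- svd(xc)
var_from_svd <- sv$d^2 / (nrow(x) - 1)  # squared singular values -> variances

# prcomp() reports standard deviations, so square them to compare
pc <- prcomp(x)

print(eig$values)
print(var_from_svd)
print(pc$sdev^2)
```

All three printed vectors should match up to floating point error, which is why the choice between the functions is mostly about convenience and the extra information each returns.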

For convenience I have chosen a dataset with just 5 features, which is far too few for PCA to be really worthwhile but keeps the example small. One of the important steps in PCA is to scale the data, and if any value becomes NaN as a result (for example, in a column with zero variance), to set it back to zero.
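The scaling and NaN handling described above could look roughly like the following. This is a sketch under assumptions: the data frame and its column names (f1 to f5) are made up, and the constant column f5 is there only to provoke the NaN case.

```r
# Hypothetical data frame with five features; f5 is constant on purpose
set.seed(1)
df <- data.frame(
  f1 = rnorm(50),
  f2 = runif(50, 0, 100),
  f3 = rnorm(50, mean = 10, sd = 3),
  f4 = rpois(50, lambda = 4),
  f5 = rep(7, 50)
)

# Center each column and divide by its standard deviation.
# A zero-variance column turns into 0/0 = NaN.
scaled <- scale(df)

# Put the NaN entries back to zero so prcomp() does not fail
scaled[is.nan(scaled)] <- 0

# Data is already centered and scaled, so prcomp() defaults are enough
pca <- prcomp(scaled)
print(summary(pca))
```

Passing `scale. = TRUE` to prcomp() directly would stop with an error on the constant column, which is one reason to scale by hand first.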

R script embedded in the code window for PCA

Notice that pretty much everything from the normal R world can be used here. All output from summary(), print() or plot() goes to the R Device output port (click Visualize on it to get the following).
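Since the screenshot of the script may not be readable, here is a rough sketch of what such a script might contain. It is not the exact script from the screenshot. The calls `maml.mapInputPort()` and `maml.mapOutputPort()` come from the Execute R Script template in Azure ML Studio and are left as comments so the sketch also runs in plain R; the synthetic `dataset1` is a stand-in I made up.

```r
# In the Execute R Script module the input would normally be read with
#   dataset1 <- maml.mapInputPort(1)
# A synthetic stand-in with five features is used here instead.
set.seed(7)
dataset1 <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5))
names(dataset1) <- paste0("feature", 1:5)

# Run PCA on centered, scaled data
pca <- prcomp(dataset1, center = TRUE, scale. = TRUE)

# Text output: loadings and the importance of each component
print(pca)
print(summary(pca))

# Graphics output: scree plot of the component variances
plot(pca, type = "l")

# Component scores as a data frame, ready to hand to the next module with
#   maml.mapOutputPort("scores")
scores <- as.data.frame(pca$x)
```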

The information at the top is clipped, but you get the idea. Here one can see that only 4 of the 5 components carry meaningful variance.
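One way to read the number of useful components off the result is to look at the cumulative share of variance. The sketch below assumes a synthetic five-column matrix in which the fifth column is nearly a combination of the first two, and an arbitrary 95% threshold; neither comes from the dataset used in this post.

```r
# Synthetic data: four independent columns plus one that is almost redundant
set.seed(3)
base <- matrix(rnorm(150 * 4), ncol = 4)
x <- cbind(base, base[, 1] + base[, 2] + rnorm(150, sd = 0.05))
colnames(x) <- paste0("feature", 1:5)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component, and the running total
var_share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(var_share)

# Smallest number of components that reaches the (arbitrary) threshold
threshold <- 0.95
n_keep <- which(cum_share >= threshold)[1]

print(round(cum_share, 3))
print(n_keep)
```

The same numbers appear in the "Cumulative Proportion" row of `summary(pca)`, so this is only a programmatic version of what the output shows.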
