Of the many machine learning tools at our disposal, one of my favourite and most used algorithms is Principal Component Analysis, or PCA. PCA is a commonly used algorithm in data science that compresses your data from a higher dimension to a lower one by projecting it onto the eigenvectors of the data's covariance matrix — the directions of greatest variance.
For a quick look at the inspiration and derivation, here is the link to the Wikipedia page: https://en.wikipedia.org/wiki/Principal_component_analysis
Now that you have had a quick refresher on the concepts and derivation, let me go over the common applications, as well as the common mistakes people make when using PCA.
Common Applications (or DO’s)
- Data Compression : This is one of the most common applications in machine learning. A lot of the time, we are dealing with Big Data — datasets too large to fit in the memory we have available for analysis. In such cases, we often wish to project the features onto a lower-dimensional space without a significant loss of variance. You don't even need to run out of memory to use PCA: a lot of people (including myself) will use PCA if the computation time is too long.
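As a quick sketch of the compression use case (using scikit-learn and synthetic data I made up for illustration — not data from any particular project), you can pass a variance fraction to `PCA` and let it pick the number of components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical dataset: 500 samples, 50 correlated features that
# really live on a 5-dimensional latent structure plus small noise
latent = rng.normal(size=(500, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(500, 50))

# Keep however many components are needed to retain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # far fewer than 50 columns
print(pca.explained_variance_ratio_.sum())   # at least 0.95
```

Because the synthetic data is essentially 5-dimensional, PCA compresses it roughly tenfold while keeping almost all of the variance — which is exactly the memory/speed win described above.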
- Visualization : Another common application is the visualization of high-dimensional data. PCA lets you see the distribution of the data along the principal components (the eigenvectors). Usually, for this application, PCA reduction is done to either 2 or 3 dimensions. Personally, I use PCA heavily for visualization in unsupervised learning — for example, to estimate how many classes or clusters my high-dimensional dataset could plausibly have.
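A minimal sketch of the visualization workflow (using scikit-learn's bundled Iris dataset as a stand-in for your own data): project down to 2 components and scatter-plot the result.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # these two columns are what you would scatter-plot

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # first component dominates for Iris
```

Plotting `X_2d[:, 0]` against `X_2d[:, 1]` (e.g. with matplotlib) makes the cluster structure visible, which is how I eyeball the likely number of clusters before running any clustering algorithm.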
- Feature Selection : There is a variant of PCA known as Functional PCA which is used in the field of feature selection. This application is slightly less well known, and I am still quite new to it myself.
Now that we have covered some of the fruitful applications of PCA, let's explore some of the mistakes novices make.
Common Mistakes (or DON’Ts)
- Fixing Overfit : One common mistake is using PCA to reduce overfitting. Overfitting is often caused by having too many features relative to the number of samples: the model fits noise in the training data, and the high variance drives up cross-validation error. People assume that reducing the number of dimensions will automatically reduce the influence of certain features and hence fix the overfitting. But PCA simply re-expresses your original features in fewer dimensions — the directions with the most variance are not necessarily the most predictive — so it may not fix the issue. Better options are feature selection and regularization.
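To make the "use regularization instead" point concrete, here is a small sketch on synthetic data I constructed for illustration (200 features, only 5 of which matter, and fewer samples than features — a classic overfitting setup). L1 regularization (Lasso) shrinks the irrelevant coefficients toward zero, acting as both regularizer and feature selector:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Hypothetical overfit-prone setup: 60 samples, 200 features,
# with only the first 5 features carrying any signal
X = rng.normal(size=(60, 200))
w = np.zeros(200)
w[:5] = 1.0
y = X @ w + 0.1 * rng.normal(size=60)

# Plain least squares can interpolate the training data and overfits
ols_score = cross_val_score(LinearRegression(), X, y, cv=5).mean()

# L1 regularization shrinks the 195 irrelevant coefficients toward zero
lasso_score = cross_val_score(Lasso(alpha=0.1), X, y, cv=5).mean()

print(f"OLS cross-val R^2:   {ols_score:.2f}")
print(f"Lasso cross-val R^2: {lasso_score:.2f}")
```

On this data the regularized model generalizes far better in cross-validation, whereas blindly applying PCA first would give no guarantee of keeping the 5 predictive directions.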
- Set Standard : Another common mistake is assuming that PCA MUST be used in every machine learning application. This is false. PCA should only be used when memory or computation speed becomes an issue; otherwise, we are completely fine without it.
I hope you enjoyed this article. Feel free to comment your suggestions and remarks.