Understanding the Mathematics behind Unsupervised Linear Dimensionality Reduction Techniques: PCA, SVD & Incremental PCA
Let us understand dimensionality reduction with an example.
Imagine a student (let me call him Mr. Lazypants) who gets an entire semester to study a course with 30 chapters. Mr. Lazypants is a procrastinator and keeps delaying his lessons (skipping lectures and tutorials), and he eventually ends up with just 2 weeks (14 days) before the final exam, which he needs to clear no matter what.
He clearly understands that he can’t read all 30 chapters, so he focuses on the chapters that are most important for the exam and schedules one chapter a day. This way he can complete the 13 most important chapters needed to pass with good marks (you know everyone supposedly needs a day before the exam to revise things, lol :) ). He sticks to the routine, completes his target and gets pretty good grades in the examination.
Mr. Lazypants implicitly applied ‘dimensionality reduction’ here by removing the redundancy of the topics that were not important at all. He reduced the number of features (30 chapters) to the most important components that contained the maximum information needed to pass the exam (i.e., studying those 13 chapters was practically equivalent to studying all 30 chapters).
Was it fair decision-making from Mr. Lazypants?
Well, I will agree to disagree on ethical terms, but practically speaking it was worth it. Even if he had studied all 30 chapters, he might have scored fewer marks by retaining less than what he actually studied (due to the constraints of the human mind in processing and transforming data). So he knew the art of selecting the important features from the data.
We will now take a close look at some linear dimensionality reduction techniques, try to understand the mathematics behind them, and see how they relate to the parameters and attributes present in the ‘sklearn’ library. We will look at the following 3 dimensionality reduction techniques:
- PCA
- SVD
- Incremental PCA
So, let’s look at the techniques one by one. We start with PCA.
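Since the PCA derivation in this post is presented in the handwritten images, here is a minimal NumPy sketch of the core idea as a rough companion to those derivations (the toy data and the choice of k = 2 are illustrative assumptions, not part of the original post): center the data, compute the covariance matrix, and project onto its top eigenvectors.

```python
import numpy as np

# Toy data: 6 samples, 3 features (hypothetical numbers, purely for illustration)
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.1],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.2],
              [2.3, 2.7, 0.6]])

# 1. Center the data (PCA works on zero-mean features)
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenpairs by decreasing eigenvalue and keep the top k components
order = np.argsort(eigvals)[::-1]
k = 2
components = eigvecs[:, order[:k]]

# 5. Project the centered data onto the principal components
X_reduced = X_centered @ components
print(X_reduced.shape)  # (6, 2)
```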



We will pause our discussion of PCA for a bit and resume it after discussing SVD. Before that, we will try to understand a very important theorem that forms the basis of many decomposition algorithms.
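The derivation itself was presented as images; assuming the theorem referred to here is the eigendecomposition (spectral) theorem for real symmetric matrices (the reference list points to eigendecomposition), here is a quick numerical check that a symmetric matrix A factorizes as A = QΛQᵀ with an orthogonal Q:

```python
import numpy as np

# A real symmetric matrix (the spectral theorem applies to these)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigendecomposition: A = Q @ diag(lam) @ Q.T with orthonormal columns in Q
lam, Q = np.linalg.eigh(A)

# Reconstruct A from its eigenpairs and verify the factorization
A_reconstructed = Q @ np.diag(lam) @ Q.T
print(np.allclose(A, A_reconstructed))  # True
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal
```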





After having studied SVD and one of its variants, Randomized SVD, we will implement them in Python. We will try to implement them using both SciPy and sklearn.
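Here is a minimal sketch of both (the random toy matrix and k = 5 are illustrative assumptions): scipy.linalg.svd computes the full factorization X = UΣVᵀ, while sklearn.utils.extmath.randomized_svd approximates only the top-k factors, which is much cheaper for large matrices.

```python
import numpy as np
from scipy.linalg import svd
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))  # toy matrix, purely for illustration

# Full SVD with SciPy: X = U @ diag(s) @ Vt
U, s, Vt = svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (100, 20) (20,) (20, 20)

# Randomized (truncated) SVD with sklearn: only the top-k factors
k = 5
U_k, s_k, Vt_k = randomized_svd(X, n_components=k, random_state=0)
print(U_k.shape, s_k.shape, Vt_k.shape)  # (100, 5) (5,) (5, 20)

# Error of the resulting rank-k approximation
X_k = U_k @ np.diag(s_k) @ Vt_k
print(np.linalg.norm(X - X_k))
```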



We now demonstrate how the PCA algorithm is implemented in sklearn. Internally, it uses SVD to perform the linear dimensionality reduction.
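A short usage sketch (the random toy data and n_components=3 are assumptions for illustration); note how the fitted attributes expose the pieces of the underlying SVD:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))  # toy data, purely for illustration

# sklearn centers X and runs an SVD under the hood
pca = PCA(n_components=3, svd_solver='full')
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 3)
print(pca.components_.shape)          # (3, 10): rows are right singular vectors
print(pca.explained_variance_ratio_)  # variance captured by each component
print(pca.singular_values_)           # singular values from the SVD
```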




We will now move on to the last algorithm of this blog: Incremental PCA.
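A minimal usage sketch (the toy data and the split into 10 batches are illustrative assumptions): IncrementalPCA consumes the data in mini-batches through partial_fit, so the full matrix never needs to sit in memory at once.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))  # toy data, purely for illustration

# Feed the data in mini-batches, as if it did not fit in memory
ipca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 10):  # each batch must have >= n_components rows
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)
print(X_reduced.shape)                 # (1000, 5)
print(ipca.explained_variance_ratio_)  # variance captured per component
```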



References:
- https://arxiv.org/pdf/1810.06860 (for randomized-SVD)
- https://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf (for Incremental PCA)
- https://scikit-learn.org/stable/modules/decomposition.html
- https://en.wikipedia.org/wiki/QR_decomposition (QR Decomposition)
- https://towardsdatascience.com/the-mathematics-behind-principal-component-analysis-fff2d7f4b643 (for PCA understanding)
- Schaum Series Linear Algebra (for Eigen Decomposition and SVD)
PS: My aim was to bring clarity to the concept by incorporating as many derivations as possible, because it helped me a lot in understanding the source code. Any comments, improvements and suggestions are always welcome. I am pretty old-school and tend to write things on paper before applying them anywhere, so pardon me if the text content was light and the images were many.
