PCA, LDA and PLS exposed with Python

Part 2: LDA and PLS

Andrea Castiglioni
Analytics Vidhya
3 min read · Mar 9, 2020


We saw in the previous part that PCA helps us reduce the dimensionality of our dataset. It also lets us separate classes to some extent, so it can serve as a clustering-like, unsupervised learning step in our project. Now we are interested in another technique: LDA (Linear Discriminant Analysis).

Let’s import the standard libraries for the task:

python libraries to be used in this section

Our dataframe is the same as the previous one:

dataframe head of our dataset.
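The dataframe itself is not reproduced here. A minimal synthetic stand-in with the same columns (Age, Weight, Height and a binary Sex target; all values and column names are assumptions, chosen so the two classes overlap strongly as in the article) could be generated like this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# 1 = Male, 0 = Female; the classes overlap strongly on purpose.
sex = rng.integers(0, 2, n)
age = rng.normal(40, 12, n)
weight = rng.normal(68, 9, n) + 7 * sex
height = rng.normal(168, 7, n) + 6 * sex

df = pd.DataFrame({"Age": age, "Weight": weight,
                   "Height": height, "Sex": sex})
print(df.head())
```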

To apply LDA, we need to distinguish our predictor variables from our target. Yes, correct: this is no longer unsupervised learning!

As with most sklearn estimators, we apply fit_transform to our data:

The n_components keyword in LDA selects the n most discriminative directions onto which the data are projected. Note that LDA can extract at most n_classes − 1 directions, so with a binary target such as ours the projection is one-dimensional.
To understand what our model did, we apply the fit_transform method of LDA to our data and the target.
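A sketch of this step, using a synthetic stand-in for the article's dataset (column names and values are assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the article's Age/Weight/Height data.
rng = np.random.default_rng(42)
n = 200
sex = rng.integers(0, 2, n)            # binary target, 1 = Male
X = np.column_stack([
    rng.normal(40, 12, n),             # Age
    rng.normal(68, 9, n) + 7 * sex,    # Weight
    rng.normal(168, 7, n) + 6 * sex,   # Height
])

# With two classes, LDA yields at most n_classes - 1 = 1 component.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, sex)
print(X_lda.shape)  # (200, 1)
```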

We can now plot our results:

LDA of our dataset
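A plot along these lines can be produced by histogramming the LDA scores per class; this is a sketch on the same assumed synthetic data, not the article's exact figure code:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe in scripts
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
n = 200
sex = rng.integers(0, 2, n)
X = np.column_stack([
    rng.normal(40, 12, n),             # Age
    rng.normal(68, 9, n) + 7 * sex,    # Weight
    rng.normal(168, 7, n) + 6 * sex,   # Height
])

scores = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, sex)

# Overlaid histograms of the single discriminant axis, one per class.
fig, ax = plt.subplots()
for label, name in [(0, "Female"), (1, "Male")]:
    ax.hist(scores[sex == label, 0], bins=20, alpha=0.6, label=name)
ax.set_xlabel("LD1")
ax.legend()
fig.savefig("lda_scores.png")
```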

From the graph above we can see that most of the overlap between the two classes has been removed. However, some ambiguity still remains.

There are a couple of points in each class that seem to belong to the other category, which is not surprising since we generated a random dataset with strongly overlapping features!

What about PLS?

PLS (Partial Least Squares) is another powerful technique, especially if we want to use it as a regression tool.

Score plot for the dataset after PLS. The color represents the sex (1 = Male).
Loadings plot for the dataset after PLS.

The difference between PCA and PLS is that PCA rotates the axes to maximize the variance of the predictor variables, while PLS rotates them to maximize the covariance between the predictors and the target.

All three of these methods can be employed on this dataset to better identify a person's class, given Age, Weight and Height.
