PCA, LDA and PLS exposed with Python

Part 2: LDA and PLS

Andrea Castiglioni
Analytics Vidhya
3 min read · Mar 9, 2020


We saw in the previous part that PCA helps us reduce the dimensionality of our dataset. It also lets us separate classes to some extent, so it can serve as a clustering-like, unsupervised learning step in our project. Now we are interested in another technique: LDA (Linear Discriminant Analysis).

Let’s import the standard libraries for the task:

python libraries to be used in this section

Our dataframe is the same as the previous one:

dataframe head of our dataset.
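The dataframe itself is not reproduced here. A minimal synthetic stand-in with the same columns (Age, Weight, Height and a binary Sex target; all values and column names are assumptions, chosen so the two classes overlap strongly as in the article) could be generated like this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# 1 = Male, 0 = Female; the classes overlap strongly on purpose.
sex = rng.integers(0, 2, n)
age = rng.normal(40, 12, n)
weight = rng.normal(68, 9, n) + 7 * sex
height = rng.normal(168, 7, n) + 6 * sex

df = pd.DataFrame({"Age": age, "Weight": weight,
                   "Height": height, "Sex": sex})
print(df.head())
```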

To apply LDA, we need to distinguish our predictor variables from our target. Yes, correct: this is no longer unsupervised learning!

As with most sklearn estimators, we apply fit_transform to our data:

The n_components keyword in LDA selects the n most discriminative directions onto which the data are projected. Note that LDA can extract at most n_classes − 1 directions, so with a binary target such as ours the projection is one-dimensional.
To understand what our model did, we apply the fit_transform method of LDA to our data and the target.
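A sketch of this step, using a synthetic stand-in for the article's dataset (column names and values are assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the article's Age/Weight/Height data.
rng = np.random.default_rng(42)
n = 200
sex = rng.integers(0, 2, n)            # binary target, 1 = Male
X = np.column_stack([
    rng.normal(40, 12, n),             # Age
    rng.normal(68, 9, n) + 7 * sex,    # Weight
    rng.normal(168, 7, n) + 6 * sex,   # Height
])

# With two classes, LDA yields at most n_classes - 1 = 1 component.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, sex)
print(X_lda.shape)  # (200, 1)
```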

We can now plot our results:

LDA of our dataset
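A plot along these lines can be produced by histogramming the LDA scores per class; this is a sketch on the same assumed synthetic data, not the article's exact figure code:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe in scripts
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
n = 200
sex = rng.integers(0, 2, n)
X = np.column_stack([
    rng.normal(40, 12, n),             # Age
    rng.normal(68, 9, n) + 7 * sex,    # Weight
    rng.normal(168, 7, n) + 6 * sex,   # Height
])

scores = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, sex)

# Overlaid histograms of the single discriminant axis, one per class.
fig, ax = plt.subplots()
for label, name in [(0, "Female"), (1, "Male")]:
    ax.hist(scores[sex == label, 0], bins=20, alpha=0.6, label=name)
ax.set_xlabel("LD1")
ax.legend()
fig.savefig("lda_scores.png")
```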

From the graph above we can see that most of the overlap between the two classes has been removed. However, some ambiguity still remains.

There are a couple of points in each class that seem to belong to the other category, which is not surprising since we generated a random dataset with strongly overlapping features!

What about PLS?

PLS (Partial Least Squares) is another powerful technique, especially if we want to use it as a regression tool.

Score plot for the dataset after PLS. The color represents the sex (1 = Male).
Loadings plot for the dataset after PLS.

The difference between PCA and PLS is that PCA rotates the axes to maximize the variance of the predictor variables, while PLS rotates them to maximize the covariance between the predictors and the target.

All three of these methods can be employed on this dataset to better identify a person's class, given Age, Weight and Height.
