Linear Discriminant Analysis as Dimensionality Reduction Technique

Grace Valmadrid
3 min read · Feb 4, 2020


Linear Discriminant Analysis (LDA) is a classifier developed by Ronald A. Fisher in 1936. It was originally designed for a 2-class problem and was later generalised to the multi-class problem by C. Radhakrishna Rao in 1948. The good news is that it can also be used as a dimensionality reduction technique in the preprocessing step for machine learning.

What is dimensionality reduction? Dimensionality reduction is simply the process of reducing the number of features (dimensions) in a dataset. It can be done through either feature selection or feature extraction. Here are some of its benefits:

  • It is very useful in data visualisation if the dimension can be reduced to 3 or 2.
  • It reduces the time and storage required. (Imagine optimising a Support Vector Classification model on a huge dataset using GridSearchCV!)
  • It removes redundant or highly correlated features.
  • It helps overcome overfitting.

In general, it helps avoid the curse of dimensionality, which refers to the problems (see the four points above) that arise when working with high-dimensional datasets.

What does LDA do?

LDA focuses on maximising the separability among a known number of classes by projecting the features onto a new linear subspace that:

  1. Maximises the distance between the means of each class.
  2. Minimises the variation within each class.

Also, LDA makes the assumption that the feature data is Gaussian.
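
To see what this looks like in practice, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis on the Iris dataset (the dataset and parameter choices are mine, purely for illustration):

```python
# LDA as supervised dimensionality reduction on the Iris dataset
# (4 features, 3 classes, so at most 2 discriminant components).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)        # unlike PCA, LDA needs the class labels y

print(X_lda.shape)                     # (150, 2)
print(lda.explained_variance_ratio_)   # share of between-class variance per component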

Below are the steps in performing LDA:

  1. Turn the feature data into a matrix.
  2. Compute the mean vector for the different classes.
  3. Compute the between-class scatter matrix (based on the distances between the class means) and the within-class scatter matrix (based on the distances between each class mean and its data points).
  4. Construct a lower-dimensional space that maximises the between-class variance and minimises the within-class variance.
  5. Project the original feature matrix onto this lower dimensional space.
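
Here is a minimal NumPy sketch of these five steps (the function and variable names are my own, for illustration only, not an optimised implementation):

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Project the feature matrix X onto the top `n_components` linear discriminants."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]

    # Steps 2-3: class means, then within-class and between-class scatter matrices
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_w += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(X_c) * (diff @ diff.T)

    # Step 4: eigenvectors of S_w^-1 S_b span the lower-dimensional space
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real

    # Step 5: project the original feature matrix onto that space
    return X @ W
```

Because the between-class scatter matrix has rank at most (number of classes - 1), LDA can produce at most that many useful components, which is why 2 classes give the single projection axis illustrated next.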

I will illustrate this using 2-dimensional data with 2 classes and reduce it to 1 dimension. If you would like to dig into the maths behind it, you may refer to Alan J. Izenman’s Modern Multivariate Statistical Techniques (chapter 8).

Here we have 2 classes, the red and green dots.

Let’s draw a diagonal line and project the data points onto it. The green and red bars represent the mean of each class. Though this maximises the distance between the means, there are overlaps in the middle. Hence, this is not the best solution. (It might be good for Principal Component Analysis [PCA].)

How about we try it this way? This new line satisfies both goals of LDA: it maximises the distance between the means and minimises the variation within each class (no overlaps).
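
The same idea can be reproduced numerically. Below is a small sketch on synthetic data (two blobs in 2 dimensions projected onto a single LDA axis; the blob parameters are arbitrary choices of mine):

```python
# Two 2-D classes projected onto a single LDA axis: the class means end up
# far apart while the spread within each class stays small.
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_blobs(n_samples=200, centers=2, n_features=2,
                  cluster_std=1.0, random_state=42)

X_1d = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()

for c in (0, 1):
    print(f"class {c}: mean = {X_1d[y == c].mean():.2f}, std = {X_1d[y == c].std():.2f}")
```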

LDA vs PCA

These 2 techniques are very similar but also different. Here is a quick summary of the key differences:

  • LDA is supervised: it uses the class labels to find the directions that maximise class separability, and it can produce at most (number of classes - 1) components.
  • PCA is unsupervised: it ignores the labels and finds the directions of maximum variance, and it can keep up to as many components as there are original features.
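
To make the difference concrete, here is a small sketch fitting both on the Iris dataset (again, my own choice of example). The only difference in the calls is that LDA’s fit_transform also takes the labels:

```python
# LDA (supervised, uses the labels) vs PCA (unsupervised, ignores the labels)
# on the same data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                             # directions of maximum variance
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # directions of maximum class separability

print(X_pca.shape, X_lda.shape)   # (150, 2) (150, 2)
```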

Further Reading

You may refer to this link for Scikit-Learn’s documentation on LDA, both as a classifier and as a dimensionality reduction technique.
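
As a quick reminder of those two roles, here is a minimal sketch (the dataset and train/test split are arbitrary choices of mine):

```python
# The same scikit-learn estimator can act as a classifier (predict/score)
# and as a dimensionality reduction transformer (transform).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis(n_components=2).fit(X_train, y_train)

print(lda.score(X_test, y_test))     # accuracy when used as a classifier
print(lda.transform(X_test).shape)   # reduced features when used for preprocessing
```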

Also check these out:

  • Extensions to LDA: Quadratic, Flexible and Regularised
  • Non-linear transformation techniques like Locally Linear Embedding and Isometric Feature Mapping (Alan J. Izenman’s book has a chapter on this too)

There is also research on face recognition where PCA outperforms LDA.
