How to Use Linear Discriminant Analysis for Dimensionality Reduction

Practical Guide to Dimensionality Reduction With Implementation in Python.

Kamil Polak
Jan 22 · 5 min read
[Header image. Source: James, Gareth, et al., An Introduction to Statistical Learning]

In machine learning we often have many features available for the final classification. The higher the number of features, the harder the problem is to work with. Moreover, some of these features are often correlated with one another, and hence redundant.

Dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information, can be an effective solution.

Commonly used dimensionality reduction methods include supervised approaches such as linear discriminant analysis (LDA) and unsupervised ones such as principal component analysis (PCA).

In this article, we will focus on LDA. Specifically, I will demonstrate how to use linear discriminant analysis for dimensionality reduction in Python.

Apart from that, I will give you an introduction to the methodology of LDA and describe how LDA works. Finally, I will explain the difference between LDA and PCA.

What is the Linear Discriminant Analysis?

Linear discriminant analysis (LDA) is a classical statistical approach for supervised dimensionality reduction and classification. LDA computes an optimal transformation (projection) by simultaneously minimizing the within-class distance and maximizing the between-class distance, thus achieving maximum class discrimination (Ye & Ji, 2010).

The optimal transformation in LDA can be readily computed by applying an eigendecomposition on the so-called scatter matrices. Linear discriminant analysis has been used widely in many applications involving high-dimensional data.

LDA can increase computational efficiency and reduce the degree of over-fitting caused by the curse of dimensionality: as the number of features grows while the number of samples stays fixed, a model's generalization error tends to increase.

Linear Discriminant Analysis was proposed in 1936 by Ronald A. Fisher who formulated Fisher’s Linear Discriminant for two-class classification problems. That’s why it is sometimes called Fisher’s LDA.

In 1948 this concept was further extended to multi-class problems by C. Radhakrishna Rao under the assumption of equal class covariances and normally distributed classes.

The figure below demonstrates the general concept of LDA for a two-class problem.

[Figure: LDA projection for a two-class problem. Image from Raschka, S., and Mirjalili, V., Python Machine Learning, 3rd Ed.]

How does linear discriminant analysis work?

Linear Discriminant Analysis separates the samples in the training dataset by their class value. To be more precise, LDA tries to find the linear combination of input variables that achieves the maximum separation between classes.

The full procedure can be described in four steps:

  1. Standardize the d-dimensional dataset, where d is the number of features.
  2. Calculate the separability between different classes. This is also known as between-class variance and is defined as the distance between the mean of different classes.
[Equation: between-class variance]

3. Calculate the within-class variance. This is the distance between each sample and the mean of its class.

[Equation: within-class variance]

4. Construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance. In the equation below, P is the projection onto the lower-dimensional space. This ratio is also called Fisher’s criterion.

[Equation: Fisher’s criterion]
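The four steps above can be sketched directly in NumPy, using the standard scatter-matrix formulation of LDA: the between-class scatter S_B = Σ_c N_c (m_c − m)(m_c − m)ᵀ, the within-class scatter S_W = Σ_c Σ_{x∈c} (x − m_c)(x − m_c)ᵀ, and Fisher's criterion maximized by the leading eigenvectors of S_W⁻¹ S_B. The dataset choice and variable names here are my own illustration, not the article's original equations:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # step 1: standardize the d-dimensional data

overall_mean = X.mean(axis=0)
classes = np.unique(y)
d = X.shape[1]

S_W = np.zeros((d, d))  # within-class scatter
S_B = np.zeros((d, d))  # between-class scatter
for c in classes:
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    S_W += (X_c - mean_c).T @ (X_c - mean_c)      # step 3: within-class variance
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)         # step 2: between-class variance

# Step 4: eigendecomposition of S_W^{-1} S_B; the top eigenvectors span the
# lower-dimensional space that maximizes Fisher's criterion.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real  # at most C - 1 = 2 useful discriminants here
X_lda = X @ W
print(X_lda.shape)  # (178, 2)
```

With C classes, S_B has rank at most C − 1, so LDA can produce at most C − 1 meaningful discriminant directions.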

How To Make Projection With LDA?

LDA estimates the probability that a new set of inputs belongs to each class. The output class is the one with the highest probability.

Although there are many ways to frame and solve LDA, the most common approach is to use Bayes’ theorem.
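As a quick illustration (my own sketch, not the article's code), scikit-learn's LinearDiscriminantAnalysis exposes exactly these per-class probabilities through predict_proba:

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)

proba = lda.predict_proba(X[:1])  # posterior P(class | x) for the first sample
print(proba.round(3))             # one row per sample; entries sum to 1
print(lda.predict(X[:1]))         # the class with the highest posterior
```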

What is the difference between LDA and PCA?

Although both LDA and PCA are linear transformation techniques that can be used to reduce the number of dimensions in a dataset, there is one fundamental difference.

PCA is an unsupervised algorithm: it ignores class labels altogether and aims to find the principal components that maximize the variance in a given dataset.

On the other hand, Linear Discriminant Analysis can be considered as a supervised algorithm. It computes the directions (“linear discriminants”) that will represent the axes that maximize the separation between multiple classes.
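A minimal side-by-side sketch of this difference (my own example, on the Wine data used later in the article): PCA's fit never sees the labels, while LDA's fit requires them.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# PCA is unsupervised: fit_transform takes only X.
X_pca = PCA(n_components=2).fit_transform(X_std)
# LDA is supervised: fit_transform needs the class labels y.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)
print(X_pca.shape, X_lda.shape)  # (178, 2) (178, 2)
```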

Linear Discriminant Analysis in Python

To demonstrate how to implement LDA in Python, I will use the scikit-learn machine learning library and its LinearDiscriminantAnalysis class.

For this tutorial I will use the Wine dataset, which is publicly available from the UCI Machine Learning Repository.

UCI Notes About the Dataset:

  • The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
  • Outlier detection algorithms could be used to detect the few excellent or poor wines.
  • Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Note: for simplicity, we are not going to deep dive into data preprocessing.

Let’s first import libraries and read the dataset.
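The article reads the data from a CSV downloaded from UCI; to keep this sketch self-contained, I load scikit-learn's bundled copy of the same Wine dataset instead (the `load_wine` call and the `df` name are my own choices, not the article's original code):

```python
from sklearn.datasets import load_wine

# The article loads a CSV from the UCI repository; scikit-learn bundles the
# same Wine dataset, which keeps this example self-contained.
wine = load_wine(as_frame=True)
df = wine.frame                     # 13 feature columns plus a 'target' column
print(df.shape)                     # (178, 14)
print(df["target"].value_counts())  # three cultivar classes
```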

Next, we will define the dependent and independent variables and split the dataset into train and test sets.
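A sketch of this step (the 80/20 split ratio and random seed are my assumptions, not the article's original values):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)   # X: independent variables, y: class labels

# An 80/20 split; stratify keeps the class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
print(X_train.shape, X_test.shape)  # (142, 13) (36, 13)
```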

The next step is to standardize our dataset.
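A sketch using scikit-learn's StandardScaler (split parameters as assumed above); note the scaler is fit on the training set only:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training data only
X_test = sc.transform(X_test)        # reuse training mean/std to avoid leakage
print(np.abs(X_train.mean(axis=0)).max())  # ~0: features are now centered
```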

Let’s apply Linear Discriminant Analysis to our data.
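A sketch of this step, fitting the LinearDiscriminantAnalysis class on the standardized training data (setup as assumed above):

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
sc = StandardScaler().fit(X_train)
X_train, X_test = sc.transform(X_train), sc.transform(X_test)

# With 3 classes, LDA yields at most 3 - 1 = 2 discriminants.
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)  # supervised: needs y_train
X_test_lda = lda.transform(X_test)
print(X_train_lda.shape, X_test_lda.shape)  # (142, 2) (36, 2)
```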

We will train a simple logistic regression model.
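A sketch of the classifier on top of the 2-dimensional LDA projection (hyperparameters are my assumptions):

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
sc = StandardScaler().fit(X_train)
X_train, X_test = sc.transform(X_train), sc.transform(X_test)
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# A simple logistic regression on the 2-dimensional LDA projection.
clf = LogisticRegression(random_state=0)
clf.fit(X_train_lda, y_train)
acc = clf.score(X_test_lda, y_test)
print(acc)  # test-set accuracy
```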

We will make a confusion matrix to evaluate our model.
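A sketch of the evaluation step (same pipeline as above, with my assumed parameters):

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
sc = StandardScaler().fit(X_train)
X_train, X_test = sc.transform(X_train), sc.transform(X_test)
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)
clf = LogisticRegression(random_state=0).fit(X_train_lda, y_train)

y_pred = clf.predict(X_test_lda)
cm = confusion_matrix(y_test, y_pred)
print(cm)                              # rows: true class, columns: predicted class
print(accuracy_score(y_test, y_pred))  # off-diagonal entries are misclassifications
```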

Now we can visualize the training set results.

Let’s do the same for test results.
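One plotting helper covers both visualizations (a sketch with my assumed pipeline; the non-interactive Agg backend lets the script run without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
sc = StandardScaler().fit(X_train)
X_train, X_test = sc.transform(X_train), sc.transform(X_test)
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

def plot_lda(X_2d, labels, name):
    """Scatter the 2-D LDA projection, one color per class, and save to PNG."""
    for c in np.unique(labels):
        plt.scatter(X_2d[labels == c, 0], X_2d[labels == c, 1], label=f"class {c}")
    plt.xlabel("LD1")
    plt.ylabel("LD2")
    plt.title(name)
    plt.legend()
    plt.savefig(f"{name}.png")
    plt.clf()

plot_lda(X_train_lda, y_train, "training_set")
plot_lda(X_test_lda, y_test, "test_set")
```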

Conclusion

Linear Discriminant Analysis is a powerful tool for dimensionality reduction that can also reduce the degree of over-fitting.

In this article, I demonstrated how to use Linear Discriminant Analysis for dimensionality reduction in Python. Specifically, I described:

  • what Linear Discriminant Analysis is and how it works,
  • how to project data with LDA,
  • what the difference between LDA and PCA is,
  • how to implement Linear Discriminant Analysis in Python.

References

  • Rao, C. Radhakrishna. “The utilization of multiple measurements in problems of biological classification.” Journal of the Royal Statistical Society. Series B (Methodological) 10.2 (1948): 159–203.
  • Fisher, Ronald A. “The use of multiple measurements in taxonomic problems.” Annals of eugenics 7.2 (1936): 179–188.
  • Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning, 3rd Ed. Packt Publishing, 2019.
  • Ye, Jieping, and Shuiwang Ji. “Discriminant analysis for dimensionality reduction: An overview of recent developments.” Biometrics: Theory, Methods, and Applications. Wiley-IEEE Press, New York (2010).
  • James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013.

The Startup

Medium's largest active publication, followed by +775K people. Follow to join our community.

Kamil Polak

Written by

Model Risk Manager @Nordea, Machine Learning Consultant, Connect: https://www.linkedin.com/in/kamil-polak/
