How to Use Linear Discriminant Analysis for Dimensionality Reduction
Practical Guide to Dimensionality Reduction With Implementation in Python.
In machine learning, we sometimes have too many features on which the final classification is based. The more features there are, the harder the data becomes to work with. Moreover, some of these features are often correlated and therefore redundant.
Dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information, can be an effective solution.
Commonly used dimensionality reduction methods include supervised approaches such as linear discriminant analysis (LDA) and unsupervised ones such as principal component analysis (PCA).
In this article, we will focus on LDA. Specifically, I will demonstrate how to use linear discriminant analysis for dimensionality reduction in Python.
Apart from that, I will introduce the methodology of LDA and describe how it works. Finally, I will explain the difference between LDA and PCA.
What is Linear Discriminant Analysis?
Linear discriminant analysis (LDA) is a classical statistical approach for supervised dimensionality reduction and classification. LDA computes an optimal transformation (projection) by simultaneously minimizing the within-class distance and maximizing the between-class distance, thus achieving maximum class discrimination (Ye and Ji, 2010).
The optimal transformation in LDA can be readily computed by applying an eigendecomposition on the so-called scatter matrices. Linear discriminant analysis has been used widely in many applications involving high-dimensional data.
LDA can increase computational efficiency and reduce the degree of overfitting that results from the curse of dimensionality. Here, the curse of dimensionality refers to the fact that, for a fixed number of training samples, a model's error tends to increase as the number of features grows.
Linear Discriminant Analysis was proposed in 1936 by Ronald A. Fisher, who formulated Fisher’s Linear Discriminant for two-class classification problems. That’s why it is sometimes called Fisher’s LDA.
In 1948, C. Radhakrishna Rao extended this concept to multi-class problems under the assumptions of equal class covariances and normally distributed classes.
The figure below demonstrates the general concept of LDA for a two-class problem.
How does linear discriminant analysis work?
Linear Discriminant Analysis separates the samples in the training dataset by their class value. More precisely, LDA tries to find a linear combination of the input variables that achieves maximum separation between samples of different classes.
The full procedure can be described in four steps:
1. Standardize the d-dimensional dataset, where d is the number of features.
2. Calculate the separability between the different classes. This is known as the between-class variance and is defined as the distance between the means of the different classes.
3. Calculate the within-class variance. This is the distance between each class mean and the samples of that class.
4. Construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance. In the equation below, P is the lower-dimensional space projection, and S_B and S_W are the between-class and within-class scatter matrices. This is also called Fisher’s criterion.
$$P_{lda} = \arg\max_{P} \frac{\left|P^{T} S_B P\right|}{\left|P^{T} S_W P\right|}$$
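The four steps can be sketched from scratch with NumPy. This is a minimal illustration under stated assumptions, not the author's code. It uses scikit-learn's bundled copy of the UCI Wine data (`load_wine`) and builds the between-class scatter matrix S_B and the within-class scatter matrix S_W. It then takes the eigenvectors of S_W^{-1} S_B with the largest eigenvalues as the projection P. The helper name `lda_projection` is hypothetical. As a sanity check, the result is compared with scikit-learn's `LinearDiscriminantAnalysis` using `solver="eigen"`. The two sets of axes should agree up to sign and scale, so the absolute correlations should be close to 1.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler


def lda_projection(X, y, n_components):
    """Hypothetical helper: return P (d x n_components) maximising Fisher's criterion."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        # Step 3: spread of each class's samples around its own mean
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        # Step 2: distance of each class mean from the overall mean, weighted by class size
        diff = (mean_c - overall_mean).reshape(d, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Step 4: eigenvectors of S_W^{-1} S_B with the largest eigenvalues span P
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real


X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)  # Step 1: standardise the d = 13 features
P = lda_projection(X_std, y, n_components=2)
X_lda = X_std @ P  # project the 178 samples onto 2 discriminant axes

# Sanity check against scikit-learn (axes may differ in sign and scale)
X_sk = LinearDiscriminantAnalysis(n_components=2, solver="eigen").fit_transform(X_std, y)
for k in range(2):
    print(f"|corr(LD{k + 1})| =", abs(np.corrcoef(X_lda[:, k], X_sk[:, k])[0, 1]))
```

Only two axes are requested because, with three classes, S_B has rank at most 3 - 1 = 2. All further eigenvalues are (numerically) zero and carry no discriminative information.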
How Does LDA Make Predictions?
LDA estimates the probability that a new set of inputs belongs to each class. The class with the highest probability is chosen as the output.
Although there are many ways to frame and solve LDA, the most common approach is to use Bayes’ theorem.
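The Bayes’ theorem view can be sketched in a few lines of NumPy. This is a minimal sketch, assuming Gaussian classes that share one pooled covariance matrix (the standard LDA assumption) and using scikit-learn's `load_wine` data for illustration. The function names `fit_gaussian_lda` and `predict_proba` are hypothetical helpers, not a library API. Each class gets a linear score built from its prior, its mean and the inverse pooled covariance. A softmax over these scores then yields the posterior probabilities, and the most probable class is returned. The last lines compare the predictions with scikit-learn's `LinearDiscriminantAnalysis.predict`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def fit_gaussian_lda(X, y):
    """Hypothetical helper: estimate class priors, class means and a pooled covariance."""
    classes = np.unique(y)
    n_samples = X.shape[0]
    priors = np.array([np.mean(y == c) for c in classes])        # P(class k)
    means = np.array([X[y == c].mean(axis=0) for c in classes])  # mu_k
    # LDA assumes all classes share a single covariance matrix
    scatter = sum((X[y == c] - means[k]).T @ (X[y == c] - means[k]) for k, c in enumerate(classes))
    cov_inv = np.linalg.inv(scatter / (n_samples - len(classes)))
    return classes, priors, means, cov_inv


def predict_proba(X, priors, means, cov_inv):
    """Hypothetical helper: posterior P(class k | x) obtained from Bayes' theorem."""
    # Linear discriminant score: x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log P(class k)
    scores = X @ cov_inv @ means.T - 0.5 * np.sum(means @ cov_inv * means, axis=1) + np.log(priors)
    # Softmax: normalising the exponentiated scores divides by the evidence P(x)
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


X, y = load_wine(return_X_y=True)
classes, priors, means, cov_inv = fit_gaussian_lda(X, y)
proba = predict_proba(X, priors, means, cov_inv)
y_pred = classes[np.argmax(proba, axis=1)]  # choose the most probable class

sk_pred = LinearDiscriminantAnalysis().fit(X, y).predict(X)
print("training accuracy:", np.mean(y_pred == y))
print("agreement with scikit-learn:", np.mean(y_pred == sk_pred))
```

The quadratic term x^T S^-1 x is the same for every class, so it cancels in the softmax. That cancellation is why the decision boundaries of LDA are linear.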
What is the difference between LDA and PCA?
Although both LDA and PCA are linear transformation techniques that can be used to reduce the number of dimensions in a dataset, there is one fundamental difference.
PCA is an unsupervised algorithm: it ignores class labels altogether and finds the principal components that maximize the variance in a given dataset.
Linear Discriminant Analysis, on the other hand, is a supervised algorithm. It computes the directions (“linear discriminants”) that represent the axes that maximize the separation between multiple classes.
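The difference is visible directly in the scikit-learn API. `PCA.fit_transform` receives only the features, while `LinearDiscriminantAnalysis.fit_transform` also needs the labels. The sketch below uses `load_wine` as an illustrative dataset. It scores each projected axis with a simple between-class/within-class scatter ratio, computed by `fisher_ratio`, a hypothetical helper written for this comparison. LD1 is by construction the direction that maximizes this ratio, so it should score at least as high as either principal component.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# PCA is unsupervised: fit_transform only receives the features
X_pca = PCA(n_components=2).fit_transform(X_std)
# LDA is supervised: fit_transform also needs the class labels y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_std, y)


def fisher_ratio(z, labels):
    """Hypothetical helper: between-class / within-class scatter of a 1-D projection z."""
    overall_mean = z.mean()
    between = sum(np.sum(labels == c) * (z[labels == c].mean() - overall_mean) ** 2 for c in np.unique(labels))
    within = sum(np.sum((z[labels == c] - z[labels == c].mean()) ** 2) for c in np.unique(labels))
    return between / within


print("PC1:", fisher_ratio(X_pca[:, 0], y))
print("PC2:", fisher_ratio(X_pca[:, 1], y))
print("LD1:", fisher_ratio(X_lda[:, 0], y))
```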
Linear Discriminant Analysis in Python
To demonstrate how to implement LDA in Python, I will use the scikit-learn machine learning library and its LinearDiscriminantAnalysis class.
For this tutorial, I will use the Wine dataset, which is publicly available from the UCI Machine Learning Repository.
UCI Notes About the Dataset:
- The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
- Outlier detection algorithms could be used to detect the few excellent or poor wines.
- Also, it is not clear whether all input variables are relevant, so it could be interesting to test feature selection methods.
Note: for simplicity, we are not going to deep dive into data preprocessing.
Let’s first import libraries and read the dataset.
Next, we will define the dependent and independent variables and split the dataset into train and test sets.
The next step is to standardize our dataset.
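A minimal sketch of loading, splitting and standardizing the data follows. It assumes scikit-learn's bundled copy of the UCI Wine data (`load_wine`) instead of a downloaded file. The 80/20 split, `random_state=0` and `stratify=y` are illustrative choices, not requirements. The scaler is fitted on the training set only, so no information from the test set leaks into the preprocessing.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the UCI Wine data bundled with scikit-learn as a pandas DataFrame
wine = load_wine(as_frame=True)
df = wine.frame  # 13 feature columns plus a 'target' column

# Independent variables (features) and dependent variable (class label)
X = df.drop(columns="target").values
y = df["target"].values

# Hold out 20% of the samples as a test set; stratify keeps the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)
```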
Let’s apply Linear Discriminant Analysis to our data.
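A minimal sketch of this step with scikit-learn's `LinearDiscriminantAnalysis` follows. So that it runs on its own, it repeats an assumed preprocessing setup: `load_wine`, an 80/20 stratified split with `random_state=0`, and standardization. LDA is fitted on the training data together with the labels, and the learned projection is then reused on the test data. With three classes, `n_components` can be at most `n_classes - 1 = 2`.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load, split and standardise the data so this snippet runs on its own
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# With 3 classes, LDA can keep at most n_classes - 1 = 2 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)  # supervised: fit uses the labels
X_test_lda = lda.transform(X_test)                 # reuse the training projection

print(X_train_lda.shape, X_test_lda.shape)
print("Fraction of discriminative variance per axis:", lda.explained_variance_ratio_)
```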
We will train a simple logistic regression model.
We will make a confusion matrix to evaluate our model.
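One way to train the classifier and evaluate it is sketched below. As an illustrative design choice, it chains standardization, LDA with two components and `LogisticRegression` in a scikit-learn `Pipeline`. The pipeline fits the scaler and LDA only on the training data and passes the labels to LDA automatically. The dataset (`load_wine`), the split and `random_state=0` are assumptions. `confusion_matrix` returns the true classes as rows and the predicted classes as columns.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardise -> project onto 2 linear discriminants -> logistic regression
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=2),
    LogisticRegression(random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```

Wrapping the steps in a pipeline keeps the preprocessing and the model together, which also makes it safe to use with cross-validation later.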
Now we can visualize the training set results.
Let’s do the same for the test set results.
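A sketch of these plots with matplotlib follows. It assumes `load_wine` data, an 80/20 stratified split with `random_state=0`, and a logistic regression trained on the two LDA components. It also assumes that the three wine classes are labeled 0, 1 and 2. The helper `plot_decision_regions` is hypothetical. It shades a fine grid over the LD1/LD2 plane by the classifier's predicted class and overlays the samples, so the training and test panels can be compared side by side.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
lda = LinearDiscriminantAnalysis(n_components=2).fit(scaler.transform(X_train), y_train)
X_train_lda = lda.transform(scaler.transform(X_train))
X_test_lda = lda.transform(scaler.transform(X_test))
classifier = LogisticRegression(random_state=0).fit(X_train_lda, y_train)

COLORS = ["tab:red", "tab:green", "tab:blue"]  # one colour per wine class (0, 1, 2)


def plot_decision_regions(ax, model, X2d, labels, title, step=0.02):
    """Hypothetical helper: shade the LD1/LD2 plane by predicted class, then plot samples."""
    x_min, x_max = X2d[:, 0].min() - 1, X2d[:, 0].max() + 1
    y_min, y_max = X2d[:, 1].min() - 1, X2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    grid_pred = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    # Levels between the integer labels give exactly one filled region per class
    ax.contourf(xx, yy, grid_pred, levels=[-0.5, 0.5, 1.5, 2.5], colors=COLORS, alpha=0.3)
    for k, color in enumerate(COLORS):
        ax.scatter(X2d[labels == k, 0], X2d[labels == k, 1],
                   color=color, edgecolor="k", s=20, label=f"class {k}")
    ax.set(title=title, xlabel="LD1", ylabel="LD2")
    ax.legend()
    return grid_pred


fig, (ax_train, ax_test) = plt.subplots(1, 2, figsize=(12, 5))
train_grid = plot_decision_regions(ax_train, classifier, X_train_lda, y_train,
                                   "Logistic Regression (Training set)")
test_grid = plot_decision_regions(ax_test, classifier, X_test_lda, y_test,
                                  "Logistic Regression (Test set)")
fig.tight_layout()
plt.show()
```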
Linear Discriminant Analysis can be a powerful tool for dimensionality reduction and can help reduce the degree of overfitting.
In this article, I demonstrated how to use Linear Discriminant Analysis for dimensionality reduction in Python. Specifically, I described:
- what Linear Discriminant Analysis is and how it works,
- how LDA makes predictions,
- the difference between LDA and PCA, and
- how to implement Linear Discriminant Analysis in Python.
- Rao, C. Radhakrishna. “The utilization of multiple measurements in problems of biological classification.” Journal of the Royal Statistical Society. Series B (Methodological) 10.2 (1948): 159–203.
- Fisher, Ronald A. “The use of multiple measurements in taxonomic problems.” Annals of Eugenics 7.2 (1936): 179–188.
- Raschka, Sebastian, and Vahid Mirjalili. Python Machine Learning. 3rd ed.
- Ye, Jieping, and Shuiwang Ji. “Discriminant analysis for dimensionality reduction: An overview of recent developments.” Biometrics: Theory, Methods, and Applications. Wiley-IEEE Press, New York (2010).
- James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013.