Linear Discriminant Analysis (LDA) using Python

Sambit Mahapatra
Journey 2 Artificial Intelligence
5 min read · Feb 20, 2018


Linear Discriminant Analysis (LDA) is a simple yet powerful linear transformation and dimensionality reduction technique. Here, we are going to unravel the black box hidden behind the name LDA. The general LDA approach is very similar to Principal Component Analysis (PCA), but in addition to finding the component axes that maximize the variance of the data (as PCA does), we are also interested in the axes that maximize the separation between multiple classes. For a PCA implementation using Python, please refer to this link — https://medium.com/journey-2-artificial-intelligence/unraveling-pca-principal-component-analysis-in-python-d23b081409cf

LDA is a supervised dimensionality reduction technique. The goal is to project a dataset onto a lower-dimensional space with good class-separability, in order to avoid overfitting (the “curse of dimensionality”) and also to reduce computational costs. Basically, the added advantage LDA gives over PCA is that it uses the class labels, which helps tackle overfitting.

Steps for LDA (a compact sketch of the whole pipeline follows this list):

  1. Compute d-dimensional mean vectors for the different classes in the dataset, where d is the dimension of the feature space.
  2. Compute the within-class and between-class scatter matrices.
  3. Compute the eigenvectors and corresponding eigenvalues of the scatter matrices.
  4. Choose the k eigenvectors corresponding to the top k eigenvalues to form a transformation matrix of dimension d x k.
  5. Transform the d-dimensional feature space X to the k-dimensional feature space X_lda via the transformation matrix.
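
Before walking through these steps on a real dataset, here is a minimal sketch of the whole pipeline collected into a single function. The function and variable names here are my own and not taken from the article's code; it is a sketch of the procedure described above, not a drop-in library implementation.

import numpy as np

def lda_transform(X, y, k):
    """Project X (n_samples x d) onto the top-k linear discriminants."""
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0).reshape(d, 1)
    SW = np.zeros((d, d))  # within-class scatter
    SB = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0).reshape(d, 1)
        centered = Xc.T - mc                      # d x n_c, columns centered on the class mean
        SW += centered.dot(centered.T)
        SB += Xc.shape[0] * (mc - overall_mean).dot((mc - overall_mean).T)
    e_vals, e_vecs = np.linalg.eig(np.linalg.inv(SW).dot(SB))
    order = np.argsort(np.abs(e_vals))[::-1]      # eigenvalue indices, largest first
    W = e_vecs[:, order[:k]].real                 # d x k transformation matrix
    return X.dot(W)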

Now, let’s build the LDA model from scratch. Source code is available in the GitHub link —

The dataset used here is the banknote authentication dataset, publicly available in the UCI Machine Learning Repository.

https://archive.ics.uci.edu/ml/datasets/banknote+authentication#

The attributes present in the dataset are: variance of Wavelet Transformed image (continuous), skewness of Wavelet Transformed image (continuous), curtosis of Wavelet Transformed image (continuous), entropy of image (continuous), and class (integer; 0 = not authentic, 1 = authentic). Before starting the Linear Discriminant Analysis, first import all the required dependencies.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Now load the dataset into a DataFrame using the read_csv function from pandas.

columns = ["var","skewness","curtosis","entropy","class"]
df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/00267/\
data_banknote_authentication.txt",index_col=False, names = columns)
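
As a quick sanity check (not part of the original walkthrough), the shape of the loaded DataFrame and the class balance reported below can be confirmed with:

print(df.shape)                    # (number of rows, number of columns)
print(df["class"].value_counts())  # number of samples per class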

The dataset contains a total of 1372 instances, out of which 762 are non-authentic notes and 610 are authentic notes. The data distribution of the attributes can be visualized using both univariate and multivariate plots:

f, ax = plt.subplots(1, 4, figsize=(10,3))
vis1 = sns.distplot(df["var"],bins=10, ax= ax[0])
vis2 = sns.distplot(df["skewness"],bins=10, ax=ax[1])
vis3 = sns.distplot(df["curtosis"],bins=10, ax= ax[2])
vis4 = sns.distplot(df["entropy"],bins=10, ax=ax[3])
f.savefig('subplot.png')
sns.pairplot(df, hue="class")

Now, we will compute the 4-dimensional mean vectors for both the classes (4 = number of features). Unlike PCA, standardization of the data is not needed in LDA as it doesn’t affect the output. The reason why there’s no effect of standardization on the main results in LDA is that LDA decomposes ratio of Between-to-Within covariances, and not the covariance itself having its magnitude (as PCA does).

mean_vec = []
for i in df["class"].unique():
    mean_vec.append(np.array(df[df["class"] == i].mean()[:4]))
print(mean_vec)

The next step is to calculate the within-class and between-class scatter matrices.
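
For reference, the standard definitions of these two matrices (using m_i for the class-i mean vector, m for the overall mean, n_i for the number of samples in class i, and c for the number of classes, here c = 2) are:

S_W = \sum_{i=1}^{c} \sum_{x \in D_i} (x - m_i)(x - m_i)^T, \qquad S_B = \sum_{i=1}^{c} n_i \, (m_i - m)(m_i - m)^T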

SW = np.zeros((4, 4))
for i in range(2):  # 2 is the number of classes
    per_class_sc_mat = np.zeros((4, 4))
    mv = mean_vec[i].reshape(4, 1)
    for row in df[df["class"] == i].iloc[:, :4].values:
        row = row.reshape(4, 1)
        per_class_sc_mat += (row - mv).dot((row - mv).T)
    SW += per_class_sc_mat
print('within-class Scatter Matrix:\n', SW)

overall_mean = np.array(df.drop("class", axis=1).mean())
SB = np.zeros((4, 4))
for i in range(2):  # 2 is the number of classes
    n = df[df["class"] == i].shape[0]
    mv = mean_vec[i].reshape(4, 1)
    overall_mean = overall_mean.reshape(4, 1)  # make column vector
    SB += n * (mv - overall_mean).dot((mv - overall_mean).T)
print('between-class Scatter Matrix:\n', SB)

Next, we need to solve the generalized eigenvalue problem for the matrix inverse(SW).SB to obtain the linear discriminants.

e_vals, e_vecs = np.linalg.eig(np.linalg.inv(SW).dot(SB))
print('Eigenvectors \n%s' % e_vecs)
print('\nEigenvalues \n%s' % e_vals)
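
How much each discriminant direction contributes to the class separation can be summarized by the relative size of its eigenvalue. This small check is an addition of mine, using only the eigenvalues computed above:

# fraction of the total eigenvalue "mass" captured by each direction
tot = np.abs(e_vals).sum()
for idx, val in enumerate(sorted(np.abs(e_vals), reverse=True)):
    print('Component %d explains %.2f%% of the class separation' % (idx + 1, 100 * val / tot))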

Now, we need to select the top-k eigenvectors corresponding to the top-k eigenvalues. The selection of k depends upon how much variance can be retained along each direction. For data compression purposes we generally aim for 99% variance retention, while for visualization we reduce the dimension to 2 or 3. Here, we will take the eigenvectors corresponding to the top-2 eigenvalues for visualization purposes. However, as the ratios computed above suggest, the eigenvector belonging to the largest eigenvalue retains nearly 100% of the variance, so the other three could be discarded as well. The transformation matrix W will be:

# sort the (eigenvalue, eigenvector) pairs by absolute eigenvalue, descending
e_pairs = sorted(zip(np.abs(e_vals), e_vecs.T), key=lambda p: p[0], reverse=True)
W = np.hstack((e_pairs[0][1].reshape(4, 1), e_pairs[1][1].reshape(4, 1)))
print('Matrix W:\n', W.real)

Now, we need to transform the 4-dimensional feature space X to 2-dimensional feature subspace X_lda.

X = df[["var", "skewness", "curtosis", "entropy"]].values  # 4-d feature matrix
X_lda = X.dot(W.real)
df["PC1"] = X_lda[:, 0]
df["PC2"] = X_lda[:, 1]

The data distribution along these two components now looks like this:

vis = sns.lmplot(data = df[["PC1","PC2","class"]], x = "PC1", y = "PC2",fit_reg=False, hue = "class",\
size = 6, aspect=1.5, scatter_kws = {'s':50}, )
vis.savefig("lda.png")
sns.pairplot(df[["PC1","PC2","class"]], hue="class")

From the above plot it can also be seen that component PC1 alone is enough to effectively differentiate the two classes.

LDA can also be applied directly using the scikit-learn library.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

y = df["class"].values       # class labels; X was defined above
model = LDA(n_components=1)  # with 2 classes, at most n_classes - 1 = 1 discriminant
X_lda = model.fit_transform(X, y)
df["PC1"] = X_lda[:, 0]
sns.regplot(data=df[["PC1", "class"]], x="PC1", y="class", fit_reg=False, scatter_kws={'s': 50})

Here, as was already evident from the LDA implementation from scratch, we can see that a single top component is able to differentiate the classes with great accuracy.
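
To put a number on that accuracy, one option (not shown in the original post) is to cross-validate the scikit-learn LDA estimator itself as a classifier on the same features:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated classification accuracy of LDA on the banknote data
scores = cross_val_score(LDA(), X, y, cv=5)
print("Mean CV accuracy: %.4f (+/- %.4f)" % (scores.mean(), scores.std()))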

For further reading:

http://sebastianraschka.com/Articles/2014_python_lda.html

http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html
