This article will:
- Explain linear discriminant analysis (LDA) from the very basics
- Illustrate it through hands-on practice
What you will need:
- A notebook and a pen
Linear discriminant analysis (LDA) is a classification algorithm that uses Bayes’ theorem to calculate the probability that an observation belongs to each labeled class. Compared with logistic regression, it extends naturally to multi-class classification problems, and its parameter estimates remain relatively stable when the classes are well separated.
You can easily run LDA through software such as R or scikit-learn (Python), but it is equally important to know what actually goes on in the background when you use these tools. I personally believe that hands-on practice is the most effective way to understand the logic and mathematics behind it. This article walks through the calculations and derivations involved in LDA, and I recommend using pen and paper to replicate and recalculate everything.
Before we proceed, we need to understand the assumptions behind LDA:
- The distribution of each variable within every class is normal (Gaussian)
- The variance of each variable is equal across the classes
This may sound a little over your head, but I will try to make it simpler and clearer as the article proceeds. LDA can handle multi-class classification problems, but for simplicity we will consider a two-class problem with a single predictor variable. Consider a very simple example: predicting the gender of a person from their height, using the data shown below:
We will use the above data set to develop a model through LDA. Before doing that, we check the data against the assumptions of LDA to see whether the algorithm is applicable. First we build a frequency table, with the height column grouped into class intervals, as shown below:
The frequency of every class interval is written against it for both classes. We plot this table to check how well the first assumption is met:
The above graph shows that height is approximately normally distributed in both classes, so the first assumption is satisfied.
Now let’s calculate the mean and variance for the two classes.
The variance of height is almost equal in the two classes, so the second assumption of LDA is met as well.
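If you want to double-check your pen-and-paper numbers, a small Python sketch like the one below can help. The heights in it are made-up illustrative values (they are not the table used in this article), and `heights` and `class_stats` are names chosen only for this example.

```python
from statistics import mean, variance

# Illustrative heights in cm (hypothetical, NOT the article's table).
heights = {
    "female": [142, 146, 148, 150, 152, 156],
    "male": [158, 162, 164, 166, 170],
}

# Per-class mean and sample variance (statistics.variance divides by n - 1).
class_stats = {
    label: {"mean": mean(values), "variance": variance(values)}
    for label, values in heights.items()
}

for label, stats in class_stats.items():
    print(f"{label}: mean = {stats['mean']:.1f} cm, "
          f"variance = {stats['variance']:.1f} cm^2")
```

With these illustrative values the sample variances come out as 23.6 and 20.0, close enough to treat the equal-variance assumption as reasonable.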
Let’s now jump directly to linear discriminant analysis, where our main focus is to train a model on the above data so that we can predict the gender of a person who is not in the table, given only their height. In other words, you should be able to answer the question: what is the gender of a person whose height is, say, 152 cm?
LDA relies heavily on Bayes’ theorem, which is a prerequisite for understanding this article. Bayes’ theorem states that:
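For two mutually exclusive and exhaustive events A1 and A2 and an observed event B (the notation used in the explanation that follows), the theorem can be written in its standard form as:

$$
P(A_1 \mid B) = \frac{P(B \mid A_1)\,P(A_1)}{P(B \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(A_2)} \tag{1}
$$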
Let me explain it a bit. P(A1|B) is read as the probability of A1 given B: the probability of event A1 when event B has already occurred. For example, the probability of rainfall when humidity is above 80% can be written as P(Rainfall | Humidity > 80%). P(B|A1) is the same situation flipped, i.e. the probability of high humidity when rainfall has already occurred. P(A1) is called the prior probability, in this case the probability of rainfall. Note that if A1 represents the occurrence of rainfall, then A2 is the event of no rainfall, which makes this a two-class problem; all the other terms have their usual meaning.
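To make the rainfall example concrete, here is a minimal sketch of the two-event form of Bayes’ theorem. The three probabilities are hypothetical numbers picked only for illustration, and the function name `posterior` is an assumption of this example.

```python
def posterior(prior_a1, b_given_a1, b_given_a2):
    """Return P(A1 | B) for a two-event partition {A1, A2}."""
    prior_a2 = 1.0 - prior_a1
    numerator = b_given_a1 * prior_a1
    evidence = numerator + b_given_a2 * prior_a2  # this sum is P(B)
    return numerator / evidence

# Hypothetical numbers, chosen only for illustration.
p_rain = 0.2              # P(A1): prior probability of rainfall
p_humid_given_rain = 0.9  # P(B | A1): high humidity when it rains
p_humid_given_dry = 0.3   # P(B | A2): high humidity when it does not rain

p_rain_given_humid = posterior(p_rain, p_humid_given_rain, p_humid_given_dry)
print(f"P(Rainfall | Humidity > 80%) = {p_rain_given_humid:.3f}")
```

Here the numerator is 0.9 × 0.2 = 0.18 and the denominator is 0.18 + 0.3 × 0.8 = 0.42, so observing high humidity raises the probability of rainfall from the prior 0.2 to about 0.43.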
With that said about Bayes’ theorem, and assuming you have some prior knowledge of it, let’s focus again on LDA. For the data table given to us, we need the probability of a height value falling in each of the two gender classes. This means we have to calculate:
P(gender = male | height = 152) and P(gender = female | height = 152), and then check which probability is higher. We first calculate the probability of the male class, which as per Bayes’ theorem is equal to:
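Written out for the male class, this is the relation the text later refers to as (eq. 2). It is reconstructed here from the two-class form of Bayes’ theorem, with A1 = male, A2 = female and B = (height = 152):

$$
P(\text{male} \mid \text{height} = 152) = \frac{P(\text{height} = 152 \mid \text{male})\,P(\text{male})}{P(\text{height} = 152 \mid \text{male})\,P(\text{male}) + P(\text{height} = 152 \mid \text{female})\,P(\text{female})} \tag{2}
$$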
Let’s calculate the terms in the right-hand side of the equation one by one:
P(gender = male) is simply the number of elements in the male class of the training data set divided by the total number of elements, i.e. 5/11 ≈ 0.455. Similarly, P(gender = female) is 6/11 ≈ 0.545.
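The same counting can be sketched in a few lines of Python. Only the class counts (5 male, 6 female) come from the text; the list `labels` is a stand-in for the label column of the training table.

```python
from collections import Counter

# Class labels of the 11 training rows: 5 male and 6 female, as in the text.
labels = ["male"] * 5 + ["female"] * 6

counts = Counter(labels)
total = sum(counts.values())

# Prior probability of a class = class count / total number of rows.
priors = {label: count / total for label, count in counts.items()}

for label, prior in priors.items():
    print(f"P(gender = {label}) = {prior:.3f}")
```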
Now we have to calculate the conditional probability terms, which come from the first assumption of LDA: the distribution of the variable in each class is normal.
In simple words, P(height = 152 | gender = male) and P(height = 152 | gender = female) are each evaluated from a normal (Gaussian) density, using the mean and variance of height in the respective class.
We know the equation of a normal curve is:
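In standard notation (the symbols μ for the class mean and σ² for the class variance are an assumption of this reconstruction), the normal density is:

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$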
Let’s put the above values in the Gaussian equation for both the classes:
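As a cross-check for the hand calculation, the density can be evaluated with a short function. This is only a sketch: the class means and variances below are hypothetical illustrative values, not the ones computed from the article’s table, and `gaussian_pdf` is a name chosen for this example.

```python
import math

def gaussian_pdf(x, mu, variance):
    """Normal density with mean mu and variance `variance`, evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * variance))

# Hypothetical (mean, variance) per class, in cm and cm^2.
params = {"female": (149.0, 23.6), "male": (164.0, 20.0)}

x = 152.0
likelihoods = {
    label: gaussian_pdf(x, mu, var) for label, (mu, var) in params.items()
}

for label, value in likelihoods.items():
    print(f"f(height = {x:.0f} | gender = {label}) = {value:.5f}")
```

Because 152 cm lies much closer to the illustrative female mean than to the male mean, the female density is the larger of the two.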
Plugging the values derived above into (eq. 2), we have:
Since P(gender = male | height = 152) is less than P(gender = female | height = 152), we classify a height of 152 cm into the female class.
This is how linear discriminant analysis works. For a more general view, we plug the distribution equations into the base equation (eq. 2) to see the model that this algorithm actually trains:
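One way to write this general model is sketched below. The symbols are assumptions of this reconstruction: p_m and p_f are the prior probabilities of the male and female classes, and μ_k and σ_k² are the mean and variance of height in class k.

$$
f_k(x) = \frac{1}{\sqrt{2\pi\sigma_k^2}}\,\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma_k^2}\right), \qquad k \in \{m, f\}
$$

$$
P(\text{gender} = \text{male} \mid \text{height} = x) = \frac{p_m\,f_m(x)}{p_m\,f_m(x) + p_f\,f_f(x)}
$$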
The same type of equation can be used to find P(gender = female | height = x) for any value of height.
Steps in LDA model training:
- Calculate the mean of the variable for each class.
- Calculate the variance of the variable for each class.
- Calculate the probability of each class (prior probability).
- Use the mean, variance and prior probability to build the final model, assuming a normal distribution of the variable in each class.
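The four steps above can be sketched as a tiny train-and-predict routine. This is a minimal illustration rather than a reference implementation: the heights are hypothetical (not the article’s table), the function names `train`, `gaussian_pdf` and `posteriors` are made up for this example, and each class keeps its own variance exactly as the steps describe, whereas library implementations of LDA typically pool the variance across classes.

```python
import math
from statistics import mean, variance

def train(samples):
    """Steps 1-3: per-class mean, sample variance and prior probability."""
    total = sum(len(values) for values in samples.values())
    return {
        label: {
            "mean": mean(values),
            "variance": variance(values),
            "prior": len(values) / total,
        }
        for label, values in samples.items()
    }

def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(model, x):
    """Step 4: Bayes' theorem with a normal density for each class."""
    joint = {
        label: p["prior"] * gaussian_pdf(x, p["mean"], p["variance"])
        for label, p in model.items()
    }
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}

# Illustrative heights in cm (hypothetical, NOT the article's table).
heights = {
    "female": [142, 146, 148, 150, 152, 156],
    "male": [158, 162, 164, 166, 170],
}

model = train(heights)
probs = posteriors(model, 152)
predicted = max(probs, key=probs.get)
print(probs, "->", predicted)
```

For these illustrative heights, a person of 152 cm is assigned to the female class with a posterior probability well above 0.9.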
Linear for a reason
Let’s do a few more calculations to prove another point. What are the probabilities for height = 156 cm? Substitute the value into the above equations and you will find that the probabilities of the female and male classes are almost equal (about 0.5 each). This height value acts as a threshold: all heights above 156 cm are classified as male and those below it as female. A graphical representation is shown below:
As is evident from the above graph, linear discriminant analysis always draws a straight (linear) separation boundary. With a single predictor, as here, that boundary reduces to a single threshold value.
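The reason for the linearity can be seen in a short derivation. It assumes a common variance σ² in both classes (the second assumption of LDA) and uses p_m, p_f for the priors and μ_m, μ_f for the class means; these symbols are assumptions of this sketch. Taking the log of the ratio of the two posteriors, the quadratic terms in x cancel:

$$
\ln\frac{P(\text{male} \mid x)}{P(\text{female} \mid x)} = \ln\frac{p_m}{p_f} + \frac{\mu_m - \mu_f}{\sigma^2}\,x - \frac{\mu_m^2 - \mu_f^2}{2\sigma^2}
$$

The right-hand side is linear in x. Setting it to zero (equal posteriors) gives the threshold:

$$
x^{*} = \frac{\mu_m + \mu_f}{2} + \frac{\sigma^2}{\mu_m - \mu_f}\,\ln\frac{p_f}{p_m}
$$

With equal priors the threshold sits exactly midway between the two class means; a larger prior for one class pushes the threshold towards the other class’s mean.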
Below is the Python code implementing what we have done so far:
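One way to put the pieces together is sketched here using only the Python standard library. Treat it as an illustration under stated assumptions: the heights are hypothetical values (not the article’s table), the variance is pooled across the two classes, and the names `discriminant`, `classify` and `threshold` are chosen for this example.

```python
import math
from statistics import mean

# Illustrative heights in cm (hypothetical, NOT the article's table).
heights = {
    "female": [142, 146, 148, 150, 152, 156],
    "male": [158, 162, 164, 166, 170],
}

n_total = sum(len(values) for values in heights.values())
means = {label: mean(values) for label, values in heights.items()}
priors = {label: len(values) / n_total for label, values in heights.items()}

# Pooled variance: within-class squared deviations / (n - number of classes).
within_ss = sum(
    (x - means[label]) ** 2 for label, values in heights.items() for x in values
)
pooled_var = within_ss / (n_total - len(heights))

def discriminant(x, label):
    """Linear discriminant score of class `label` at height x."""
    mu = means[label]
    return x * mu / pooled_var - mu ** 2 / (2 * pooled_var) + math.log(priors[label])

def classify(x):
    """Pick the class with the largest discriminant score."""
    return max(heights, key=lambda label: discriminant(x, label))

# Height at which both scores are equal (the decision threshold).
threshold = (means["female"] + means["male"]) / 2 + pooled_var * math.log(
    priors["female"] / priors["male"]
) / (means["male"] - means["female"])

for h in (152, 156, 160):
    print(f"height = {h} cm -> {classify(h)}")
print(f"decision threshold = {threshold:.2f} cm")
```

Each score is a straight-line function of height, so comparing the two scores is the same as comparing the height with one threshold value. For these illustrative numbers the threshold lands a little below 157 cm.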
Please post your comments/suggestions
Have a good time :)