Gaussian Discriminant Analysis
an example of Generative Learning Algorithms
Generative Learning Algorithms:
In both Linear Regression and Logistic Regression we modelled the conditional distribution of y given x, p(y|x), directly.
Algorithms that model p(y|x) directly from the training set are called discriminative algorithms.
There is a different approach to the same problem. Consider the same binary classification problem, where we want to learn to distinguish between two classes, class A (y=1) and class B (y=0), based on some features. We take all the examples labelled A, learn their features, and build a model for class A. Then we take all the examples labelled B, learn their features, and build a separate model for class B. Finally, to classify a new element, we match it against each model and see which one fits better (i.e. which one assigns it the higher probability). In this approach we model p(x|y) and p(y), as opposed to the p(y|x) we modelled earlier; algorithms that do this are called Generative Learning Algorithms.
Once we have learned the models p(y) and p(x|y) from the training set, we use Bayes' rule to derive p(y|x):
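The Bayes-rule combination referred to here, with the evidence p(x) expanded by the law of total probability, is:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}, \qquad p(x) = p(x \mid y=1)\, p(y=1) + p(x \mid y=0)\, p(y=0)$$

Note that for classification we only need the most probable class, so the denominator p(x) (which does not depend on y) can be dropped:

$$\arg\max_{y} \; p(y \mid x) = \arg\max_{y} \; p(x \mid y)\, p(y)$$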
Gaussian discriminant analysis model
When we have a classification problem in which the input features are continuous random variables, we can use GDA. It is a generative learning algorithm in which we assume p(x|y) is distributed according to a multivariate normal distribution and p(y) according to a Bernoulli distribution. So the model is:
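Written out in the common formulation, where the two classes share a single covariance matrix Σ:

$$y \sim \mathrm{Bernoulli}(\phi), \qquad x \mid y=0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad x \mid y=1 \sim \mathcal{N}(\mu_1, \Sigma)$$

with densities

$$p(y) = \phi^{y}(1-\phi)^{1-y}, \qquad p(x \mid y=k) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_k)^{T}\Sigma^{-1}(x-\mu_k)\right), \quad k \in \{0,1\}$$

Sharing Σ between the classes is what makes the resulting decision boundary linear; using a separate covariance per class is also possible but gives a quadratic boundary.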
Now, as we did in Linear Regression and Logistic Regression, we define the log likelihood function L and then, by maximising L with respect to the model parameters, find the maximum likelihood estimates.
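For a training set of m examples, the joint log likelihood of the data under the model above is:

$$L(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p\big(x^{(i)}, y^{(i)}\big) = \sum_{i=1}^{m} \log p\big(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma\big) + \log p\big(y^{(i)}; \phi\big)$$

Note that, unlike the conditional likelihood maximised in Logistic Regression, this is a joint likelihood over both x and y.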
So Eq(2), Eq(4) and Eq(5) define all the maximum likelihood parameters of GDA as below.
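In the standard derivation, setting the derivatives of L to zero gives the maximisers in closed form (1{·} denotes the indicator function):

$$\phi = \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)}=1\}$$

$$\mu_k = \frac{\sum_{i=1}^{m} 1\{y^{(i)}=k\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)}=k\}}, \qquad k \in \{0,1\}$$

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} \big(x^{(i)} - \mu_{y^{(i)}}\big)\big(x^{(i)} - \mu_{y^{(i)}}\big)^{T}$$

These have intuitive interpretations: φ is the fraction of positive examples, each μ_k is the mean of the examples in class k, and Σ is the covariance of the examples after centring each one on its own class mean.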
Example: I used a custom iris dataset for this example (I reduced the features to two dimensions and removed the 3rd class from the dataset to turn it into a binary classification problem). The data plot looks as follows:
For this dataset the computed model parameters are as below:
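The fitting step is just the closed-form estimates above. Here is a minimal NumPy sketch; since the custom iris file is not reproduced in the post, it runs on a small synthetic two-cluster dataset standing in for the reduced iris data:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum likelihood estimates for GDA
    with a shared covariance matrix."""
    m = X.shape[0]
    phi = np.mean(y == 1)                 # fraction of class-1 examples
    mu0 = X[y == 0].mean(axis=0)          # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)          # mean of class-1 examples
    # Centre each example on its own class mean, then form the
    # shared covariance matrix.
    centred = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centred.T @ centred / m
    return phi, mu0, mu1, sigma

# Tiny synthetic stand-in for the two-dimensional iris data
rng = np.random.default_rng(0)
X0 = rng.normal([4.0, 4.0], 0.3, size=(50, 2))   # class-0 cluster
X1 = rng.normal([6.5, 2.25], 0.3, size=(50, 2))  # class-1 cluster
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

phi, mu0, mu1, sigma = fit_gda(X, y)
print(phi, mu0, mu1, sigma)
```

Note there is no iterative optimisation here, unlike gradient descent for Logistic Regression: the maximum likelihood parameters are computed in a single pass over the data.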
Based on the algorithm, the probability density plots for the two models look as follows:
And from their contour plots you can see that the models explain the test data. The surfaces of the models are not smooth (the assumption that p(x|y) is Gaussian is not perfectly true for this particular dataset; we will discuss later how that affects the accuracy of the model), so the contours are noisy rather than smooth as well.
Let's take two test points to test our algorithm:
x0 = [4, 4]
x1 = [6.5, 2.25]
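Classifying a test point just means comparing p(x|y=0)p(y=0) against p(x|y=1)p(y=1). A minimal sketch (the parameter values below are illustrative placeholders, not the ones fitted from the post's dataset):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(mu, sigma) evaluated at x."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt(((2 * np.pi) ** n) * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

def predict(x, phi, mu0, mu1, sigma):
    """Pick the class with the larger joint probability p(x|y) p(y);
    the evidence p(x) is the same for both classes, so it is dropped."""
    p0 = gaussian_pdf(x, mu0, sigma) * (1 - phi)
    p1 = gaussian_pdf(x, mu1, sigma) * phi
    return int(p1 > p0)

# Illustrative parameters: class means near each cluster centre,
# shared spherical covariance
phi = 0.5
mu0 = np.array([4.0, 4.0])
mu1 = np.array([6.5, 2.25])
sigma = np.eye(2) * 0.5

x0 = np.array([4.0, 4.0])
x1 = np.array([6.5, 2.25])
print(predict(x0, phi, mu0, mu1, sigma))  # 0 -> class 0
print(predict(x1, phi, mu0, mu1, sigma))  # 1 -> class 1
```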
You can see the algorithm favours class 0 for x0 and class 1 for x1, as expected. Both Logistic Regression and Gaussian Discriminant Analysis are used for classification, and the two give slightly different decision boundaries, so which one should we use, and when?
GDA makes an assumption about the probability distribution p(x|y=k), where k is one of the classes. It can be shown that for GDA, if the assumptions about the distributions of p(x|y=k) (Gaussian) and p(y) (Bernoulli) are true, then p(y|x) can be expressed as a sigmoid; the converse is not true. This means GDA makes stronger assumptions about the dataset than Logistic Regression, and when those assumptions hold it works better than LR. This is useful when you know the nature of the dataset you are dealing with, it really is (close to) Gaussian, and you do not have a large training set; in situations like this GDA will outperform LR. On the other hand, LR makes weaker assumptions and is more useful in the many other settings where the probability distribution of the features is not Gaussian.
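A quick sketch of why the posterior is a sigmoid under the GDA assumptions: by Bayes' rule,

$$p(y=1 \mid x) = \frac{p(x \mid y=1)\,\phi}{p(x \mid y=1)\,\phi + p(x \mid y=0)\,(1-\phi)} = \frac{1}{1 + \exp\!\big(-(\theta^{T}x + \theta_0)\big)}$$

Because the two Gaussians share Σ, the quadratic terms x^T Σ^{-1} x in the log ratio cancel, leaving an exponent that is affine in x, with

$$\theta = \Sigma^{-1}(\mu_1 - \mu_0), \qquad \theta_0 = \tfrac{1}{2}\mu_0^{T}\Sigma^{-1}\mu_0 - \tfrac{1}{2}\mu_1^{T}\Sigma^{-1}\mu_1 + \log\frac{\phi}{1-\phi}$$

So GDA's posterior has exactly the Logistic Regression form, but with θ tied to (φ, μ0, μ1, Σ) rather than fitted freely; that tying is the extra assumption LR does not make.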

