TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

4 min read · Sep 29, 2017


Gaussian Discriminant Analysis
An Example of Generative Learning Algorithms

Generative Learning Algorithms:
In both Linear Regression and Logistic Regression we modelled the conditional distribution of y given x, as follows:

[Image: equation for the conditional distribution p(y|x)]

Algorithms that model p(y|x) directly from the training set are called discriminative algorithms.
There is a different approach to the same problem. Consider the same binary classification problem, where we want to learn to distinguish between two classes, class A (y=1) and class B (y=0), based on some features. We take all the examples labelled A, learn their features, and build a model for class A. Then we take all the examples labelled B, learn their features, and build a separate model for class B. Finally, to classify a new element, we match it against each model and see which one fits better (generates a higher probability). In this approach we model p(x|y) and p(y), as opposed to the p(y|x) we modelled earlier; algorithms that do this are called generative learning algorithms.
Once we have learnt the models p(y) and p(x|y) from the training set, we use Bayes' rule to derive p(y|x) as

p(y|x) = p(x|y) p(y) / p(x), where p(x) = p(x|y=1) p(y=1) + p(x|y=0) p(y=0)
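To make this classification step concrete, here is a minimal Python sketch of Bayes' rule, assuming we already have the class prior p(y=1) = φ and the two class-conditional densities evaluated at a point (the numbers below are made up for illustration, not taken from this article's dataset):

```python
# A minimal sketch of the generative classification step via Bayes' rule,
# assuming the class-conditional densities p(x|y) are already known.

def posterior(p_x_given_y1, p_x_given_y0, phi):
    """Bayes' rule: p(y=1|x) = p(x|y=1) p(y=1) / p(x)."""
    evidence = p_x_given_y1 * phi + p_x_given_y0 * (1 - phi)
    return p_x_given_y1 * phi / evidence

# A point where the class-1 model assigns a much higher density than the
# class-0 model should be classified as class 1 (posterior above 0.5):
print(posterior(p_x_given_y1=0.30, p_x_given_y0=0.05, phi=0.5))
```

Whichever class's model "fits better" dominates the numerator, which is exactly the matching intuition described above.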

Gaussian Discriminant Analysis model
When we have a classification problem in which the input features are continuous random variables, we can use GDA. It is a generative learning algorithm in which we assume that p(x|y) follows a multivariate normal distribution and p(y) follows a Bernoulli distribution. So the model is

y ~ Bernoulli(φ)
x|y=0 ~ N(μ0, Σ)
x|y=1 ~ N(μ1, Σ)

That is, p(y) = φ^y (1 − φ)^(1 − y), and p(x|y=k) is the multivariate normal density with mean μk and a covariance matrix Σ shared by both classes.
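To see the model from its generative side, here is a sketch of how a dataset would be simulated from it: first draw the label from Bernoulli(φ), then draw the features from that class's Gaussian. The parameter values are made up for illustration, not the article's:

```python
import numpy as np

# Simulating data from the GDA model (illustrative, made-up parameters).
rng = np.random.default_rng(0)
phi = 0.4                                      # p(y = 1)
mu0, mu1 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
sigma = np.eye(2)                              # shared covariance

def sample(m):
    y = rng.random(m) < phi                    # y ~ Bernoulli(phi)
    mus = np.where(y[:, None], mu1, mu0)       # pick each example's class mean
    x = rng.multivariate_normal(np.zeros(2), sigma, size=m) + mus
    return x, y.astype(int)

X, y = sample(500)
```

Fitting GDA is just running this process in reverse: recovering φ, μ0, μ1 and Σ from an observed (X, y).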

Now, as we did in Linear Regression and Logistic Regression, we define the log-likelihood function L and, by maximising L with respect to the model parameters, find the maximum likelihood parameters.

[Images: the log-likelihood of the GDA model and the derivation of its maximum likelihood estimates, Eq(1)–Eq(5)]

So Eq(2), Eq(4) and Eq(5) define all the maximum likelihood parameters of GDA, as below:

φ = (1/m) Σ 1{y(i) = 1}  (the fraction of training examples with y = 1)
μ0 = the mean of the x(i) with y(i) = 0
μ1 = the mean of the x(i) with y(i) = 1
Σ = (1/m) Σ (x(i) − μ_y(i)) (x(i) − μ_y(i))ᵀ
Model parameters for GDA (maximum likelihood estimates)
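These estimates are closed-form, so fitting GDA is just a few array operations. A NumPy sketch, assuming a feature matrix X of shape (m, n) and binary labels y in {0, 1}:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum likelihood estimates for the GDA parameters."""
    m = X.shape[0]
    phi = np.mean(y == 1)                 # fraction of examples with y = 1
    mu0 = X[y == 0].mean(axis=0)          # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)          # mean of class-1 examples
    # Shared covariance: average outer product of the examples, each one
    # centred on its own class mean.
    centred = X - np.where((y == 1)[:, None], mu1, mu0)
    sigma = centred.T @ centred / m
    return phi, mu0, mu1, sigma
```

Note there is no iterative optimisation here, unlike the gradient-based fitting used for Logistic Regression.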

Example: For this example I used a custom iris dataset (I reduced the features to two dimensions and removed the 3rd class to turn it into a binary classification problem). The data plot looks as follows:

[Image: scatter plot of the two-dimensional, two-class iris data]

The model parameters computed for this dataset are as below:

[Image: the computed values of φ, μ0, μ1 and Σ]

Based on the algorithm, the probability density plots for each model look as follows:

[Image: 3D surface plot for p(x|y=1)]
[Image: 3D surface plot for p(x|y=0)]

From their contour plots you can see that the models explain the data. The model surfaces are not smooth, since the assumption that p(x|y) is Gaussian is not perfectly true for this particular dataset (we will discuss how this affects the accuracy of the model), and so the contours are noisy rather than smooth.

[Image: contour plot for p(x|y=0) with data points for class y=0]
[Image: contour plot for p(x|y=1) with data points for class y=1]

Let's take two test points to test our algorithm:
x0 = [4, 4]
x1 = [6.5, 2.25]
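The prediction step evaluates each class's Gaussian density at the test point and picks the class with the larger p(x|y) p(y). A sketch of this in Python follows; the parameter values are illustrative stand-ins, not the ones fitted from this article's dataset, chosen only so that each test point lies near one of the two class means:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(x; mu, sigma)."""
    n = len(mu)
    diff = x - mu
    inv = np.linalg.inv(sigma)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

def predict(x, phi, mu0, mu1, sigma):
    """Pick the class with the larger p(x|y) p(y)."""
    score0 = gaussian_pdf(x, mu0, sigma) * (1 - phi)
    score1 = gaussian_pdf(x, mu1, sigma) * phi
    return int(score1 > score0)

# Hypothetical parameters, placed so x0 sits near the class-0 mean and
# x1 near the class-1 mean (not the article's fitted values):
phi = 0.5
mu0 = np.array([4.0, 4.2])
mu1 = np.array([6.3, 2.3])
sigma = 0.5 * np.eye(2)

x0 = np.array([4.0, 4.0])
x1 = np.array([6.5, 2.25])
```

Under these stand-in parameters, predict(x0, ...) returns class 0 and predict(x1, ...) returns class 1, mirroring the behaviour described below.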

You can see that the algorithm favours class 0 for x0 and class 1 for x1, as expected. Both Logistic Regression and Gaussian Discriminant Analysis are used for classification, and they give slightly different decision boundaries, so which one should we use, and when?
GDA makes an assumption about the probability distribution of p(x|y=k), where k is one of the classes. It can be shown that for GDA, if the initial assumptions about the distributions of p(x|y=k) (Gaussian) and p(y) (Bernoulli) are true, then p(y|x) can be expressed as a sigmoid; the converse is not true. This means GDA makes more specific assumptions about the dataset than Logistic Regression, and if those assumptions hold it works better than LR. This is useful when you know the nature of the dataset you're dealing with (that it is roughly Gaussian) and you don't have a large training set; in such situations GDA will outperform LR. On the other hand, LR makes more generic assumptions and is more useful in many other settings where the probability distribution of the features is not Gaussian.
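The sigmoid claim can be checked numerically. Under the shared-covariance GDA model, p(y=1|x) = 1 / (1 + exp(−(θᵀx + θ0))) with θ = Σ⁻¹(μ1 − μ0) and θ0 = −½ μ1ᵀΣ⁻¹μ1 + ½ μ0ᵀΣ⁻¹μ0 + log(φ / (1 − φ)). The sketch below verifies this identity at one point, with made-up parameters:

```python
import numpy as np

# Check that the GDA posterior equals a sigmoid of a linear function of x
# when both classes share the covariance matrix (made-up parameters).
phi = 0.5
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
inv = np.linalg.inv(sigma)

theta = inv @ (mu1 - mu0)
theta0 = (-0.5 * mu1 @ inv @ mu1 + 0.5 * mu0 @ inv @ mu0
          + np.log(phi / (1 - phi)))

def unnorm_pdf(x, mu):
    diff = x - mu
    return np.exp(-0.5 * diff @ inv @ diff)   # shared normaliser cancels

x = np.array([1.3, -0.4])
bayes = (unnorm_pdf(x, mu1) * phi
         / (unnorm_pdf(x, mu1) * phi + unnorm_pdf(x, mu0) * (1 - phi)))
sigmoid = 1.0 / (1.0 + np.exp(-(theta @ x + theta0)))
# bayes and sigmoid agree, illustrating that GDA implies a logistic posterior
```

Logistic Regression fits θ and θ0 directly without assuming Gaussianity, which is exactly why it is the more generic of the two.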
