Generative vs. Discriminative Models
I would like to start this write-up with a story.
A father has two kids, Kid A and Kid B. Kid A has a special trait: he can learn everything in depth. Kid B has a different trait: he can only learn the differences between the things he sees.
One fine day, the father takes his two kids (Kid A and Kid B) to a zoo. The zoo is a very small one and has only two kinds of animals, say a lion and an elephant. After they come out of the zoo, the father shows them an animal and asks both of them, “Is this animal a lion or an elephant?”
Kid A immediately draws pictures of a lion and an elephant on a piece of paper, based on what he saw inside the zoo. He compares both pictures with the animal standing before him and, going by the closest match, answers: “The animal is a lion.”
Kid B knows only the differences. Based on the distinguishing properties he learned, he answers: “The animal is a lion.”
Here, we can see that both of them identify the kind of animal, but the way they learn and the way they find the answer are entirely different. In machine learning, we generally call Kid A a generative model and Kid B a discriminative model.
In general, a discriminative model models the decision boundary between the classes, whereas a generative model explicitly models the actual distribution of each class. In the end, both of them predict the conditional probability P(Animal | Features), but the two models learn different probabilities along the way.
A generative model learns the joint probability distribution P(x, y), typically as a class prior P(y) times a class-conditional likelihood P(x | y). It then predicts the conditional probability with the help of Bayes' theorem, P(y | x) = P(x | y) P(y) / P(x). A discriminative model learns the conditional probability distribution P(y | x) directly. Both kinds of models are generally used in supervised learning problems.
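To make the Bayes' theorem step concrete, here is a minimal sketch in Python. It assumes a single binary feature ("has a trunk") and made-up values for the prior P(y) and the likelihood P(x | y) that a generative model might have learned from the zoo visit; the names `prior`, `likelihood_has_trunk` and `posterior` are hypothetical and only serve the illustration.

```python
# Made-up quantities that a generative model might have learned for a single
# binary feature x = "has a trunk": the class prior P(y) and the
# class-conditional likelihood P(x = has trunk | y).
prior = {"lion": 0.5, "elephant": 0.5}
likelihood_has_trunk = {"lion": 0.02, "elephant": 0.95}


def posterior(has_trunk):
    """Return P(y | x) for every class y using Bayes' theorem."""
    joint = {}
    for y, p_y in prior.items():
        # P(x | y) for the observed value of the binary feature.
        if has_trunk:
            p_x_given_y = likelihood_has_trunk[y]
        else:
            p_x_given_y = 1.0 - likelihood_has_trunk[y]
        # Joint probability P(x, y) = P(x | y) * P(y).
        joint[y] = p_x_given_y * p_y
    # Evidence P(x): marginalise the joint over all classes.
    evidence = sum(joint.values())
    # Bayes' theorem: P(y | x) = P(x, y) / P(x).
    return {y: p / evidence for y, p in joint.items()}


# The animal the father shows has no trunk.
print(posterior(has_trunk=False))
```

With these numbers, the lion's posterior for an animal without a trunk comes out at roughly 0.95. A discriminative model would skip P(y) and P(x | y) entirely and fit P(y | x) directly.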
If you are unfamiliar with joint probability and conditional probability, please read up on them before continuing.
Training classifiers involves estimating f: X -> Y, or P(Y|X).
Generative classifiers:
- Assume some functional form for P(Y) and P(X|Y)
- Estimate the parameters of P(X|Y) and P(Y) directly from the training data
- Use Bayes' rule to calculate P(Y|X)
Discriminative classifiers:
- Assume some functional form for P(Y|X)
- Estimate the parameters of P(Y|X) directly from the training data
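The two recipes for training classifiers can be sketched side by side in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the training data (shoulder heights in metres) is made up, the generative side assumes a categorical P(Y) and a Gaussian P(X|Y) (equivalent to a one-feature Gaussian Naive Bayes), the discriminative side assumes a logistic form for P(Y|X) fitted by plain gradient descent, and every function name is hypothetical.

```python
import math

# Made-up training data: one feature (shoulder height in metres) per animal.
X = [1.0, 1.1, 1.2, 0.9, 1.05, 2.8, 3.0, 2.6, 3.2, 2.9]
Y = ["lion"] * 5 + ["elephant"] * 5


# ---------- Generative: estimate P(Y) and P(X|Y), then apply Bayes' rule ----------
def fit_generative(X, Y):
    """Estimate P(Y=c) and a Gaussian P(X|Y=c) (mean, variance) for each class c."""
    params = {}
    for c in set(Y):
        xs = [x for x, y in zip(X, Y) if y == c]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)  # maximum-likelihood variance
        params[c] = (len(xs) / len(X), mean, var)
    return params


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_generative(params, x):
    """P(Y=c | x) via Bayes' rule, computed in log space for numerical stability."""
    # log P(x, Y=c) = log P(Y=c) + log P(x | Y=c)
    log_joint = {c: math.log(p) + log_gaussian(x, m, v) for c, (p, m, v) in params.items()}
    top = max(log_joint.values())
    unnorm = {c: math.exp(lj - top) for c, lj in log_joint.items()}
    # Dividing by the evidence P(x) = sum_c P(x, Y=c) normalises the posteriors.
    evidence = sum(unnorm.values())
    return {c: u / evidence for c, u in unnorm.items()}


# ---------- Discriminative: model P(Y|X) directly as a logistic function ----------
def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_discriminative(X, Y, positive="elephant", lr=0.5, epochs=5000):
    """Fit P(Y=positive | x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    t = [1.0 if y == positive else 0.0 for y in Y]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Prediction errors for the current w and b (both updated simultaneously).
        errors = [sigmoid(w * x + b) - ti for x, ti in zip(X, t)]
        w -= lr * sum(err * x for err, x in zip(errors, X)) / len(X)
        b -= lr * sum(errors) / len(X)
    return w, b


def predict_discriminative(w, b, x, positive="elephant", negative="lion"):
    """Return P(Y | x) for both classes from the fitted logistic model."""
    p = sigmoid(w * x + b)
    return {positive: p, negative: 1.0 - p}


gen_params = fit_generative(X, Y)
w, b = fit_discriminative(X, Y)
for height in (1.0, 3.0):
    g = predict_generative(gen_params, height)
    d = predict_discriminative(w, b, height)
    print(height, max(g, key=g.get), max(d, key=d.get))
```

Both models should label the 1.0 m animal a lion and the 3.0 m animal an elephant. The generative side never searches for a boundary; one simply falls out where the two prior-weighted class densities cross. The logistic side never describes what a lion looks like; it only learns a weight w and a bias b, which place the boundary at x = -b / w.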
Examples of generative models:
- Naïve Bayes
- Bayesian networks
- Markov random fields
- Hidden Markov Models (HMMs)
Examples of discriminative models:
- Logistic regression
- Support Vector Machines (SVMs)
- Traditional neural networks
- Nearest neighbour
- Conditional Random Fields (CRFs)
Ask yourself the following questions to get a clear understanding of both of these models.
- What kinds of problems can these models solve?
- Which model learns joint probability?
- Which model learns conditional probability?
- What happens when we give correlated features to a discriminative model?
- What happens when we give correlated features to a generative model?
- Which model works well even with little training data?
- Is it possible to generate data with the help of these models?
- Which model will take less time to get trained?
- Which model will take less time to predict output?
- Which model fails to work well if we give it a lot of features?
- Which model is prone to overfitting very easily?
- Which model is prone to underfitting easily?
- What happens in a generative model when the training data is biased toward one class?
- What happens in a discriminative model when the training data is biased toward one class?
- Which model is more sensitive to outliers?
- Can these models be used to fill in missing values in a dataset?
Here is a nice paper by Professor Andrew Ng on generative and discriminative models; read it if you want to go deeper.