Naive Bayes or K-NN for classification?

Phani Srikanth
Data Science | Analytics
2 min read · Aug 4, 2014

The basic difference between the K-NN classifier and the Naive Bayes classifier is that the former is a discriminative classifier while the latter is a generative classifier.

Going into specifics, K-NN is a supervised, lazy classifier that relies on local heuristics. Being lazy, it defers all the work to prediction time: every query has to be compared against the stored training data, which makes it hard to use for real-time prediction on large datasets. The decision boundaries you get with K-NN can be far more flexible than the axis-aligned splits of a decision tree, which often gives a nice classification. When you are solving a problem that directly focuses on finding similarity between observations, K-NN does better because it inherently works locally. This is also its flip side, because outliers and noisy labels can significantly hurt performance. Additionally, K-NN is prone to overfitting when ‘k’ is small, so tuning ‘k’ on a validation set (or with cross-validation) rather than on the test set is the way to go. As the dimensionality of the space grows, the accuracy of K-NN comes down and you need more data, but a brute-force search costs on the order of n^2 distance computations to classify n points against n training points, so it becomes too slow. So a dimensionality reduction technique such as PCA or SVD is typically applied first, and the classifier is then run on the reduced features.
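To make the "lazy" and "local" points concrete, here is a minimal brute-force K-NN sketch in plain Python. The function name `knn_predict` and the toy data are made up for illustration; a real project would more likely use a library implementation with a spatial index.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance. Every prediction scans
    # the whole training set and computes one distance per stored point.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two clusters, plus one mislabeled outlier
# sitting inside cluster "a".
train_X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (1.1, 1.0)]
train_y = ["a", "a", "a", "b", "b", "b"]  # the last point is the outlier

query = (1.1, 1.02)
print(knn_predict(train_X, train_y, query, k=1))  # b (the outlier wins)
print(knn_predict(train_X, train_y, query, k=3))  # a (neighbours outvote it)
```

With k=1 the single mislabeled neighbour decides the answer, which is the outlier sensitivity described above; a larger k smooths it out.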

Naive Bayes is an eager learning classifier and it is much faster than K-NN at prediction time. Thus, it can be used for prediction in real time. Email spam filtering is a typical application. It takes a probabilistic route and produces a probability for each class. It assumes the features are conditionally independent given the class, estimates the per-class probabilities from the training data, and predicts the most probable class. The best part of this classifier is that it can keep learning over time. In a spam filtering task, the words that show up in spam evolve over time. In a “bag of words” model, the classifier can update its probability estimates as newly occurring spam words arrive in labelled examples, and so keeps performing well. This property comes from the generative nature of the algorithm: the model is essentially a set of per-class counts, which are easy to update.
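The "learns over time" idea can be sketched with a small bag-of-words Naive Bayes in plain Python. The class name `NaiveBayesText`, the add-one smoothing, and the example messages are assumptions made for illustration, not something the spam filters mentioned above are known to use verbatim.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial Naive Bayes over a bag of words, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def update(self, text, label):
        # Training is just counting, so new examples can be folded in any time.
        words = text.lower().split()
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(class) + sum of log P(word | class), words assumed independent
            score = math.log(n_docs / total_docs)
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayesText()
nb.update("win money now", "spam")
nb.update("cheap money offer", "spam")
nb.update("meeting at noon", "ham")
nb.update("lunch at noon tomorrow", "ham")

before = nb.predict("crypto meeting")
# New kinds of spam show up later; the counts are simply updated.
nb.update("crypto giveaway today", "spam")
nb.update("free crypto giveaway", "spam")
after = nb.predict("crypto meeting")
print(before, after)  # ham spam
```

Prediction only touches the words in the message, so its cost does not grow with the number of training emails, unlike the brute-force neighbour search in K-NN.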

www.phani.io | I enjoy working on Machine Learning, attending live concerts and following Test Cricket.