Naïve Bayes Algorithm With Python

Abhijeet Pujara

Published in Analytics Vidhya

4 min read · May 28, 2020

This article covers five parts:

What is the Naïve Bayes algorithm?
Applications of the Naïve Bayes algorithm
Pros of Naïve Bayes
Cons of Naïve Bayes
Naïve Bayes with Python (with code)

What is the Naïve Bayes Algorithm?

Naïve Bayes is one of the most popular classification algorithms in machine learning and belongs to the family of supervised learning methods. It classifies data by computing conditional probabilities. The algorithm is widely used in Natural Language Processing (NLP), as well as in real-time prediction, multi-class prediction, recommendation systems, text classification, and sentiment analysis. It is scalable and easy to implement, even for large data sets.

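The algorithm is based on Bayes' theorem, which helps us find the probability of a hypothesis given our prior knowledge.

Let's look at the equation for Bayes' theorem:

P(H | E) = P(E | H) × P(H) / P(E)

Here P(H | E) is the probability of the hypothesis H given the evidence E (the posterior), P(E | H) is the probability of the evidence given that the hypothesis is true (the likelihood), P(H) is the prior probability of the hypothesis, and P(E) is the overall probability of the evidence.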
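To make the theorem concrete, here is a small worked example in Python. The numbers (how often spam occurs, how often the word "offer" appears) are hypothetical and chosen only for illustration, and the helper name posterior is an assumption of this sketch, not part of any library.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p_spam = 0.2               # P(H): prior probability that an email is spam
p_word_given_spam = 0.6    # P(E | H): "offer" appears in a spam email
p_word_given_ham = 0.05    # P(E | not H): "offer" appears in a non-spam email

# P(E) via the law of total probability over spam and non-spam.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(round(p_spam_given_word, 3))  # 0.75
```

Even though only 20% of emails are spam in this made-up setting, seeing the word raises the probability of spam to 75%, because the word is far more common in spam than in other mail.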

Naïve Bayes is a simple but surprisingly powerful predictive modeling algorithm. A Naïve Bayes classifier calculates the probability of each possible outcome (class) given the observed features, and then selects the outcome with the highest probability.
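The "score every outcome, pick the highest" idea can be sketched in a few lines of plain Python. This is a minimal illustration for categorical features, not a production implementation: the function names train and predict and the toy weather data are assumptions made for this example, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(label, feature_index)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def predict(row, class_counts, value_counts):
    """Return the class with the highest unnormalised posterior score."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # P(feature value | class), features treated as independent
            score *= value_counts[(label, i)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, windy) -> play outside?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["yes", "no", "no", "yes", "yes", "no"]

class_counts, value_counts = train(rows, labels)
best, scores = predict(("sunny", "no"), class_counts, value_counts)
print(best)  # yes
```

Training is nothing more than counting, and prediction multiplies the class prior by one conditional probability per feature. The scores are not normalised, which is fine when only the most probable class is needed.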

Applications of the Naïve Bayes Algorithm

1. Real-time prediction: Naïve Bayes is a fast, eager learner, which makes it well suited to real-time predictions.

2. Multi-class prediction: Naïve Bayes can predict the probability of each class of a target variable that has more than two classes.

3. Text classification: Naïve Bayes is widely used for text classification, and its best-known use is spam filtering in email.

4. Sentiment analysis and spam filtering: Because it handles multi-class problems well and relies on the independence assumption, Naïve Bayes often achieves a higher success rate in text classification than other algorithms. It is therefore used in sentiment analysis and spam filtering.

5. Recommendation systems: A Naïve Bayes classifier combined with collaborative filtering can form a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
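As a rough illustration of the text classification and multi-class points above, the sketch below builds a tiny word-count (multinomial-style) Naïve Bayes sentiment classifier with three classes. Everything here is an assumption made for the example: the six training sentences are invented, the function names are hypothetical, and add-one smoothing plus log probabilities are common choices rather than anything prescribed by this article.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (text, label) pairs; returns counts and vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def class_probabilities(text, label_counts, word_counts, vocab):
    """Return a normalised probability for every label."""
    n_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        log_score = math.log(label_counts[label] / n_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            log_score += math.log(p)
        log_scores[label] = log_score
    # Convert log scores back to probabilities that sum to 1.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Invented training sentences, for illustration only.
docs = [
    ("great movie loved it", "positive"),
    ("loved the acting", "positive"),
    ("terrible movie hated it", "negative"),
    ("hated the plot", "negative"),
    ("the movie was okay", "neutral"),
    ("it was fine", "neutral"),
]
model = train(docs)
probs = class_probabilities("loved it", *model)
print(max(probs, key=probs.get))  # positive
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer texts, and the function returns one probability per class, which is what multi-class prediction needs.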

Pros of Naïve Bayes

The assumption that all features are independent makes Naïve Bayes very fast compared to more complex algorithms. In some cases, speed is preferred over higher accuracy.
For problems with a small amount of training data, it can achieve better results than other classifiers because it has a low propensity to overfit.
It works well with high-dimensional data, as in text classification and email spam detection.
It is less sensitive to missing data, and the algorithm is relatively simple.
Its results are easy to explain.

Cons of Naïve Bayes

Naïve Bayes makes the strong assumption that the features (predictors) are independent, which is rarely true in real-life applications. In practice, it is almost impossible to get a set of predictors that are completely independent.
When the independence assumption is violated, accuracy can suffer.
If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a probability of 0 (zero) and is unable to make a prediction. This is often known as the zero-frequency problem. To solve it, we can use a smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.
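The zero-frequency problem and its Laplace fix can be shown with a few counts. This is a minimal sketch: the helper name likelihood, the colour data, and the alpha parameter name are assumptions for illustration, and the number of possible categories is assumed to be known in advance.

```python
from collections import Counter


def likelihood(value, observed_values, n_categories, alpha=0.0):
    """Estimate P(value | class) from counts, with optional add-alpha smoothing."""
    counts = Counter(observed_values)
    return (counts[value] + alpha) / (len(observed_values) + alpha * n_categories)


# Hypothetical training values of one categorical feature within one class.
colours_in_class = ["red", "red", "green", "red"]
n_categories = 3  # red, green, blue; "blue" never appears in training

unsmoothed = likelihood("blue", colours_in_class, n_categories)           # 0 / 4
smoothed = likelihood("blue", colours_in_class, n_categories, alpha=1.0)  # 1 / 7
print(unsmoothed, round(smoothed, 3))  # 0.0 0.143
```

Without smoothing, the single zero wipes out the whole product of probabilities for that class. With alpha set to 1 (Laplace estimation), every category gets one extra pseudo-count, so the unseen value receives a small non-zero probability and the estimates for red, green, and blue still sum to 1.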