Naive Bayes Classifier

Shubhangi Hora
3 min read · Sep 29, 2018


“rocks on sea bed” by Yannis Papanastasopoulos on Unsplash

As mentioned previously, Supervised Learning is one of the main types of Machine Learning. One of the simplest and fastest Supervised Learning algorithms is the Naive Bayes classifier.

As the name suggests, this is a classification algorithm, and it is based on Bayes’ probability theorem. The reason it’s called the ‘Naive’ Bayes classifier is that it assumes all the features are independent of each other.

For those of you who aren’t too familiar with this theorem, click here to gain a better understanding.

In short, Bayes’ theorem states that if there are two events, A and B, then the probability of A occurring given that B has already occurred is:

P(A|B) = P(B|A) × P(A) / P(B)
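To make the formula concrete, here is a small worked example with made-up numbers (the probabilities are invented purely for illustration):

```python
# Worked example of Bayes' theorem with made-up numbers.
# Suppose 1% of all emails are spam, 80% of spam emails contain
# the word "free", and 10% of all emails contain "free".
p_spam = 0.01            # P(spam)
p_free_given_spam = 0.8  # P(free | spam)
p_free = 0.1             # P(free)

# P(spam | free) = P(free | spam) * P(spam) / P(free)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)  # 0.08
```

So even though most spam contains the word, only 8% of emails containing it are spam, because spam itself is rare.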

Let’s take the case of the iris data set. For the first row, the Naive Bayes classifier calculates the probability of each of the three possible classes (Iris-virginica, Iris-setosa, Iris-versicolor) given the feature values in that row. In other words, it checks the likelihood of that flower belonging to each class, and ultimately assigns the class that has the highest probability. This process is repeated for each row.

The iris data set has been designed to serve as an introduction to the machine learning and data science world, and hence it doesn’t contain any anomalies or missing values. Real life data, however, isn’t this perfect.

If a feature value never appears together with a particular class in the training data, the Naive Bayes classifier estimates its conditional probability for that class as 0 (the ‘zero frequency’ problem). Since the class probability is a product of these conditional probabilities, a single 0 wipes out the entire product, and so this is a problem of the Naive Bayes classifier which has to be solved.

Let’s take the example of classifying an email as ‘spam’ or ‘not spam’.

Email one says — ‘check out your 2018 monthly horoscope!’ and is marked as ‘spam’ by the classifier.

Email two says — ‘check out ur 2018 monthly horoscope!’

You can see that the two emails mean the same thing and that the only difference between the sentences is the spelling of ‘your’ and ‘ur’. However, if ‘ur’ is not present in the classifier’s dictionary, its conditional probability for ‘spam’ will be estimated as 0, which drives the overall probability of the ‘spam’ label to 0. Thus, email two might not be classified as ‘spam’ purely because the spelling of a word is different.

The solution to this problem is a smoothing technique called Laplace correction, which adds the value of the smoothing parameter (a parameter you set when coding the Naive Bayes classifier) to every count, ensuring that a probability can always be calculated. For example, if the smoothing parameter is 1, the estimated probability of a word given a class becomes:

(count of word in class + 1) / (total words in class + vocabulary size)

Now the probability will never be 0.
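Here is a minimal sketch of that calculation, with made-up counts for the word ‘ur’ from the example above:

```python
# Laplace (add-one) smoothing with made-up counts.
# Suppose "ur" never appeared in the spam training emails:
count_ur_in_spam = 0
total_words_in_spam = 100
vocabulary_size = 50

# Without smoothing, the estimate is 0, which zeroes the
# whole product of probabilities:
unsmoothed = count_ur_in_spam / total_words_in_spam  # 0.0

# With smoothing parameter alpha = 1, every word gets a small
# non-zero probability:
alpha = 1
smoothed = (count_ur_in_spam + alpha) / (total_words_in_spam + alpha * vocabulary_size)
print(smoothed)  # 1/150 ≈ 0.0067
```

The same parameter appears as `alpha` in Scikit-Learn’s Naive Bayes classes.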

There are three types of Naive Bayes classifiers –

1. Gaussian

This classifier is used when dealing with continuous, real-valued data, because it assumes that the features follow a normal (Gaussian) distribution, and so only the mean and standard deviation of each feature need to be estimated.

Click here to see the Gaussian Naive Bayes Classifier coded in Python using Scikit-Learn.
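As a minimal sketch, this is roughly what a Gaussian Naive Bayes classifier on the iris data set looks like in Scikit-Learn (the test split size and random state are arbitrary choices):

```python
# Gaussian Naive Bayes on the iris data set with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit the classifier: it estimates a mean and standard deviation
# per feature, per class.
model = GaussianNB()
model.fit(X_train, y_train)

print(model.score(X_test, y_test))  # typically around 0.9 or higher
```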

2. Multinomial

This classifier is commonly used in the field of Natural Language Processing (which will be discussed in future articles). Its features are counts, for example the number of occurrences of each word in a piece of text.
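A minimal sketch with a tiny made-up corpus (the emails and labels below are invented for illustration): word counts are extracted with `CountVectorizer`, then fed to `MultinomialNB`.

```python
# Multinomial Naive Bayes on word counts with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "check out your 2018 monthly horoscope",
    "win a free prize now",
    "meeting agenda for monday",
    "project status update attached",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn each email into a vector of word counts, which is exactly
# the kind of feature MultinomialNB expects.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

model = MultinomialNB(alpha=1.0)  # alpha is the Laplace smoothing parameter
model.fit(X, labels)

# "ur" is unseen, but thanks to smoothing the other words still
# push this email towards "spam" on this toy data.
test = vectorizer.transform(["check out ur 2018 monthly horoscope"])
print(model.predict(test))  # ['spam']
```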

3. Bernoulli

This classifier is based on the Bernoulli distribution, and hence deals with binary features, where each feature records only whether something is present or absent. For example, when classifying emails as ‘spam’ or ‘not spam’, each feature can indicate whether a particular word appears in the email at all.
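The spam example can be sketched with binary features as follows (again with a made-up corpus): `binary=True` records only whether each word is present, which matches the Bernoulli model.

```python
# Bernoulli Naive Bayes on binary word-presence features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

emails = [
    "win a free prize now",
    "free money click here",
    "meeting agenda for monday",
    "lunch on friday",
]
labels = ["spam", "spam", "not spam", "not spam"]

# binary=True records word presence/absence instead of counts.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(emails)

model = BernoulliNB()
model.fit(X, labels)

# "free" and "prize" only appear in the spam emails here.
print(model.predict(vectorizer.transform(["free prize inside"])))  # ['spam']
```

Unlike the multinomial model, BernoulliNB also penalizes the absence of words, not just their presence.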

To read more, click here.

Please let me know what you thought of this post in the comments below, thank you :) Next, I’ll be discussing Linear Regression, so stay tuned!



A python developer working on AI and ML, with a background in Computer Science and Psychology. Interested in healthcare AI, specifically mental health!