Naïve Bayes Classifier

DEVJYOTI KARAN
Published in ADGVIT
Aug 24, 2020 · 5 min read

So here’s the situation. You are working as a data scientist for your company, and your boss has asked you to work on a classification problem with hundreds of thousands of data points and quite a few variables in your data set. There are many classification algorithms to choose from, but if I were in your shoes I would go with Naïve Bayes, as in most cases it is faster than the alternatives.

So what is the Naïve Bayes algorithm?

As the name suggests, it is a classification technique based on Bayes’ theorem, which you may have learned in class 12 mathematics if you have completed your senior secondary. It works under the assumption that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a person’s salary depends on their learning, knowledge, and experience. Even though these features depend on one another (there may well be some covariance among them), Naïve Bayes treats each of them as contributing independently to the probability of the expected salary.

Following is the theorem:

P(c|x) = P(x|c) · P(c) / P(x)

Here:

P(c|x) is the posterior probability of the class given the predictor

P(x|c) is the likelihood, i.e. the probability of the predictor given the class

P(c) is the prior probability of the class

P(x) is the prior probability of the predictor
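To make the formula concrete, here is a minimal Python sketch that simply plugs numbers into the theorem (the values are made up for illustration):

# Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)
def posterior(likelihood, prior_class, prior_predictor):
    return likelihood * prior_class / prior_predictor

# Illustrative (made-up) numbers: P(x|c) = 0.3, P(c) = 0.5, P(x) = 0.25
print(posterior(0.3, 0.5, 0.25))  # 0.6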

Working of the algorithm

So to understand this theorem properly, let’s take an example. Suppose you are working as a data scientist for a car company and you have to classify whether a person will walk or drive (our proxy for whether they will buy a car), based on salary and age (taken as the significant factors). So you plot a graph of salary against age and mark each point according to whether that person walks or drives.

This is the data set we are provided with. Now we are given a new point, and we have to decide whether that person will walk or drive.

So what we do is draw a hypothetical circular region of a certain radius around that point and consider all the points lying in that region to be similar to the new point.

Then we use the formula to compute the probability of a person driving or walking.

Also, since we only have to classify between driving and walking, once we find P(walks) we get P(drives) = 1 − P(walks).

Let’s find the probability that the person walks.

1) P(walks)

P(walks) = (total number of people who walk) / (total observations) = 10/30

2) P(X)

P(X) is the probability of a new point falling inside the circle. From the figure, 4 of the 30 points lie inside it, so P(X) = 4/30.

3) P(X|walks)

P(X|walks) is the probability that a walker falls inside the circle. Of the 10 walkers, 3 lie inside it, so P(X|walks) = 3/10.

Therefore the probability that the person walks is:

P(walks|X) = P(X|walks) · P(walks) / P(X) = (3/10 × 10/30) / (4/30) = 0.75

Hence, as the probability of the person walking is 0.75, the probability of the person driving is 0.25. As a result, that person is less likely to buy a car.
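The same arithmetic can be written as a short Python sketch, using the counts read off the figure (10 walkers out of 30 observations; 4 points inside the circle, 3 of them walkers):

# Counts read off the scatter plot in the example above
total = 30               # total observations
walkers = 10             # people who walk
in_circle = 4            # points inside the circle around the new observation
walkers_in_circle = 3    # walkers among those points

p_walks = walkers / total                        # P(walks) = 10/30
p_x = in_circle / total                          # P(X) = 4/30
p_x_given_walks = walkers_in_circle / walkers    # P(X|walks) = 3/10

p_walks_given_x = p_x_given_walks * p_walks / p_x
print(p_walks_given_x)      # 0.75
print(1 - p_walks_given_x)  # P(drives|X) = 0.25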

Following is the code in Python:
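Below is a minimal sketch of the usual scikit-learn workflow for this kind of example. The file name Social_Network_Ads.csv and the column layout are assumptions; see the GitHub link below for the author’s actual script.

# A minimal Gaussian Naive Bayes sketch with scikit-learn.
# The dataset name and columns are assumptions, not the author's exact code.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

dataset = pd.read_csv('Social_Network_Ads.csv')  # hypothetical file: age, salary, purchased
X = dataset.iloc[:, :-1].values                  # features: age and salary
y = dataset.iloc[:, -1].values                   # target: buys the car or not

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling keeps the Gaussian likelihoods well conditioned
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = GaussianNB()
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))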

GitHub code link:

https://github.com/devjyotik200/Naive-Bayes/blob/master/NAIVE_BAYES.py

Application

Real-time prediction: Naive Bayes is an eager learning classifier, and it is certainly fast. Thus, it can be used for making predictions in real time.

Multi-class prediction: This algorithm is also well known for its multi-class prediction capability. Here we can predict the probability of each of multiple classes of the target variable.
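For instance, with scikit-learn you can read off the per-class probabilities directly; a minimal sketch on the three-class Iris dataset:

# Per-class probabilities on a three-class toy problem (Iris)
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)

# predict_proba returns one probability per class for each sample
print(model.predict_proba(X[:2]))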

Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers are widely used in text classification (thanks to good results on multi-class problems and the independence assumption) and tend to achieve a higher success rate than many other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment).
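A tiny spam-filter sketch with multinomial Naive Bayes and word-count features; the example messages are made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free cash offer", "lunch with the team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

vec = CountVectorizer()
X = vec.fit_transform(texts)  # word-count features

clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vec.transform(["free prize offer"])))  # likely [1]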

Recommendation systems: A Naive Bayes classifier combined with collaborative filtering builds a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.

Advantages

1. When the assumption of independent predictors holds, a Naïve Bayes classifier performs better than many other models.

2. Naïve Bayes requires only a small amount of training data to estimate its parameters, so the training period is short.

3. Naïve Bayes is also easy to implement.

Disadvantages

1. The main limitation of Naïve Bayes is the assumption of independent predictors. Naïve Bayes implicitly assumes that all the attributes are mutually independent. In real life, it is almost impossible to get a set of completely independent predictors.

2. If the categorical variable has a category in the test data that was not observed in the training data set, the model will assign it zero probability and will be unable to make a prediction. This is often known as the zero-frequency problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation, as sketched below.
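In scikit-learn, Laplace smoothing is controlled by the alpha parameter of the multinomial and Bernoulli variants (alpha=1.0 is classic add-one Laplace smoothing); the manual calculation below uses illustrative numbers:

# Laplace (add-one) smoothing prevents zero probabilities for categories
# that never appeared with a class in training.
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB(alpha=1.0)  # alpha=1.0 is Laplace smoothing

# Manually: P(word|class) = (count + 1) / (total + vocabulary_size)
count, total, vocab = 0, 100, 50      # illustrative numbers
print((count + 1) / (total + vocab))  # 1/150 instead of 0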

Picture credits: Udemy — Machine Learning A-Z™ Hands-On Python & R In Data Science

It is a great course if you want to start with the basics of data science.
