Naive Bayes Classifier

Divakar P M · Published in Analytics Vidhya · May 3, 2020

Introduction

The Naïve Bayes algorithm is a supervised machine learning classification technique based on Bayes' theorem, with a strong (naive) assumption of independence between the features. It is mainly used for binary or multi-class classification, and it still remains one of the best methods for text categorization and document categorization.

For example, a vegetable may be considered a tomato if it is red, round, and about 2 inches in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that the vegetable is a tomato, regardless of any possible correlations between the color, roundness, and diameter features.

Bayes Theorem

Bayes' theorem provides a way of calculating the posterior probability P(A|B) from P(A), P(B), and P(B|A). Look at the equation below:

P(A|B) = P(B|A) × P(A) / P(B)

P(A|B) = Posterior probability. How probable is our hypothesis given the observed evidence? (This is the value we want to compute; it is not directly observable.)

P(B|A) = Likelihood. How probable is the evidence given that our hypothesis is true?

P(A) = Prior probability. How probable was our hypothesis before observing the evidence?

P(B) = Marginal probability. How probable is the new evidence under all possible hypotheses?

Under the naive independence assumption, the posterior for a class y given features x1, …, xn is proportional to the prior times the product of the per-feature likelihoods: P(y | x1, …, xn) ∝ P(y) × P(x1 | y) × … × P(xn | y). Now let us see how to apply this formula to a real-life classification problem.

Problem statement: Predict whether a person will play golf given the outlook (sunny, overcast, or rainy).

Dataset:

The frequency table for the above dataset is shown below.

From the frequency table, the likelihood table is computed as shown below.

Now we put all the calculated values into Bayes' theorem to compute the posterior probabilities.

The posterior probability of Yes given Outlook = Sunny is

P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny)

The posterior probability of No given Outlook = Sunny is

P(No | Sunny) = P(Sunny | No) × P(No) / P(Sunny)

Comparing the two values, the probability of playing golf when the outlook is sunny is higher than the probability of not playing. A worked sketch of this calculation follows.
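Below is a minimal Python sketch of this calculation. The counts are assumptions, taken from the classic 14-row "play golf" weather dataset (9 Yes and 5 No overall; of the 5 sunny days, 3 were Yes); substitute the values from your own frequency table.

```python
# Posterior P(Yes | Sunny) via Bayes' theorem.
# NOTE: the counts below are assumed (classic weather dataset), not taken
# from the tables in this article.
sunny_yes, sunny_no = 3, 2        # sunny days, split by "play golf"
total_yes, total_no = 9, 5        # class totals
total = total_yes + total_no      # 14 records in all

p_yes = total_yes / total                   # prior P(Yes)
p_sunny = (sunny_yes + sunny_no) / total    # marginal P(Sunny)
p_sunny_given_yes = sunny_yes / total_yes   # likelihood P(Sunny | Yes)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6, so Yes is the more likely class
```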

Types of Naive Bayes Classifier:

Multinomial Naive Bayes: Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This event model is mainly used for document classification (such as spam or legitimate, sports or politics, etc.).
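As a toy illustration, here is a minimal multinomial Naive Bayes sketch with scikit-learn. The documents and labels are hypothetical placeholders, not data from this article.

```python
# Multinomial Naive Bayes on term-frequency vectors (hypothetical toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win money now", "meeting at noon", "win a free prize", "lunch meeting"]
labels = [1, 0, 1, 0]  # hypothetical labels: 1 = spam, 0 = legitimate

vec = CountVectorizer()              # features = word counts per document
X = vec.fit_transform(docs)
model = MultinomialNB().fit(X, labels)
print(model.predict(vec.transform(["free prize money"])))  # -> [1] (spam)
```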

Bernoulli Naive Bayes: In the multivariate Bernoulli event model, features are independent Boolean (binary) variables describing the inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term-occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).
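The Bernoulli model can be sketched the same way; the only change is that the features record word presence or absence rather than counts. Again, the data is a hypothetical placeholder.

```python
# Bernoulli Naive Bayes on binary term-occurrence vectors (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["win money now", "meeting at noon", "win a free prize", "lunch meeting"]
labels = [1, 0, 1, 0]  # hypothetical labels: 1 = spam, 0 = legitimate

vec = CountVectorizer(binary=True)   # features = 0/1 word occurrence
X = vec.fit_transform(docs)
model = BernoulliNB().fit(X, labels)
print(model.predict(vec.transform(["free money"])))  # -> [1] (spam)
```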

Gaussian Naive Bayes: In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. A Gaussian distribution is also called a normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.

Since the features now take continuous values, the conditional probability is computed from the Gaussian density instead of from frequency counts:

P(xi | y) = (1 / √(2πσy²)) × exp(−(xi − μy)² / (2σy²))

where μy and σy² are the mean and variance of feature xi over the training samples of class y.
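As a quick sanity check of the formula, here is a minimal sketch that evaluates the Gaussian likelihood directly; the mean and variance values are hypothetical.

```python
# Gaussian likelihood P(x_i | y) from a class's estimated mean and variance.
import math

def gaussian_likelihood(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical class statistics for one feature (e.g. petal length).
print(gaussian_likelihood(5.0, mean=5.8, var=0.4))  # density of x = 5.0
```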

Machine Learning Example

1. Load the data: Load the Iris dataset from the sklearn library.

2. The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured on each sample: the length and the width of the sepals and petals, in centimeters.

3. Split the data into training and testing sets: Store the feature matrix (input) and the response vector (output), then split the dataset into training and testing sets.

4. Train the model and find the accuracy on the testing set: Import GaussianNB from the sklearn library and train the model on the training set using the fit method. Then predict on the testing input features (X_test) and compare the predicted output with the actual output to get the accuracy.
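Here is a minimal sketch of these four steps with scikit-learn (the full code is linked in the Github reference at the end). The test-set fraction and random seed are arbitrary choices.

```python
# Gaussian Naive Bayes on the Iris dataset, end to end.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# 1-2. Load the data: 150 samples, 4 features, 3 species.
iris = load_iris()
X, y = iris.data, iris.target

# 3. Split into training and testing sets (30% held out; seed is arbitrary).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 4. Train on the training set, predict on X_test, and measure accuracy.
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```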

Important Notes to use Naïve Bayes classifier

  • If the continuous features are not normally distributed, we should use a transformation or another method to bring them closer to a normal distribution (see the sketch after this list).
  • Remove highly correlated features: a pair of strongly correlated features is effectively voted twice in the model, which over-inflates their importance.
  • You might think of applying a classifier-combination technique such as ensembling, bagging, or boosting, but these methods would not help. Their purpose is to reduce variance, and Naive Bayes has no variance to minimize.
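For the first note, one common option is scikit-learn's PowerTransformer, which applies a Yeo-Johnson transform to pull a skewed feature toward a normal shape. The sketch below uses synthetic data for illustration.

```python
# Transforming a skewed feature toward normality before Gaussian Naive Bayes.
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_skewed = rng.exponential(scale=2.0, size=(100, 1))  # clearly non-normal

pt = PowerTransformer(method="yeo-johnson")           # also standardizes
X_transformed = pt.fit_transform(X_skewed)            # closer to N(0, 1)
```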

Challenges of Naïve Bayes classifier

· The assumption of independent features. In practice, it is almost impossible for the model to get a set of predictors that are entirely independent.

· If a particular feature value never occurs with a given class in the training data, its conditional probability is zero, and the whole posterior product for that class collapses to zero, so the model cannot make a sensible prediction. This is known as the Zero Probability/Frequency Problem; the usual remedy is Laplace (additive) smoothing, sketched below.
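In scikit-learn, Laplace smoothing is controlled by the alpha parameter of the discrete Naive Bayes classifiers, so no count is ever exactly zero. A minimal sketch:

```python
# Additive (Laplace) smoothing: alpha=1.0 adds one pseudo-count per feature,
# so unseen feature/class combinations get a small non-zero probability.
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB(alpha=1.0)  # alpha=1.0 (the default) is Laplace smoothing
```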

Summary

Naive Bayes is one of the most straightforward algorithms, and in spite of the significant advances machine learning has made in the last couple of years, it has proved its worth. Many models built on this algorithm have been deployed, ranging from sentiment analysis and text classification to recommendation engines.

Reference: https://en.wikipedia.org/wiki/Naive_Bayes_classifier

Github: https://github.com/DivakarPM/DataScience/tree/master/Naive%20Bayes%20classifier
