Naive Bayesian Classification

Saijal Shakya
Published in Incwell Bootcamp
4 min read · Jul 13, 2019

The Naive Bayesian classifier is based on Bayes' theorem together with an independence assumption between predictors. It is a probabilistic classifier that makes classifications using the maximum posterior probability.


In Bayesian statistics, the posterior probability is the probability that a hypothesis is true, calculated in the light of the relevant observations.

Bayes Theorem

P( H | X ) = P( X | H ) * P( H ) / P( X )

H = hypothesis
X = evidence
P( H | X ) = posterior probability of H given X
P( X | H ) = likelihood of X given H
P( H ) = prior probability of H
P( X ) = probability of the evidence X

We can find the probability of H when X has occurred. Here, X is the evidence and H is the hypothesis. The classifier also assumes that the features are independent: it is called "naive" because the presence of one feature is assumed not to affect the others.
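
To make the formula concrete, here is a minimal Python sketch that plugs numbers into Bayes' theorem (the probability values are hypothetical, chosen only for illustration):

    # Bayes' theorem: P(H | X) = P(X | H) * P(H) / P(X)
    p_h = 0.3          # P(H): prior probability of the hypothesis (hypothetical)
    p_x_given_h = 0.8  # P(X | H): likelihood of the evidence under H (hypothetical)
    p_x = 0.5          # P(X): overall probability of the evidence (hypothetical)

    p_h_given_x = p_x_given_h * p_h / p_x
    print(p_h_given_x)  # 0.48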

Why use the Naive Bayes classifier?

  1. It is easy to apply and predicts the class of a test data set fast.
  2. It requires less training data compared to other models.
  3. It calculates both prior and posterior probabilities, combining prior knowledge with evidence from the data, which most other models cannot guarantee.
  4. It has the ability to incorporate prior information.
  5. It provides a convenient setting for a wide range of models, such as hierarchical models and missing data problems.

Is Naive Bayes that good to use?

Obviously, YES. A Naive Bayes model is small, and its size stays roughly constant with respect to the data. Because Naive Bayes cannot represent complex behavior, it is also unlikely to over-fit.

Some real-world examples where Naive Bayes can be used:

  1. Identifying spam email (see the sketch after this list)
  2. Checking a piece of text for positive or negative emotion
  3. Making predictions in real time, since Naive Bayes is super fast. For example: identifying the water level in a river
  4. Multi-class predictions
  5. Sentiment analysis
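
As an illustration of the first use case, here is a minimal spam-detection sketch using scikit-learn's MultinomialNB (the toy messages and labels below are invented purely for illustration):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy training data (hypothetical messages, for illustration only)
    texts = [
        "win a free prize now",
        "meeting rescheduled to noon",
        "claim your free money today",
        "lunch tomorrow with the team",
    ]
    labels = ["spam", "ham", "spam", "ham"]

    # Turn each message into a vector of word counts
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)

    # Fit a multinomial Naive Bayes classifier on the counts
    model = MultinomialNB()
    model.fit(X, labels)

    # Classify a new, unseen message
    print(model.predict(vectorizer.transform(["free prize money"])))  # expected: ['spam']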

Naive Bayesian Classification

P( Ci | X ) = P( X | Ci ) * P( Ci ) / P( X )

Ci = class label
X = a tuple of attribute values ( x1, x2, ..., xn )

Under the naive independence assumption, P( X | Ci ) is defined by the product of the individual attribute probabilities:

P( X | Ci ) = P( x1 | Ci ) * P( x2 | Ci ) * ... * P( xn | Ci )

How is naive Bayesian classification done mathematically?

Go through the table below before starting the Bayesian classification. This is the classic 14-row training set, with buys_bike as the class label:

RID   age          income   student   credit_rating   buys_bike
1     Youth        High     No        Fair            No
2     Youth        High     No        Excellent       No
3     Middle-aged  High     No        Fair            Yes
4     Senior       Medium   No        Fair            Yes
5     Senior       Low      Yes       Fair            Yes
6     Senior       Low      Yes       Excellent       No
7     Middle-aged  Low      Yes       Excellent       Yes
8     Youth        Medium   No        Fair            No
9     Youth        Low      Yes       Fair            Yes
10    Senior       Medium   Yes       Fair            Yes
11    Youth        Medium   Yes       Excellent       Yes
12    Middle-aged  Medium   No        Excellent       Yes
13    Middle-aged  High     Yes       Fair            Yes
14    Senior       Medium   No        Excellent       No

Now we will start the Bayesian classification.

Parameters:

X = (
  age = Youth,
  income = Medium,
  student = Yes,
  credit_rating = Fair
)

Using the above parameters, we will predict whether a person buys a bike or not.

P( buys_bike = Yes) = 9/14
= 0.643

Above, 9 is the total number of people who buy a bike, and 14 is the total number of rows (data points).

P( buys_bike = No ) = 5/14
= 0.357

Above, 5 is the total number of people who do not buy a bike, and 14 is the total number of rows (data points).

P( age = Youth | buys_bike = Yes ) = 2/9

Of the people who buy a bike, 2 out of 9 are in the Youth age group.

P( age = Youth | buys_bike = No ) = 3/5

Of the people who do not buy a bike, 3 out of 5 are in the Youth age group.

P( income = Medium | buys_bike = Yes ) = 4/9

Of the people who buy a bike, 4 out of 9 have a medium income.

We have to calculate all the probabilities we defined in the parameters above.

I will write all the remaining probabilities below

P( income = Medium | buys_bike = No ) = 2/5
P( student = Yes | buys_bike = Yes ) = 6/9
P( student = Yes | buys_bike = No ) = 1/5
P( credit_rating = Fair | buys_bike = Yes ) = 6/9
P( credit_rating = Fair | buys_bike = No ) = 2/5
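
These counts can be verified mechanically. Here is a small sketch using pandas (assuming the 14-row table above) that recomputes the priors and one of the conditional probabilities:

    import pandas as pd

    # The 14-row training table from above, one tuple per row
    rows = [
        ("Youth", "High", "No", "Fair", "No"),
        ("Youth", "High", "No", "Excellent", "No"),
        ("Middle-aged", "High", "No", "Fair", "Yes"),
        ("Senior", "Medium", "No", "Fair", "Yes"),
        ("Senior", "Low", "Yes", "Fair", "Yes"),
        ("Senior", "Low", "Yes", "Excellent", "No"),
        ("Middle-aged", "Low", "Yes", "Excellent", "Yes"),
        ("Youth", "Medium", "No", "Fair", "No"),
        ("Youth", "Low", "Yes", "Fair", "Yes"),
        ("Senior", "Medium", "Yes", "Fair", "Yes"),
        ("Youth", "Medium", "Yes", "Excellent", "Yes"),
        ("Middle-aged", "Medium", "No", "Excellent", "Yes"),
        ("Middle-aged", "High", "Yes", "Fair", "Yes"),
        ("Senior", "Medium", "No", "Excellent", "No"),
    ]
    df = pd.DataFrame(rows, columns=["age", "income", "student",
                                     "credit_rating", "buys_bike"])

    # Priors: P(buys_bike = Yes) = 9/14, P(buys_bike = No) = 5/14
    print(df["buys_bike"].value_counts(normalize=True))

    # P(age = Youth | buys_bike = Yes) = 2/9
    yes_rows = df[df["buys_bike"] == "Yes"]
    print((yes_rows["age"] == "Youth").mean())  # 0.222...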

Conditional Probability

The conditional probability of an event H is the probability that the event will occur given the knowledge that an event X has already occurred. This probability is written P( H | X ).

The conditional probability of X for people who buy a bike is

P( X | buys_bike = Yes ) = 2/9 * 4/9 * 6/9 * 6/9
= 0.044

The conditional probability of X for people who do not buy a bike is

P( X | buys_bike = No ) = 3/5 * 2/5 * 1/5 * 2/5
= 0.019

Therefore, we found a 0.044 likelihood of the above parameters for people who buy a bike and a 0.019 likelihood for people who do not. To classify with the maximum posterior probability, each likelihood must be multiplied by its prior:

P( X | buys_bike = Yes ) * P( buys_bike = Yes ) = 0.044 * 0.643 = 0.028
P( X | buys_bike = No ) * P( buys_bike = No ) = 0.019 * 0.357 = 0.007

Since 0.028 > 0.007, the algorithm predicts that people with the above parameters buy a bike.

Implementing Naive Bayesian Classification in Python
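
Here is a minimal from-scratch sketch that reproduces the worked example above. It is a plain counting implementation with no smoothing; a production version would typically add Laplace smoothing and work in log-probabilities to avoid underflow:

    from collections import Counter, defaultdict

    # The same 14 training tuples as above:
    # (age, income, student, credit_rating, buys_bike)
    DATA = [
        ("Youth", "High", "No", "Fair", "No"),
        ("Youth", "High", "No", "Excellent", "No"),
        ("Middle-aged", "High", "No", "Fair", "Yes"),
        ("Senior", "Medium", "No", "Fair", "Yes"),
        ("Senior", "Low", "Yes", "Fair", "Yes"),
        ("Senior", "Low", "Yes", "Excellent", "No"),
        ("Middle-aged", "Low", "Yes", "Excellent", "Yes"),
        ("Youth", "Medium", "No", "Fair", "No"),
        ("Youth", "Low", "Yes", "Fair", "Yes"),
        ("Senior", "Medium", "Yes", "Fair", "Yes"),
        ("Youth", "Medium", "Yes", "Excellent", "Yes"),
        ("Middle-aged", "Medium", "No", "Excellent", "Yes"),
        ("Middle-aged", "High", "Yes", "Fair", "Yes"),
        ("Senior", "Medium", "No", "Excellent", "No"),
    ]

    def train(data):
        """Count class frequencies and per-feature value frequencies per class."""
        class_counts = Counter(row[-1] for row in data)
        feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
        for row in data:
            label = row[-1]
            for i, value in enumerate(row[:-1]):
                feature_counts[(i, label)][value] += 1
        return class_counts, feature_counts

    def classify(x, class_counts, feature_counts):
        """Return the class maximising P(X | Ci) * P(Ci), plus all scores."""
        total = sum(class_counts.values())
        scores = {}
        for label, count in class_counts.items():
            score = count / total                   # prior P(Ci)
            for i, value in enumerate(x):
                score *= feature_counts[(i, label)][value] / count  # P(xk | Ci)
            scores[label] = score
        return max(scores, key=scores.get), scores

    class_counts, feature_counts = train(DATA)
    x = ("Youth", "Medium", "Yes", "Fair")
    label, scores = classify(x, class_counts, feature_counts)
    print(label)   # Yes
    print(scores)  # {'No': ~0.007, 'Yes': ~0.028}

Note that a single attribute value never seen with a class would zero out that class's whole product; Laplace smoothing (adding 1 to every count) is the standard fix.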
