Introduction to Naive Bayes

Jitss
May 14, 2018

This post is for readers who want to learn more about one of the simplest probabilistic approaches in machine learning: Naive Bayes.

Naive Bayes is a supervised machine learning technique. It is used to classify data and make predictions based on prior knowledge and independence assumptions.

It is called naive because its assumptions (that all of the features in the data set are equally important and independent of each other) are very optimistic and rarely hold in real-world applications.
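
Written out, the independence assumption says that the likelihood of all features together is just the product of the per-feature likelihoods. In the line below, H is the class and E_1 ... E_n are the individual features; the subscripted notation is chosen here for illustration and is not taken from the original post.

```latex
P(E_1, E_2, \dots, E_n \mid H) = \prod_{i=1}^{n} P(E_i \mid H)
```

This is what keeps the algorithm cheap: each P(E_i | H) can be estimated separately from simple counts.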

Naive Bayes Algorithm: It is a classification algorithm that decides which class an unseen data point belongs to. It is based on Bayes' Theorem, which describes the probability of an event given prior knowledge of conditions related to that event.

It is widely used for text classification, which appears in applications such as Google Search, email sorting, and language detection. The diagram below shows how Naive Bayes arrives at its predictions.

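The algorithm starts from the prior probabilities of the classes and updates them into new (posterior) probabilities as it sees the features of an object.
In the simplest two-class illustration the prior probabilities are split 50%–50%; in general they are estimated from how often each class occurs in the training data. The job of the algorithm is to decide which class label a new object belongs to, based on the objects it has already seen.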

For example: weight and height are the features that decide whether a player is a Sumo wrestler or a Basketball player. Initially, the players are divided into two equally likely classes.

The formula Naive Bayes uses to predict is Bayes' Theorem:

P(H|E) = P(E|H) * P(H) / P(E)

In this formula,

  • P(H|E) is the posterior probability of the class (H, target) given the predictor (E, attributes).
  • P(H) is the prior probability of the class.
  • P(E|H) is the likelihood, i.e. the probability of the predictor given the class.
  • P(E) is the prior probability of the predictor.

How to use Naive Bayes Algorithm?

Use Case: We are given a set of weather conditions and the corresponding target variable ‘Play’ (whether a game was played). We need to classify whether the players will play or not based on the weather condition. Let’s follow the steps below.
Step 1: First, build the likelihood table, which shows the probability of each weather condition for ‘yes’ and for ‘no’ (see the diagram below).
Step 2: Find the posterior probability of each class.
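
Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the code from the original post: the counts are chosen only to match the figures used in the Rainy example in this post (14 days, 9 ‘Yes’ days, 5 Rainy days, 2 of them ‘Yes’), and the label "Other" and the function name `likelihood_table` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Counts consistent with the probabilities quoted in this post:
# 14 days in total, 9 "Yes" days, 5 "Rainy" days, 2 of which are "Yes".
# "Other" is a hypothetical label that lumps together all non-rainy days.
observations = (
    [("Rainy", "Yes")] * 2
    + [("Rainy", "No")] * 3
    + [("Other", "Yes")] * 7
    + [("Other", "No")] * 2
)


def likelihood_table(rows):
    """Return {(weather, play): P(weather | play)} from (weather, play) pairs."""
    pair_counts = Counter(rows)                       # how often each pair occurs
    class_counts = Counter(play for _, play in rows)  # how often each class occurs
    return {
        (weather, play): Fraction(count, class_counts[play])
        for (weather, play), count in pair_counts.items()
    }


table = likelihood_table(observations)
print(table[("Rainy", "Yes")])  # 2/9
print(table[("Rainy", "No")])   # 3/5
```

Using `Fraction` keeps the table exact, so no rounding error creeps in before Step 2.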

Problem: What is the probability that the players will play when the weather is Rainy?

P(Yes|Rainy) = P(Rainy|Yes) * P(Yes) / P(Rainy)

P(Rainy|Yes) = 2/9 = 0.222
P(Yes) = 9/14 = 0.64
P(Rainy) = 5/14 = 0.36

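Now, P(Yes|Rainy) = 0.222 * 0.64 / 0.36 ≈ 0.39 with the rounded values, or exactly 2/5 = 0.40 with the fractions. This is below 0.5, so P(No|Rainy) = 0.60 is the larger posterior and the chance of the match being played is low.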
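
The same calculation can be checked with exact fractions. This is a small sketch using only the three probabilities given above; the helper name `posterior` is made up for this example. Working with fractions shows that the exact answer is 2/5 = 0.40, and that a value of about 0.39 comes only from rounding the inputs first.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Values taken from the worked example in this post.
p_rainy_given_yes = Fraction(2, 9)
p_yes = Fraction(9, 14)
p_rainy = Fraction(5, 14)

p_yes_given_rainy = posterior(p_rainy_given_yes, p_yes, p_rainy)
# With only two classes, the other posterior is the complement.
p_no_given_rainy = 1 - p_yes_given_rainy

print(p_yes_given_rainy, float(p_yes_given_rainy))  # 2/5 0.4
print(p_no_given_rainy, float(p_no_given_rainy))    # 3/5 0.6

# Pick the class with the larger posterior probability.
prediction = "Yes" if p_yes_given_rainy > p_no_given_rainy else "No"
print(prediction)  # No
```

The classifier picks the class with the larger posterior, so for a Rainy day it predicts ‘No’.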

When to use the Naive Bayes algorithm:

  1. When you have a moderate or large training data set.
  2. When the instances have several attributes.
  3. When, given the class, the attributes that describe the instances are conditionally independent.
  4. When resources such as memory and CPU are limited.
  5. When you need a method that is not computationally intensive.

Pros and cons of the Naive Bayes Algorithm
Pros:

  1. Easy to implement.
  2. Low model complexity.
  3. Little CPU computation.
  4. Performs well when the data set is small.

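Cons:

  1. It makes a very strong independence assumption, which can lead to poor results; this is why it is called naive.
  2. Data scarcity: likelihoods estimated from only a few examples are unreliable.
  3. In real-life problems, variables are rarely independent of each other.
  4. It does not perform well when the data set is very large and the assumption that all features are independent does not hold.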

For examples, refer to the GitHub repository: https://github.com/jiteshmohite/Naive-Bayes-Examples

There are also some great answers on Stack Overflow: https://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification

I am a technology enthusiast who wants to learn things quickly and dive deep into them. I believe in building logical things that make an impact on the end user.