Naive Bayes Classifier using Python

A beginner’s guide to mastering one of the fastest and simplest classification algorithms

Siddharth Varangaonkar
Analytics Vidhya
6 min read · Sep 28, 2019


Cricket toss. Source: espncricinfo.com

Should I bat first or bowl first? Umm... the pitch seems damp, the weather is sunny, let's bowl first. You might have been in a situation like this, where you make a decision based on several independent conditions. Let's pick the example of cricket: suppose you have won the toss, and now, as captain, you have to decide whether to bowl first or bat first. You will make this decision based on conditions like the weather, the pitch, the humidity, and the breeze, and you assume that all these conditions contribute independently to your decision of whether to bowl first or bat first. If the pitch is fresh with little grass, the weather is clear, the humidity is low, and the breeze is slow, then you should definitely bat first and put a high score on the board. Otherwise, if the pitch is wet and cracked, the weather is overcast, and the breeze is fast, then as captain you will go for bowling. So we make the ‘naive’ assumption that all these conditions contribute independently to our decision as captain. That's what a human does; now let's learn the algorithm that helps a machine approach the problem similarly.

Let's break down the term ‘Naive Bayes classifier’ and then understand it piece by piece.


‘Naive’ refers to the simplifying assumptions we make when calculating probabilities in the algorithm. Naive Bayes is also called idiot Bayes because the calculation of the probabilities for each hypothesis is simplified to make it tractable, by treating the features as independent of one another, which is most unlikely to hold in real data. Now let's move further and get to the mathematical term.

Now let's understand the ‘Bayes’ part, the most important piece: the probabilistic approach to the problem. Bayes’ theorem provides a way of calculating the posterior probability from the likelihood, the marginal probability, and the prior probability; we will see below what all these terms mean. It gives us one of the easiest ways of selecting the most probable hypothesis given the data, using our prior knowledge about the problem. For example, in the cricket scenario we had to calculate P(bat or bowl | weather, pitch, humidity, ...).

Now for the formula, let ‘H’ stand for the hypothesis and ‘E’ for the evidence, i.e. the predicted class versus the given data.

Bayes’ Theorem:

P(H|E) = P(E|H) * P(H) / P(E)
  • P(H|E) is the probability of hypothesis H given the data E. This is called the posterior probability.
  • P(E|H) is the probability of data E given that hypothesis H is true. This is called the likelihood.
  • P(H) is the probability of hypothesis H being true (regardless of the data). This is called the prior probability of H.
  • P(E) is the probability of the data (regardless of the hypothesis). This is called the marginal probability, or the evidence.
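
To make these terms concrete, here is a tiny numeric sketch in Python; the probability values are made up purely for illustration:

```python
# A small numeric sketch of Bayes' theorem (the numbers are illustrative)
def posterior(likelihood, prior, evidence):
    """Return P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Suppose P(E|H) = 0.3, P(H) = 0.7, P(E) = 0.36
print(round(posterior(0.3, 0.7, 0.36), 2))  # 0.58
```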

And the final term, ‘classifier’, simply means the model gives a discrete response, in contrast to a regressor, which gives a continuous response. The best example is sentiment analysis, in which we classify text into discrete responses like positive, negative, or neutral.


Altogether, we make the naive assumption that all the predictors contribute independently to the final result, so we calculate the posterior probability of the hypothesis with respect to each predictor independently using Bayes’ theorem, then combine them to reach the final task, i.e. classification. It is a supervised learning algorithm. This is what the term ‘Naive Bayes classifier’ stands for.

Maths Behind The Algorithm:

Let's get back to the initial example to understand the calculation behind naive Bayes.

Say, if the cricket pitch is wet, what is the probability of batting first?

P(Batting first | wet) = P(wet | Batting first) * P(Batting first) / P(wet)

Dataset

Above, I have a training data set of weather conditions and the corresponding target variable ‘Batting first?’. Now we need to classify whether the players will choose to bat or bowl based on the weather conditions. Let's follow the steps below to perform it.

  1. Convert the data set into a frequency table.

Frequency Table

2. Create a Likelihood table by finding the probabilities.

Likelihood table

Similarly, calculate the likelihood table for every other predictor.

3. Now, use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.

So, if we simply derive it for the weather attribute, then:

P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

P(Sunny | Yes) = 3/10 = 0.30, P(Sunny) = 5/14 ≈ 0.36, P(Yes) = 10/14 ≈ 0.71

Now, P(Yes | Sunny) = 0.30 * 0.71 / 0.36 ≈ 0.60, which is higher than the posterior for bowling first, P(No | Sunny) = 1 - 0.60 = 0.40.
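
A quick way to verify that arithmetic is to plug the fractions into Python directly:

```python
# Posterior probability of batting first given sunny weather
p_sunny_given_yes = 3 / 10   # likelihood, P(Sunny | Yes)
p_yes = 10 / 14              # prior, P(Yes)
p_sunny = 5 / 14             # evidence, P(Sunny)

print(round(p_sunny_given_yes * p_yes / p_sunny, 2))  # 0.6
```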

Hence, the captain is likely to choose to bat first if it's sunny.

Hence, if we want to predict based on weather, humidity, and more attributes, then under the naive independence assumption:

P(Yes | weather, pitch, humidity, ...) ∝ P(weather | Yes) * P(pitch | Yes) * P(humidity | Yes) * ... * P(Yes)

So we simply multiply the likelihood of each attribute under each class by that class's prior probability, and normalize, to get the final required result.
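
As a sketch of how that combination works, here is a small Python example; the per-attribute likelihood values are hypothetical, chosen only to illustrate the mechanics:

```python
# Combining several attributes under the naive independence assumption.
# The likelihood values below are hypothetical, for illustration only.
likelihoods_yes = {'weather=Sunny': 0.30, 'pitch=Dry': 0.50, 'humidity=Low': 0.40}
likelihoods_no  = {'weather=Sunny': 0.50, 'pitch=Dry': 0.25, 'humidity=Low': 0.25}
p_yes, p_no = 10 / 14, 4 / 14  # class priors

score_yes = p_yes
for p in likelihoods_yes.values():
    score_yes *= p
score_no = p_no
for p in likelihoods_no.values():
    score_no *= p

# Normalize so the two posteriors sum to 1
total = score_yes + score_no
print('P(Yes | evidence) =', round(score_yes / total, 2))
print('P(No  | evidence) =', round(score_no / total, 2))
```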

Similarly, we can use the Naive Bayes classifier for text classification too.


NLP preprocessing and word tokenization are prerequisites for that. The Naive Bayes classifier then helps us classify text into groups.

Ex: P(Written by Shakespeare | “ABHOR”, “ABSOLUTE”, “COIL”)

That's what we would calculate to estimate the probability of a given text being written by Shakespeare, because abhor, absolute, and coil are among the words characteristic of Shakespeare's vocabulary.
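
Here is a minimal sketch of text classification with scikit-learn's multinomial Naive Bayes; the tiny corpus and labels are made up for illustration:

```python
# Toy text classification with multinomial Naive Bayes.
# The corpus and labels below are hypothetical, for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    'abhor the absolute coil of mortal life',
    'thou art more lovely and more temperate',
    'the stock market closed higher on tuesday',
    'quarterly earnings beat analyst expectations',
]
labels = ['shakespeare', 'shakespeare', 'news', 'news']

# Tokenize the texts and count word occurrences
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

sample = vectorizer.transform(['abhor this coil'])
print(model.predict(sample))  # ['shakespeare']
```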

Types Of Naive Bayes Model:

  1. Gaussian model: used for simple classification when the features are continuous and assumed to be normally distributed.
  2. Multinomial model: mostly used in text classification problems, where each feature is a count of “how often a word occurs in the given text”.
  3. Bernoulli model: useful when the feature vectors are binary, i.e. each feature simply records the presence or absence of something, such as whether a word occurs in the text at all (see the sketch after this list).
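
All three variants ship with scikit-learn. Here is a minimal sketch on toy data (the feature values are illustrative):

```python
# The three common Naive Bayes variants in scikit-learn, on toy data.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

X_cont = np.array([[1.2], [0.9], [3.1], [2.8]])        # continuous features
X_counts = np.array([[3, 0], [2, 1], [0, 4], [1, 5]])  # word counts
X_binary = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # presence/absence

print(GaussianNB().fit(X_cont, y).predict([[3.0]]))        # [1]
print(MultinomialNB().fit(X_counts, y).predict([[0, 3]]))  # [1]
print(BernoulliNB().fit(X_binary, y).predict([[0, 1]]))    # [1]
```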

Why use Naive Bayes Classifier?

  1. It is easy and fast to use for predicting the class of a test data set, and it also performs well in multi-class prediction.
  2. It sometimes performs better than logistic regression; use both algorithms and pick the one with the higher accuracy.
  3. It works efficiently on problems with limited data, even just a few hundred data points and a handful of variables.
  4. It is known to handle both continuous and discrete data very well.
  5. If the independence assumption holds, it results in one of the most efficient classification algorithms.

Where to use Naive Bayes Classifier?

  1. Text Classification
  2. Sentiment Analysis
  3. Spam Filtering
  4. Categorizing news into political, science, technology, crime, etc.
  5. Recommendation systems

It is also used for real-time prediction and multi-class prediction. The Naive Bayes classifier has a large number of practical applications.

Here is a simple Gaussian Naive Bayes implementation in Python with the help of scikit-learn, using the example of deciding whether to bat or bowl, with weather and humidity as features. We first encode those features as integer labels, train the model, and then use the trained model to get the predicted results.

Naive Bayes using sklearn and python
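
A minimal sketch of such an implementation, using toy data consistent with the counts quoted earlier (5 sunny days, 3 of them ‘Yes’, and 10 ‘Yes’ out of 14 rows); the humidity column is illustrative:

```python
# A Gaussian Naive Bayes sketch for the bat-or-bowl decision.
# The rows below are illustrative toy data, not the article's exact dataset.
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Rainy', 'Sunny',
           'Overcast', 'Rainy']
humidity = ['High', 'High', 'High', 'Normal', 'Normal', 'Normal',
            'Normal', 'High', 'Normal', 'High', 'Normal', 'High',
            'Normal', 'High']
bat_first = ['No', 'No', 'Yes', 'Yes', 'Yes', 'Yes',
             'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes',
             'Yes', 'No']

# Encode the string categories as integers, one encoder per column
weather_le, humidity_le, target_le = LabelEncoder(), LabelEncoder(), LabelEncoder()
X = list(zip(weather_le.fit_transform(weather),
             humidity_le.fit_transform(humidity)))
y = target_le.fit_transform(bat_first)

# Train the model and predict for a sunny day with normal humidity
model = GaussianNB()
model.fit(X, y)

sample = [[weather_le.transform(['Sunny'])[0],
           humidity_le.transform(['Normal'])[0]]]
print(target_le.inverse_transform(model.predict(sample)))
```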

This is just an example of how to implement it. Use a much larger dataset when you want higher accuracy from your model.

Important References:

  1. Naive Bayes with Scikit-Learn: an implementation using Scikit-Learn's inbuilt Gaussian Naive Bayes.
  2. Naive Bayes documentation

Now you can understand what goes on behind the scenes of news categorization, and you can design it too. Load your favorite large dataset with a few variables and give it a try! How accurate is Naive Bayes? Check it yourself. From here on, all you need is practice.
