Machine learning — Naive Bayes

Ask what the next big thing in the IT industry is, and the obvious answer is Artificial Intelligence. Big companies from Google to Facebook are betting heavily on it. According to industry experts, Artificial Intelligence will take over much of the software industry within the next few years.

With the growing importance of AI in the industry, more and more developers are picking up AI skills. Interesting as it looks, the field is hard to get a grip on. To a beginner, the terms Artificial Intelligence, Machine Learning, Data Science, and Deep Learning may all sound alike, but each has its own meaning and value.

Artificial Intelligence equips computers with the ability to carry out tasks without human intervention. Machines acquire that ability by repeatedly running programs over existing data. Data Science offers a set of scientific methods for extracting meaningful information from data. Machine Learning, a part of AI, uses algorithms built on those methods to derive intelligence from data so that computers can make smart decisions. Today we give a quick introduction to one such Machine Learning algorithm: Naive Bayes, also known as Independence Bayes.

Before we look at what it is, here are some of its uses:

  1. Sentiment analysis
  2. Spam filtering
  3. Classifying articles into different categories
  4. Face recognition software

Naive Bayes

It is a classification algorithm: it identifies the category of a new observation after being trained on a set of observations whose category membership is known. As the “naive” part of the name suggests, it makes the simplifying assumption that the features are independent of each other given the category, which makes it an easy algorithm to apply to classification problems. Despite their simple design and assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations.

For example, to identify the gender of a person, the features height, weight, and foot size each contribute independently to the probability calculation, regardless of any possible correlations between them.
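Written out for this example, the naive assumption amounts to the factorization below. The symbols x_1, x_2, x_3 (height, weight, foot size) and c (gender) are labels chosen here for illustration only.

```latex
P(x_1, x_2, x_3 \mid c) = P(x_1 \mid c)\, P(x_2 \mid c)\, P(x_3 \mid c)
```

Each factor can be estimated on its own from the training data, which is what keeps the algorithm simple.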


Naive Bayes applies Bayes’ theorem, P(c|x) = P(x|c) · P(c) / P(x), where:

  • P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
  • P(c) is the prior probability of the class.
  • P(x|c) is the likelihood, that is, the probability of the predictor given the class.
  • P(x) is the prior probability of the predictor (the evidence).
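To see how these four quantities fit together, here is a minimal Python sketch. The function name `posterior` and all the probabilities in it are made up for illustration; it simply multiplies the prior by the per-feature likelihoods and divides by the evidence.

```python
from math import prod

def posterior(priors, likelihoods, observed):
    """Return P(c|x) for every class c under the naive independence assumption.

    priors:      {class: P(c)}
    likelihoods: {class: {feature_value: P(x_i|c)}}
    observed:    list of feature values x_1..x_n seen in the new sample
    """
    # Numerator of Bayes' theorem: P(c) times the product of P(x_i|c)
    joint = {
        c: priors[c] * prod(likelihoods[c][x] for x in observed)
        for c in priors
    }
    # Denominator P(x): the numerators summed over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Made-up probabilities, for illustration only
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}

result = posterior(priors, likelihoods, ["offer"])
print(result)  # the posterior for "spam" is 0.2 / 0.26, roughly 0.77
```

Because P(x) is the same for every class, it only rescales the scores. A classifier that just needs the most probable class can skip the division.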

Depending on the problem we are trying to solve, P(x|c) can be calculated with one of three event models.

1. Gaussian Naive Bayes

When dealing with continuous data, a typical assumption is that the values associated with each class follow a Gaussian distribution. For example, suppose the training data contains a continuous attribute, x. We first segment the data by class and then compute the mean μ_c and variance σ_c² of x in each class. The likelihood of an observed value v given class c is then the Gaussian density with those parameters: P(x = v | c) = 1 / √(2πσ_c²) · exp(-(v - μ_c)² / (2σ_c²)).

Example Scenario: Identify whether a person is diabetic given some of their medical conditions and daily routine
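The Gaussian model can be sketched from scratch in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the glucose readings are invented toy values, the helper names are hypothetical, and only one feature is used (with several features, the per-feature densities would be multiplied together). It uses the population variance; the sample variance would be an equally reasonable choice.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def fit_gaussian(samples):
    """samples: list of (value, label). Returns {label: (prior, mean, variance)}."""
    by_class = defaultdict(list)
    for value, label in samples:
        by_class[label].append(value)
    total = len(samples)
    return {
        label: (len(values) / total, mean(values), pvariance(values))
        for label, values in by_class.items()
    }

def gaussian_pdf(v, mu, var):
    """Gaussian density of v for a class with mean mu and variance var."""
    return math.exp(-((v - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(model, v):
    """Pick the class with the highest prior * likelihood."""
    scores = {
        label: prior * gaussian_pdf(v, mu, var)
        for label, (prior, mu, var) in model.items()
    }
    return max(scores, key=scores.get)

# Invented fasting glucose readings in mg/dL, for illustration only
train = [
    (85, "non-diabetic"), (90, "non-diabetic"),
    (95, "non-diabetic"), (100, "non-diabetic"),
    (130, "diabetic"), (140, "diabetic"),
    (150, "diabetic"), (160, "diabetic"),
]

model = fit_gaussian(train)
print(predict(model, 92))   # non-diabetic
print(predict(model, 150))  # diabetic
```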

2. Multinomial Naive Bayes

This implements the naive Bayes algorithm for multinomially distributed data and is one of the two standard naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice).

Example Scenario: Spam filtering or classifying articles into different categories
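A rough from-scratch sketch of the multinomial model on word counts might look like this. The tiny corpus is invented and the function names are hypothetical. Two details are additions that the text above does not cover: Laplace (add-one) smoothing through the `alpha` parameter, so that a word unseen in one class does not zero out the whole product, and working with log probabilities to avoid numeric underflow.

```python
import math
from collections import Counter, defaultdict

def train_multinomial(docs, alpha=1.0):
    """docs: list of (list_of_words, label). alpha is the smoothing constant."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)

    model = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(class_docs[label] / len(docs)),
            # Smoothed P(word | class), based on how often the word occurs
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model

def classify_multinomial(model, words):
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for w in words:
            # Words never seen during training are skipped
            if w in params["log_likelihood"]:
                score += params["log_likelihood"][w]
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy corpus, for illustration only
docs = [
    ("win cash prize now".split(), "spam"),
    ("cash offer win win".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch meeting now".split(), "ham"),
]

model = train_multinomial(docs)
print(classify_multinomial(model, "win cash".split()))        # spam
print(classify_multinomial(model, "meeting monday".split()))  # ham
```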

3. Bernoulli Naive Bayes

In the Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features are used rather than term frequencies.

Example Scenario: Sentiment analysis for short text
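For comparison, here is a minimal sketch of the Bernoulli model. The key difference from the multinomial case is that every vocabulary word contributes to the score: p if the word is present in the text, and 1 - p if it is absent. The review snippets are invented, the function names are hypothetical, and the add-one smoothing (`alpha`) is an assumption added here to keep probabilities away from exactly 0 or 1.

```python
import math

def train_bernoulli(docs, alpha=1.0):
    """docs: list of (set_of_words, label)."""
    vocab = set().union(*(words for words, _ in docs))
    labels = {label for _, label in docs}
    model = {}
    for label in labels:
        class_docs = [words for words, lab in docs if lab == label]
        n = len(class_docs)
        model[label] = {
            "prior": n / len(docs),
            # Smoothed P(word present | class), counted per document
            "p_present": {
                w: (sum(w in d for d in class_docs) + alpha) / (n + 2 * alpha)
                for w in vocab
            },
        }
    return model

def classify_bernoulli(model, words):
    words = set(words)
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for w, p in params["p_present"].items():
            # Present words contribute p, absent words contribute 1 - p
            score += math.log(p if w in words else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)

# Invented short reviews, for illustration only
docs = [
    ({"great", "movie"}, "positive"),
    ({"loved", "it"}, "positive"),
    ({"great", "fun"}, "positive"),
    ({"boring", "movie"}, "negative"),
    ({"hated", "it"}, "negative"),
    ({"boring", "slow"}, "negative"),
]

model = train_bernoulli(docs)
print(classify_bernoulli(model, {"great"}))           # positive
print(classify_bernoulli(model, {"boring", "slow"}))  # negative
```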

In all three event models, a sample is assigned to the category with the highest posterior probability among all categories. If you are familiar with Python, there is no need to implement these algorithms from scratch: the scikit-learn library provides ready-made implementations for a quick start.

In the next article, we will dive into using the scikit-learn library to solve some of the problems using Naive Bayes.