Naive Bayes Classifiers for Machine Learning

Part 3 of a Series on Introductory Machine Learning Algorithms

Madison Schott
Capital One Tech
5 min read · Apr 24, 2019



We’ve covered k-nearest neighbors and k-means clustering; today we’ll cover naive Bayes classifiers.

Introduction

Naive Bayes classifiers are a group of machine learning algorithms that all use Bayes’ Theorem to classify data points. Bayes’ Theorem is named after Reverend Thomas Bayes, who studied probability and binomial distributions in the 18th century. However, it cannot be said for sure who discovered the theorem; there are rumors that Nicholas Saunderson, a mathematician who spent most of his career developing and refining traditional mathematical and scientific philosophy, discovered it long before Bayes’ time.

You may be wondering why these classifiers are called “naive Bayes.” They are called “naive” because they assume the features of a data point are completely independent of one another. Naive Bayes classifiers use the probabilities of certain events being true, given that other events are true, in order to make predictions about new data points. This reliance on conditional probability is what makes the approach so distinctive compared to other machine learning classification algorithms.

Pros:

  • Simple to build and use.
  • Easy to train.
  • Ignores irrelevant features.

Cons:

  • Assumes data point features are independent.
  • Needs a reasonably large data set to estimate probabilities reliably.

Where to Use Naive Bayes

These classifiers are used behind the scenes of applications you access every day! Some of the most popular uses for them are weather prediction, email spam detection, and facial recognition. The algorithm suits these problems because its independence assumption keeps the model simple and fast, even when the data has a large number of features.

Image source: http://dataespresso.com/en/2017/10/24/comparison-between-naive-bayes-and-logistic-regression/

With email spam detection, a naive Bayes classifier helps determine whether an email is spam by looking at the probability of different events occurring related to that one email. It may assess the probability of an email being spam given that it is from a certain email address, has a certain format, or includes grammatical errors. The simple math representation would look like this:
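
P(Spam | Sender) = P(Sender | Spam) × P(Spam) / P(Sender)

with an analogous expression for each of the other events (the email’s format, its grammatical errors, and so on).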

In large classification problems like this one and the other examples I named, there are many more possible events than just one or two. The naive independence assumption is what makes these problems tractable: the classifier can multiply one per-event probability at a time rather than having to estimate the probability of every possible combination of events directly.
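
To make this concrete, here is a minimal sketch of such a spam filter in Python using scikit-learn’s MultinomialNB. The tiny inline data set and the word-count features are assumptions made purely for illustration; the examples above don’t come with code.

```python
# Minimal naive Bayes spam-detection sketch (illustrative data, not a real corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",
    "claim your free money today",
    "meeting notes attached",
    "lunch tomorrow at noon",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn each email into word counts: these are the "events" the classifier sees.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# MultinomialNB multiplies per-word probabilities under the naive
# independence assumption, exactly as described above.
model = MultinomialNB()
model.fit(X, labels)

test = vectorizer.transform(["free money prize"])
print(model.predict(test))  # expected: ['spam']
```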

The Mathematics Behind Naive Bayes

Naive Bayes classifiers are completely dependent on Bayes’ Theorem (hence the name), since the classifiers simply apply the formula to sets of data. This theorem consists of a formula assessing the probabilities of different events occurring. The formula below is the simplest version of it, with only two events, Event A and Event B.
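
P(B | A) = P(A | B) × P(B) / P(A)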

Remember, this is giving you the probability of event B occurring given that A has already occurred. It uses two types of probabilities:

  1. The probability of each event on its own.
  2. The probability of each event given that another event has occurred.

When there are more than just two possible events in a data set, the formula takes this form:
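
P(y | x1, x2, …, xn) = ( P(x1 | y) × P(x2 | y) × … × P(xn | y) × P(y) ) / ( P(x1) × P(x2) × … × P(xn) )

Here y is the event you are predicting and x1 through xn are the other events (the features), which the “naive” assumption treats as independent of one another.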

While this formula may initially look confusing, it is simply combining these two types of probabilities of events in order to find the likelihood of a certain event occurring.

For example, let’s say you have a friend who likes to run but is very particular about the weather she will go out in. You want to stop by her house to chat but you aren’t sure if she is home. If you are trying to determine whether your friend has gone on a run, based on her preference for warm weather, the formula would look something like this:
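
P(Running | Warm) = P(Warm | Running) × P(Running) / P(Warm)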

You are multiplying the probability of it being warm, given that your friend is running, by the probability of her going for a run. You are then dividing that by the probability that it’s warm outside.
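
To make it concrete with some made-up numbers: if P(Warm | Running) = 0.8, P(Running) = 0.5, and P(Warm) = 0.6, then P(Running | Warm) = (0.8 × 0.5) / 0.6 ≈ 0.67, so there is roughly a two-in-three chance she is out on a run.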

It’s a lot simpler than it looks!

Conclusion

If you love statistics as much as I do, then you probably favor naive Bayes classifiers over most of the others. They assume independence, are easy to use, and are easy to train. The classifiers are a good choice when you believe there could be hidden factors affecting your data set, like extreme outliers or obscure patterns.

However, one thing to look out for is unrepresentative data, or zero instances of a certain event in a data set. If this occurs, the classifier assigns a probability of zero to that event, and because you are multiplying the probabilities of events together, a single zero wipes out the entire result. A common fix is additive (Laplace) smoothing, which adds a small count to every event so that no estimated probability is exactly zero. Even so, it is always important to collect data that gives an accurate representation of the population you are studying.
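
As a quick sketch of how that smoothing fix works (the function and the counts below are hypothetical, just for illustration):

```python
# Additive (Laplace) smoothing: add a small count to every event so that
# a word never seen during training still gets a nonzero probability.
def smoothed_probability(count, total, vocab_size, alpha=1.0):
    """Estimate P(word | class) with additive smoothing."""
    return (count + alpha) / (total + alpha * vocab_size)

# A word with zero occurrences in the spam training data no longer
# zeroes out the whole product of probabilities.
print(smoothed_probability(count=0, total=100, vocab_size=50))   # ~0.0067
print(smoothed_probability(count=10, total=100, vocab_size=50))  # ~0.0733
```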

In conclusion, naive Bayes is a group of algorithms useful for classifying large data sets using probability. It has many possible applications, lots of which you use in your daily life. I challenge you to identify applications you use every day that potentially rely on this group of classifiers; it’ll help you get a better understanding of their accuracy and power.

For more resources, check out some projects using naive Bayes classifiers:

DISCLOSURE STATEMENT: © 2019 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.


Madison Schott
Capital One Tech

Analytics Engineer @ ConvertKit, author of the Learn Analytics Engineering newsletter and The ABCS of Analytics Engineering ebook, health & wellness enthusiast