Let’s build your first Naive Bayes Classifier with Python

Valentina Alto · DataSeries · Aug 18, 2019 · 5 min read

The Naive Bayes Classifier is one of the most intuitive yet popular algorithms employed in supervised learning, whenever the task is a classification problem. I talked about the difference between supervised and unsupervised learning, as well as between classification and regression, in my previous article; if you are not familiar with this terminology, I suggest you have a look at it.

Here, I’m going to dwell on the (surprisingly easy) math behind the Naive Bayes Classifier and then I will implement it from scratch with Python, using the well-known Iris Dataset.

To understand this algorithm, we first need to refresh some concepts of probability theory. Indeed, the Naive Bayes Classifier is based on Bayes' Theorem of conditional probability:

P(A|B) = P(B|A) · P(A) / P(B)

where A and B are events and P(B) ≠ 0. We talk about the conditional probability of A given B when we want to know the likelihood of event A, given that event B has occurred.
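To make the formula concrete, here is a minimal sketch with hypothetical numbers: suppose A is "a patient has a given condition" and B is "the test comes back positive". The prior P(A), the likelihood P(B|A) and the false-positive rate P(B|¬A) below are made-up values, used only to show how the theorem combines them.

```python
# Hypothetical numbers, only to illustrate Bayes' Theorem mechanically
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.95      # P(B|A): positive test given the condition
p_b_given_not_a = 0.05  # P(B|not A): positive test without the condition

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(A|B) = {p_a_given_b:.3f}")  # roughly 0.161
```

Note how the posterior stays small even with an accurate test, because the prior P(A) is low; this is exactly the kind of update Bayes' Theorem formalizes.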

Let’s visualize it with a Venn Diagram:

Basically, once event B has occurred, the probability space of event A is reduced to the intersection between A and B, since anything outside B can no longer occur (we already know that B has happened!). The situation is therefore the following:

P(A|B) = P(A∩B) / P(B)
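As a quick sanity check of this intuition, the sketch below simulates two overlapping events with NumPy (the events and their probabilities are made up for illustration) and compares the empirical frequency of A among the trials where B occurred with the ratio P(A∩B)/P(B).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Two made-up, dependent events: B occurs with probability 0.4,
# and A is more likely when B has occurred (0.6 vs 0.2)
b = rng.random(n) < 0.4
a = np.where(b, rng.random(n) < 0.6, rng.random(n) < 0.2)

# "Restricting the space to B": frequency of A among trials where B occurred
p_a_given_b_empirical = a[b].mean()

# Ratio of the intersection to P(B)
p_a_given_b_formula = (a & b).mean() / b.mean()

print(p_a_given_b_empirical)  # ~0.6
print(p_a_given_b_formula)    # same value, by construction
```

Both quantities coincide, which is just the Venn-diagram picture above written in code: conditioning on B means counting only inside B.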
