Do You Really Know Naive Bayes?

Güldeniz Bektaş · Published in Analytics Vidhya · 6 min read · Dec 22, 2020

[Cover image source: KDnuggets]

Our next machine learning algorithm is the Naive Bayes classifier. Like in my other articles about machine learning algorithms, I'll be learning along with you.

What is this Bayes Theorem?

In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule), named after Reverend Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

It might look complicated, but it's actually easy to understand. I know, that sounds like an empty promise, so just wait for me to explain. Sometimes definitions from other sites, like the one above from Wikipedia, can be hard to digest. Once you have an example to anchor the topic, it becomes as easy as pie. That's my goal for this article. Let's dive in!

Imagine you already know some probabilities, and you want to compute another probability from them. That's what Bayes' theorem does.

Let's go through the formula:

P(A|B) = P(B|A) * P(A) / P(B)

(Source: mathsisfun)

P(A|B) → the probability of A happening, given that B has happened

P(B|A) → the probability of B happening, given that A has happened

P(A) → the probability of A happening on its own

P(B) → the probability of B happening on its own

Let's continue with a classic example:

P(Fire|Smoke) → the probability of fire, given that we see smoke

P(Smoke|Fire) → the probability of smoke, given that there is a fire

P(Fire) → the probability of fire

P(Smoke) → the probability of smoke
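To make this concrete, let's plug in some numbers (these are the illustrative figures from the mathsisfun example): dangerous fires are rare, say P(Fire) = 1%; smoke is fairly common because of barbecues and chimneys, say P(Smoke) = 10%; and 90% of dangerous fires produce smoke, so P(Smoke|Fire) = 90%. Then:

P(Fire|Smoke) = P(Smoke|Fire) * P(Fire) / P(Smoke) = 0.9 * 0.01 / 0.1 = 0.09

So even when we see smoke, there is only a 9% chance of a dangerous fire.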

I think we've pretty much covered this section. Like I said before, it really is easy. Now it's time for the Naive Bayes algorithm!

Naive Bayes Algorithm?

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

This definition is from Sklearn's documentation. It's a great explanation, but let's put it in plainer words one last time.

Naive Bayes is one of the simplest and most effective algorithms. It works according to Bayes' theorem. It's also accurate and reliable, which is all we want from our model. The "naive" part is the assumption that the features are independent of each other given the class.

Although it can be used for many different purposes, it performs especially well in NLP (natural language processing) tasks such as text classification.
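To give you a taste, here is a minimal text-classification sketch (the sentences and labels are made up purely for illustration):

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 1 = spam, 0 = not spam
texts = ["win money now", "meeting at noon", "win a free prize", "lunch at noon"]
labels = [1, 0, 1, 0]

# Turn raw text into word counts, then fit multinomial Naive Bayes
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["free money"]))  # likely [1] (spam) on this toy data

We'll meet MultinomialNB again in its own section below.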

The assumptions made by Naïve Bayes are generally not correct in real-world situations.

The formula I gave above can only handle a single feature, x. What if we have more than one feature? Let's check the new formula. With features x1, …, xn and class y, Bayes' theorem becomes:

P(y | x1, …, xn) = P(y) * P(x1, …, xn | y) / P(x1, …, xn)

Not much has changed.

Using the naive conditional independence assumption that

P(xi | y, x1, …, xi-1, xi+1, …, xn) = P(xi | y)

(Source: Sklearn)

for all i, this relationship is simplified to:

P(y | x1, …, xn) = P(y) * Π_i P(xi | y) / P(x1, …, xn)

Since P(x1, …, xn) is constant given the input, we can use the following classification rule:

ŷ = argmax_y P(y) * Π_i P(xi | y)
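In code, that classification rule is just "multiply, then take the argmax". Here is a tiny sketch with made-up priors and per-feature likelihoods:

import numpy as np

# Hypothetical numbers, purely to illustrate the rule
priors = np.array([0.6, 0.4])               # P(y) for classes 0 and 1
likelihoods = np.array([[0.2, 0.5, 0.9],    # P(xi | y=0) for each of 3 features
                        [0.7, 0.3, 0.4]])   # P(xi | y=1)

posteriors = priors * likelihoods.prod(axis=1)  # P(y) * product of P(xi | y)
print(np.argmax(posteriors))                    # 0 → class 0 wins here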

So much math! I know. To be honest, apart from Bayes' theorem itself, I don't fully understand every equation up there either, and I hope that won't cause a big problem. Fingers crossed, let's continue!

One of the pros of the Naive Bayes classifier is that it can perform well with less training data, and it handles categorical variables well compared to numerical ones.

So, let's see it in action. This example is taken from this link.

We have information about two features, 'Weather' and 'Play'. The table shows whether a game was played or not for each kind of weather. First, we need to convert that table into a frequency table, so we can clearly see which counts go where. This is the classic weather/play dataset, and the frequency table looks like this (the counts match the numbers we'll use below):

Weather | Yes | No | Total
Sunny | 3 | 2 | 5
Overcast | 4 | 0 | 4
Rainy | 2 | 3 | 5
Total | 9 | 5 | 14

Now we can use the Bayes' theorem equation.

The question is going to be: what is P(Yes|Sunny)? What does that mean? Take a moment to work it out. (Hint: I just gave you the answer above!)

Will the players play if the weather is sunny?

I’m pretty sure, you found out the answer! Well done!

Now it's time to turn this into a Bayes' theorem equation.

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)

P(Sunny|Yes) is 3/9 = 0.33. From the frequency table, you can see that out of the 9 games that were played, 3 were played when the weather was sunny.

P(Yes) is 9/14 = 0.64. 9 is the number of games played, and 14 is the total number of records, played or not.

P(Sunny) is 5/14 = 0.36. 5 is the number of sunny days, and 14 is the total number of days, whatever the weather.

We got everything we need.

P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is a high probability. (With exact fractions it's (3/9) * (9/14) / (5/14) = 3/5 = 0.60.)
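You can double-check this with a couple of lines of Python:

p_sunny_given_yes = 3 / 9
p_yes = 9 / 14
p_sunny = 5 / 14
print(p_sunny_given_yes * p_yes / p_sunny)  # ≈ 0.6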

Yes, we did it.

Naive Bayes is usually used for text classification, and it also works well when you have multiple classes.

Gaussian Naive Bayes

GaussianNB implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:

P(xi | y) = (1 / sqrt(2π * σy²)) * exp(−(xi − μy)² / (2σy²))

(Source: Sklearn; the parameters σy and μy are estimated using maximum likelihood.)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris dataset and hold out half of it as a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit Gaussian Naive Bayes on the training half and predict the test half
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
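To see how well it did, we can count the mislabeled test points, continuing from the snippet above (this mirrors the check in Sklearn's own example):

print("Mislabeled points out of %d: %d" % (X_test.shape[0], (y_test != y_pred).sum()))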

Multinomial Naive Bayes

MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors θy=(θy1,…,θyn) for each class y, where n is the number of features (in text classification, the size of the vocabulary) and θyi is the probability P(xi∣y) of feature i appearing in a sample belonging to class y.

Source: Sklearn
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Random integer counts standing in for word-count features (6 samples, 100 features)
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

# Fit a multinomial Naive Bayes classifier on the counts
clf = MultinomialNB()
clf.fit(X, y)
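Once fitted, the classifier can predict the class of a sample; predicting one of the training rows on this toy data should return its own label:

print(clf.predict(X[2:3]))  # expected: [3]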

Bernoulli Naive Bayes

BernoulliNB implements the Naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. Therefore, this class requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).

Source: Sklearn
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Random integer features; BernoulliNB binarizes them (binarize=0.0 by default)
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])

# Fit a Bernoulli Naive Bayes classifier
clf = BernoulliNB()
clf.fit(X, Y)
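The same quick check works here, continuing from the snippet above:

print(clf.predict(X[2:3]))  # expected: [3]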

I took the definitions and code samples from Sklearn's documentation because I think they are genuinely explanatory. I couldn't find better words to describe them for you.

That's all I've got for you. I hope this helps you understand the basics of Naive Bayes. All you have to do now is practice, practice, and practice.

See you in the next article! Happy learning!
