Spam Detection and Filtering with the Naive Bayes Algorithm

Introduction -

Probability is one of the major branches of mathematics, and it is often used to model real-life situations; mathematicians have spent years building practical models to fit real-world problems. Within this area, conditional probability represents the probability of an event happening given that another event has occurred. In other words, observing one event changes what we can say about the likelihood of another. This fundamental concept of conditional probability lies at the basis of Bayes' Theorem, which was formulated by Thomas Bayes and published posthumously in 1763.

Bayes' Theorem lets us compute the unknown conditional probability of one event given another from the individual probability of each event and the reverse conditional probability of the pair.

Spam -

Spam refers to the use of electronic messaging systems to send out unsolicited or unwanted messages, usually in bulk.

Spam is often characterized by its advertising nature (e.g., links to pornographic sites) or by fraudulent messages designed to deceive users into revealing confidential personal information.

Users often identify such emails by the frequent appearance of specific phrases such as “Please enter your bank account number here” or “Congratulations, you’ve won $100,000!”. These patterns are recognizable by computer algorithms and serve as decisive factors in judging the nature of an email.
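As a toy illustration (the phrase list below is hypothetical, not taken from any real filter), a naive rule-based filter might simply look for such phrases; the Naive Bayes approach described next generalizes this by weighing every word probabilistically.

```python
# Toy rule-based filter: flag an email containing any suspicious phrase.
# The phrase list is hypothetical and far from exhaustive.
SUSPICIOUS_PHRASES = (
    "enter your bank account number",
    "congratulations, you've won",
)

def looks_like_spam(email_text: str) -> bool:
    text = email_text.lower()
    return any(phrase in text for phrase in SUSPICIOUS_PHRASES)

print(looks_like_spam("Congratulations, you've won $100,000!"))  # True
```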

What is the Naive Bayes Classifier? -

It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
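Formally, for a class c and features x_1, …, x_n, this “naive” independence assumption means the joint likelihood factorizes as:

P(x_1, \ldots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)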

Bayes Theorem -

Bayes' theorem states that the probability of event A given B equals the probability of event B given A, multiplied by the probability of A and divided by the probability of B:
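P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}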

One key condition the above equation must satisfy is that neither P(A) nor P(B) may be zero; this must hold in all circumstances where Bayes' Theorem is applied.

An alternative form of Bayes' Theorem is generally encountered when looking at two competing statements or hypotheses:

Here P(A) is the prior probability, the initial degree of belief in A, and P(A') = 1 − P(A) is the corresponding initial degree of belief against A (i.e., in A's complement).
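Spelled out, with the denominator expanded by the law of total probability:

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B \mid A) \, P(A) + P(B \mid A') \, P(A')}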

For some partition {A_i} of the sample space, the extended form of Bayes' Theorem is:
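P(A_i \mid B) = \frac{P(B \mid A_i) \, P(A_i)}{\sum_j P(B \mid A_j) \, P(A_j)}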

How does the Naive Bayes Algorithm work? -

Let's take a small labeled training dataset:

We have some data: some emails are Spam and some are Ham (i.e., not Spam).

First, we compute the prior probabilities of Spam and Ham:
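These priors are simply the class frequencies in the training data:

P(\text{Spam}) = \frac{\text{number of Spam emails}}{\text{total emails}}, \qquad P(\text{Ham}) = \frac{\text{number of Ham emails}}{\text{total emails}}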

Next, we compute the probability of every word appearing in Spam and in Ham emails:
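Each word's likelihood is estimated as its relative frequency within a class:

P(w \mid \text{Spam}) = \frac{\text{count of } w \text{ in Spam emails}}{\text{total words in Spam emails}}

and analogously for Ham.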

Let's assume a new email arrives: “review us now” (where “now” is a word that never appears in the training data).
Now, let's check whether this email is Spam or Ham.

First, we calculate the conditional probability of the email under each class:
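Under the independence assumption, the likelihood of the whole email is just the product of its word likelihoods:

P(\text{“review us now”} \mid \text{Spam}) = P(\text{review} \mid \text{Spam}) \cdot P(\text{us} \mid \text{Spam}) \cdot P(\text{now} \mid \text{Spam})

and likewise for Ham. Note that a raw count of zero for the unseen word “now” would zero out the whole product, so in practice the word is either dropped or its count is smoothed.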

Now, we apply Bayes' theorem:
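That is, using the two-hypothesis form from above:

P(\text{Spam} \mid \text{email}) = \frac{P(\text{email} \mid \text{Spam}) \, P(\text{Spam})}{P(\text{email} \mid \text{Spam}) \, P(\text{Spam}) + P(\text{email} \mid \text{Ham}) \, P(\text{Ham})}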

As you can see, the probability of the email being Spam is very low, approximately 0.1229.

So, the algorithm predicts that the email “review us now” is not Spam, i.e., it is Ham.
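Here is a minimal end-to-end sketch of this procedure in Python. The toy dataset is hypothetical (the original tables from this walkthrough are not reproduced here), so the number it prints will differ from the 0.1229 above; it also uses Laplace (add-one) smoothing so that the unseen word “now” does not zero out the product.

```python
from collections import Counter

# Hypothetical toy training set; stands in for the article's dataset.
train = [
    ("send us your password", "spam"),
    ("review our website", "ham"),
    ("send us your review", "ham"),
    ("send your password", "spam"),
]

# Priors: class frequencies in the training data.
labels = [label for _, label in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# Per-class word counts.
word_counts = {c: Counter() for c in priors}
for text, label in train:
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def word_likelihood(word, c):
    """P(word | class) with Laplace smoothing for unseen words."""
    total = sum(word_counts[c].values())
    return (word_counts[c][word] + 1) / (total + len(vocab))

def posterior_spam(email):
    """P(spam | email) via Bayes' theorem under the naive independence assumption."""
    scores = {}
    for c in priors:
        score = priors[c]
        for word in email.split():
            score *= word_likelihood(word, c)
        scores[c] = score
    return scores["spam"] / sum(scores.values())

print(posterior_spam("review us now"))  # 0.25 on this toy data -> classified as Ham
```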

Advantages -

● Very simple and easy to use.
● Needs less training data.
● Makes probabilistic predictions.
● Handles both continuous and discrete data.
● It is a generative model, i.e., it can still make predictions when some features are missing, by dropping the missing features' likelihood terms from the decision rule.

Disadvantages -

● A subtle issue with the Naive Bayes classifier: if a class label never occurs together with a certain attribute value in the training data, the frequency-based probability estimate for that pair is zero, which wipes out the entire product of likelihoods. The standard fix, Laplace smoothing, is sketched after this list.
● A big dataset is required to make reliable estimates of the probability of each class. The classifier can be used with small datasets, but the precision of the estimates will suffer.
● The attribute-independence assumption rarely holds exactly in real data.
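The usual remedy for the zero-frequency problem is Laplace (add-one) smoothing, which pretends every word was seen one extra time in every class:

P(w \mid c) = \frac{\text{count}(w, c) + 1}{\sum_{w'} \text{count}(w', c) + |V|}

where |V| is the vocabulary size. This is exactly what the code sketch above uses.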

Other Applications -

● Multi-class Prediction
● Real-time Prediction
● Recommendation System

If you ❤ this article, just give a clap.
Please connect with me on
LinkedIn, GitHub, Twitter
