Implementing a Gaussian Naïve Bayes Classifier from Scratch
--
“Why is Google censoring me?!” Claire asked (true story). Sure, she’s always been a prolific emailer, but she is no scammer, and she assures me her days as a Nigerian prince are long since over. So why did Gmail suddenly lock her account today as if she were a spam-sending super-bot?
My answer, after building this: “Most likely? Honestly, you’re probably doing some pretty unusual things in your email… unusual enough that your Gmail account is a very clear outlier compared to almost any other human user, so Gmail is flagging you as ‘probably a spam account or bot.’ Maybe some combination of how many emails you’re sending, to how many people, with what type of chain-email-like language in them?”
And most likely, Thomas Bayes was involved.
Naïve Bayes: A Quick Intro
Naïve Bayes (a.k.a. Independence Bayes) classifiers are a family of supervised probabilistic models that remain widely used today despite relying on pretty simple underlying assumptions. Because they are very fast even on huge datasets and perform surprisingly well despite those simplistic assumptions, they show up in many critical real-world tasks:
- financial loan decisions
- predicting health condition/disease likelihoods (including COVID-19) in healthcare and insurance
- text and document classification
- natural language processing (NLP) tasks like sentiment analysis in reviews… and
- most notably, email spam detection.
At its root, Naïve Bayes analyzes conditional probabilities to make its predictions, relying on the same Bayes’ Theorem you learned about in high school:
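$$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class})\; P(\text{class})}{P(\text{data})}$$

Here P(class | data) is the posterior probability we are after, P(data | class) is the likelihood, P(class) is the prior, and P(data) is the evidence.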
Or, as applied to our current case of deciding whether an email is spam or not:
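$$P(\text{spam} \mid \text{email characteristics}) = \frac{P(\text{email characteristics} \mid \text{spam})\; P(\text{spam})}{P(\text{email characteristics})}$$

(and the same expression with “not spam” in place of “spam”).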
How Naïve Bayes Works: Bayesian Probabilities with Multiple Classes and Many Features
Naïve Bayes models ultimately take any new, previously unseen data point such as a new email, and (1) calculate the probabilities of that data point (its particular set of features) separately as if it belonged to each different class, then (2) choose the most probable class based on which probability is highest. Because the Bayes denominator above is a constant that stays the same for every class — e.g., P(email characteristics) is the same regardless of whether the class is “spam” or “not spam” — it does not help us compare probabilities.
As such, we are really just comparing the Bayes’ Theorem numerator across every possible class. We do this in four steps (combined into a single decision rule in the formula after this list):
(1) Calculate the independent overall probabilities p(class_i) of each class (the class priors) from the provided training data.
(2) Calculate the class-conditional probability distributions p(feature_n | class_i) for each feature-class combination.
(3) Calculate the “likelihood” probability that a specific input data point belongs to each class.
(4) Choose the class with the maximum probability, and predict that class for the given input data point.
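Putting those four steps together, the prediction is a maximum a posteriori (MAP) decision over the Bayes numerator, with the “naïve” independence assumption turning the joint likelihood into a product over individual features:

$$\hat{c} = \arg\max_{i}\; P(\text{class}_i) \prod_{n} P(\text{feature}_n \mid \text{class}_i)$$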
Implementation: Building a Gaussian Naïve Bayes Classifier from Scratch
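The snippets below are a minimal sketch of one possible from-scratch implementation. They assume NumPy and a tiny, made-up training set where each row is an email summarized by three numeric features; the feature choices and values (X_train, y_train, and friends) are purely illustrative, not real spam statistics.

```python
import numpy as np

# Tiny illustrative training set (made-up values, not real spam data).
# Each row is one email described by three numeric features:
# [emails sent per day, number of recipients, exclamation-mark count]
X_train = np.array([
    [  5.0,  2.0,  1.0],   # ordinary human user
    [  8.0,  3.0,  0.0],   # ordinary human user
    [120.0, 45.0, 12.0],   # spammy behavior
    [150.0, 60.0,  9.0],   # spammy behavior
])
y_train = np.array([0, 0, 1, 1])   # 0 = "not spam", 1 = "spam"

classes = np.unique(y_train)       # distinct class labels: array([0, 1])
```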
Step 1: Calculate the Class Priors, the Raw Probabilities p(class_i) of Each Class
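Continuing with the toy arrays above, one straightforward way to estimate the priors is simply the fraction of training rows carrying each label:

```python
# p(class_i): the fraction of training emails belonging to each class.
priors = np.array([np.mean(y_train == c) for c in classes])
# With the toy data above: priors == array([0.5, 0.5])
```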
Step 2: Get the Class-conditional Probability Distributions p(feature_n | class_i) for Each Feature-Class Combination
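In the Gaussian variant, each p(feature_n | class_i) is modeled as a normal distribution, so this step reduces to estimating a per-feature mean and variance from the training rows of each class. The small eps guard against zero variance is a common practical choice, not something the math requires:

```python
eps = 1e-9  # keeps a constant feature from producing a zero variance

# means[i, n] and variances[i, n] parameterize the Gaussian p(feature_n | class_i),
# estimated from the training rows labeled with class i.
means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
variances = np.array([X_train[y_train == c].var(axis=0) + eps for c in classes])
```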
Step 3: Calculate the “Likelihood” Probability that the Input Data Point Belongs to Each Class
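One reasonable way to score a new data point is to work in log space, which avoids numeric underflow when many small Gaussian densities are multiplied together. The helper name class_log_scores is just an illustrative choice:

```python
def class_log_scores(x):
    """Return log( p(class_i) * prod_n p(feature_n | class_i) ) for one point x."""
    scores = []
    for i in range(len(classes)):
        # Log Gaussian density of each feature under class i, summed across
        # features thanks to the "naïve" independence assumption.
        log_pdf = (-0.5 * np.log(2.0 * np.pi * variances[i])
                   - (x - means[i]) ** 2 / (2.0 * variances[i]))
        scores.append(np.log(priors[i]) + log_pdf.sum())
    return np.array(scores)
```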
Step 4: Choose the Class with the Maximum Probability, and Predict that Class for the Given Input Data Point
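The final step is just an argmax over the scores from Step 3. On a made-up, heavy-sending example email, the toy model lands on the spam side:

```python
def predict(x):
    """Predict the class whose (log) Bayes-numerator score is largest."""
    return classes[np.argmax(class_log_scores(x))]

# A new email with very high sending volume (made-up numbers):
new_email = np.array([130.0, 50.0, 10.0])
print(predict(new_email))   # -> 1, i.e. "spam"
```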