Naive Bayes as explained through Ariana Grande lyrics

KJ Schmidt
Published in Analytics Vidhya · 3 min read · Dec 14, 2019

What is Naive Bayes?

Naive Bayes is a machine learning classifier. More specifically, it’s a probabilistic classifier, which means it predicts the probability that a given data point belongs to each class, rather than just telling you the one class it most likely belongs to.

It’s most commonly used for text classification and detecting spam. You can thank Naive Bayes for keeping so many spam emails out of your inbox.

In this short introduction to Naive Bayes, we’ll be using songs and lyrics from Spotify’s most streamed female artist of the decade, Ariana Grande.

How does it work?

Fitting:

To fit the data, we’ll start with Bayes’ rule: P(A|B) = P(B|A) · P(A) / P(B), where A is the class and B is the feature. This calculates the probability that a data point with feature B belongs to class A. We’ll do this for every class and every feature.
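To make the rule concrete, here’s a minimal sketch with invented numbers (they’re illustrative, not real lyric statistics):

```python
# Hypothetical example: A = "a lyric line is from 'thank u, next'",
# B = "the line contains the word 'love'". All numbers are made up.

p_a = 0.4          # P(A): prior probability of the class
p_b_given_a = 0.2  # P(B|A): probability of the feature within that class
p_b = 0.1          # P(B): overall probability of the feature

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # about 0.8
```

With these numbers, seeing the word makes the class four-fifths likely, even though the prior was only 0.4.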

Where does the naive part come into play? In order to simplify its computation, Naive Bayes makes the assumption that the features are conditionally independent given the class. This assumption, similar to the assumptions Ariana Grande makes in her song, ‘pete davidson’, about her then-relationship with the song’s namesake, is most likely not true, which is where the naive part comes from.

Calculating the individual probabilities of each class for each feature depends on your data’s distribution. For instance, in text classification, the features (word counts) typically follow a multinomial distribution. In other cases, like when your features are continuous, a Gaussian distribution may be the better fit.
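In scikit-learn, that choice of distribution is the choice of estimator. A quick sketch with hypothetical data (the counts and audio features below are made up):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

labels = ["thank u, next", "7 rings", "thank u, next"]

# Word counts (non-negative integers) -> multinomial Naive Bayes
word_counts = np.array([[3, 0, 1], [0, 2, 2], [1, 1, 0]])
mnb = MultinomialNB().fit(word_counts, labels)

# Continuous features (say, tempo and duration) -> Gaussian Naive Bayes
audio_features = np.array([[120.0, 3.5], [98.0, 2.9], [140.0, 3.1]])
gnb = GaussianNB().fit(audio_features, labels)
```

Same algorithm, different per-feature likelihood model.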

What hyperparameters do we need? From looking at the Naive Bayes scikit-learn documentation, it looks like we only need to feed in our training features and labels to fit the data.
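For text, fitting looks something like the sketch below: turn the lyrics into word counts, then call fit with features and labels. The two-line training set here is obviously hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny, made-up training set of lyric snippets and their song labels
lyrics = [
    "I see it, I like it, I want it, I got it",
    "thank u, next",
]
labels = ["7 rings", "thank u, next"]

# Convert text to word-count features (the multinomial case)
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(lyrics)

# Fitting really is just features and labels
model = MultinomialNB().fit(X_train, labels)
```

After this, `model.predict(vectorizer.transform(["I want it"]))` should come back as ‘7 rings’.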

Training:

“I see it, I like it, I want it, I got it”

- Ariana Grande quickly obtaining what she wants in ‘7 rings’

Naive Bayes needs very little training compared to other classifying methods. All we need to do before predicting is find the parameters for each of the features’ probability distributions. This can be done quite quickly, almost as fast as the time between Ariana Grande seeing something and her having it. Because of this, Naive Bayes is able to perform well with large datasets and high-dimensional data.

Predicting:

“One taught me love, one taught me patience, one taught me pain”

- Ariana Grande classifying what her ex-boyfriends taught her in ‘thank u, next’

Instead of only giving you one classification, as Grande does in the quote above, Naive Bayes uses the probability of a given data point falling in each class to produce classifications. The class with the highest probability ultimately becomes the classification for the data point, but the probabilities for every class are part of the Naive Bayes output.
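In scikit-learn this is the difference between `predict` and `predict_proba`. A self-contained sketch, again with a made-up two-line training set:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data
lyrics = ["one taught me love", "I want it, I got it"]
labels = ["thank u, next", "7 rings"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(lyrics), labels)

# A new lyric line to classify
new_line = vectorizer.transform(["taught me patience"])

print(model.predict_proba(new_line))  # a probability for each class
print(model.predict(new_line))        # the highest-probability class
```

`predict_proba` returns one probability per class (they sum to 1), and `predict` just picks the argmax.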

That’s Naive Bayes!

“If you don’t know then now you know it, babe”

- Ariana Grande, ‘no tears left to cry’

