The Classical Bayesian Classifier: Continuous Uni-variate Case

Prateek Pandey
Published in Analytics Vidhya
5 min read · Oct 6, 2021

When I say Classical Bayesian Classifier, I do not mean the well-known Naive Bayes classifier. The difference between the two is that the naive version assumes the predictors are independent, while the classical version makes no such assumption. In a uni-variate case, however, the question of independence does not arise, since there is only one predictor.

The uni-variate case assumes that there is only one predictor variable in the training data set.

Consider a data set with one predictor variable, ‘Weight’, and a response variable, ‘Response’. The response variable can take either the value ‘adult’ or ‘child’. The weight variable, on the other hand, is continuous and can take any value in the range (0, ∞).

Let’s simulate our weight variable for just 20 values, centered around a weight of 46 (µ = 46) with a standard deviation of 1.5 (σ = 1.5). The histogram for the simulated values is shown in figure 1. Let’s also assign a value of the response variable to each weight value. The complete data are shown in table 1; a small simulation sketch follows below.

Figure 1: Histogram for the Weight variable
Table 1: Simulated Data for 20 weight values
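The data above are easy to simulate. Below is a minimal sketch in Python; the random seed and the labelling rule (a simple split at the overall mean) are assumptions for illustration, not the article’s exact 20 values and labels.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                    # arbitrary seed
weights = rng.normal(loc=46, scale=1.5, size=20)   # Weight ~ N(46, 1.5^2)

# Hypothetical labelling rule (not from the article): lighter instances
# are tagged 'child', heavier ones 'adult'.
labels = np.where(weights < weights.mean(), "child", "adult")

plt.hist(weights, bins=8, edgecolor="black")
plt.xlabel("Weight")
plt.ylabel("Frequency")
plt.title("Histogram for the Weight variable")
plt.show()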

Now, let’s plot the class-wise distribution of weight, i.e. plot all the weight values where the response equals adult (green) and child (red) separately. This is shown in figure 2.

Figure 2: Class-wise PDFs and Decision Boundary

The blue vertical line shows the mean of the distribution of Weight for the class ‘Child’. The cyan vertical line shows the same for the class ‘Adult’. The two bell-shaped curves represent the probability density functions (pdfs) of the class-wise distributions of the random variable Weight.

For a weight value of, say, 45.2, we draw a perpendicular (vertical line) crossing the two density functions. Since the y-axis value of the density function on the left is greater than that of the one on the right, the class assigned to this weight value is Child (red). Similarly, for a weight value of 47, the class is Adult (green). Wait! Shouldn’t there be a value of weight that divides the two territories (red and green)?

The value on the X-axis that divides the two territories (in binary classification) is called the Bayesian Decision Boundary.

The decision boundary is shown as an orange vertical line in figure 2. Contrary to the intuition that a boundary must be at least a line, in a one-dimensional setting the boundary is actually a point (a single value). An orange vertical line, rather than a point, is shown only to draw attention to it.

How do we obtain this decision boundary in a one predictor and two response classes setup?

Let’s denote the outcome of this classification problem as Y. Needless to say, Y can take the values 0 and 1 (Child or Adult) in a binary classification setting. The problem our classifier solves is to find the value of Y for a given data instance; that is, the classifier is tasked with finding the density value from the distribution of Y conditioned on X = x, which is neither intuitive nor practical to compute directly. Here comes Bayes’ theorem for a continuous predictor, which tells us that f(y|x) can be expressed in terms of f(x|y). Figure 3 shows Bayes’ theorem in the context of a continuous variable.

Figure 3: Bayesian Theorem for continuous predictor X
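For reference, the relation figure 3 presumably shows, written out from the standard statement of Bayes’ theorem for a continuous predictor X and a discrete response Y, is

f(y \mid x) = \frac{f(x \mid y)\, f(y)}{f(x)}

where f(y) plays the role of the prior probability of class y.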

Here, f(x) in the denominator is the marginal density of x, with f(x) > 0.

From figure 3, to obtain the decision boundary, we need to solve the following equation:

Figure 4: Equality to be solved for obtaining decision boundary
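Spelled out (a reconstruction based on the surrounding text), the equality in figure 4 equates the two posteriors at the boundary:

\frac{f(x \mid Y = 0)\, P(Y = 0)}{f(x)} = \frac{f(x \mid Y = 1)\, P(Y = 1)}{f(x)}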

The denominators in figure 4 cancel out, and if the number of data points in the two classes (0 and 1) is the same, the prior probabilities on either side cancel as well. The equality that remains is shown in figure 5.

Figure 5: Final Equality to obtain the decision boundary
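That is, the remaining equality (again, my reading of figure 5) simply sets the two class-conditional densities equal:

f(x \mid Y = 0) = f(x \mid Y = 1)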

The normal (Gaussian) distribution function is expressed mathematically as shown in figure 6, where k indexes the response class and σ denotes the standard deviation.

Figure 6: Normal function
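In its standard form, the normal density for class k, with mean \mu_k and standard deviation \sigma_k, is

f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right)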

Since the equation in figure 5 involves the conditional distribution of X given Y, we can substitute the density from figure 6 into figure 5. Further, if we assume that both response classes (0 and 1) share the same variance, the resulting equation and its subsequent simplifications are given in figures 7 through 9.

Figure 7: Equation on putting f(x) from fig.6 in fig. 5
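With a shared variance \sigma^2, the substituted equality presumably shown in figure 7 reads

\frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu_0)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu_1)^2}{2\sigma^2}\right)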

Taking the log on both sides of the equation in figure 7, we get:

Figure 8: Simplifying equation in figure 7
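Taking logs and cancelling the common constant term leaves (my reconstruction of figure 8)

-\frac{(x - \mu_0)^2}{2\sigma^2} = -\frac{(x - \mu_1)^2}{2\sigma^2}, \qquad \text{i.e.}\qquad (x - \mu_0)^2 = (x - \mu_1)^2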

On further simplifying the equation in figure 8:

Figure 9: Simplifying equation in figure 8
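Expanding the squares and cancelling the x^2 terms (my reading of figure 9) gives

x^2 - 2x\mu_0 + \mu_0^2 = x^2 - 2x\mu_1 + \mu_1^2 \;\Longrightarrow\; 2x(\mu_1 - \mu_0) = \mu_1^2 - \mu_0^2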

The Bayesian decision boundary in a uni-variate system with a continuous predictor is given in figure 10. Thus, the decision boundary is the average of the two class means, which is shown as the orange vertical line in figure 2.

Figure 10: Bayesian Decision Boundary
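Written out, the boundary in figure 10 follows by dividing the last equality by 2(\mu_1 - \mu_0):

x = \frac{\mu_0 + \mu_1}{2}

that is, the midpoint of the two class means. Continuing the hypothetical simulation sketched earlier (the weights and labels arrays below come from that sketch, not from the article’s exact data), the boundary and a minimal classifier take only a few lines:

mu_child = weights[labels == "child"].mean()
mu_adult = weights[labels == "adult"].mean()

# Equal priors and equal variance assumed, exactly as in the derivation above.
boundary = (mu_child + mu_adult) / 2

def classify(x):
    # Pick the class whose mean lies on the same side of the boundary as x.
    return "child" if x < boundary else "adult"

print(round(boundary, 2), classify(45.2), classify(47.0))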

Why is the Bayesian classifier considered an unattainable gold standard?

There are several reasons why the Bayesian classifier is unattainable in practice:

(1) Since X is a continuous random variable, it theoretically has infinitely many possible values, so finding the conditional distribution of the response given every possible value of X is impossible.

(2) We assumed that X is drawn from a Gaussian (normal) distribution within each class. There was no real basis for that assumption. And even if the assumption were correct, we would still have to estimate the mean, the prior, and the standard deviation. Yes, estimate! Because the data we have is usually just a sample from an effectively infinite population, the best we can do is estimate.

Appeal: If you like this article, don’t hesitate to clap and share it further. Happy Machine Learning!!!


Prateek Pandey · Analytics Vidhya

I am a Research Enthusiast and a Teacher by choice. I also study Geo-Politics and Philosophy.