The Classical Bayesian Classifier: Continuous Uni-variate Case
When I say Classical Bayesian Classifier, I’m deliberately setting it apart from the well-known Naive Bayes classifier. The difference between the two is that the Naive version assumes the predictors are independent, while the classical one makes no such assumption. In a uni-variate case, however, the question of independence does not arise, for there is only one predictor.
The uni-variate case assumes that there is only one predictor variable in the training data set.
Consider a data set with one predictor variable, ‘Weight’, and a response variable, ‘Response’. The response variable can hold either the value ‘adult’ or ‘child’. The weight variable, on the other hand, is continuous and can take values in the range (0, ∞).
Let’s simulate just 20 values of the weight variable, centred around a weight of 46 (µ = 46) with a standard deviation of 1.5 (σ = 1.5). The histogram for the same is shown in figure 1. Let’s also assign a value of the response variable to each weight value. The complete table is shown as table 2.
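As a sketch, the simulation described above can be reproduced with Python’s standard library. The seed and the rule used to assign the response labels are assumptions made purely for illustration; the article’s actual table 2 may differ.

```python
import random

random.seed(0)  # seed chosen arbitrarily, only for reproducibility

# Simulate 20 weight values centred at 46 with a standard deviation of 1.5
weights = [random.gauss(46, 1.5) for _ in range(20)]

# Labelling rule assumed for illustration: lighter instances are
# labelled 'child', heavier ones 'adult'
responses = ["child" if w < 46 else "adult" for w in weights]

for w, r in zip(weights, responses):
    print(f"{w:5.1f}  {r}")
```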
Now, let’s plot the class-wise distribution of weight, i.e. plot all the weight values where the response equals ‘adult’ (green) and ‘child’ (red) separately. This is shown in figure 2.
The blue vertical line shows the mean of the distribution of Weight for class ‘Child’. The cyan vertical line shows the same for the class ‘Adult’. The two bell-shaped curves represent the probability density functions (pdf) for the class-wise distribution of the random variable Weight.
For a weight value of, say, 45.2, we draw a perpendicular (vertical line) crossing the two density functions. Since the y-axis value of the density function on the left is higher than that of the one on the right, the class for this weight value is child (red). Similarly, for a weight value of 47, the class is adult (green). Wait! Shouldn’t there be a value of weight that divides the two territories (red and green)?
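The density-comparison rule just described can be sketched in a few lines. The class means (45 and 47) and the shared standard deviation used here are illustrative assumptions, not values taken from the article’s data.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the Normal(mu, sigma) distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Class-wise parameters assumed purely for illustration
mu_child, mu_adult, sigma = 45.0, 47.0, 1.5

def classify(weight):
    """Assign the class whose conditional density is higher at `weight`."""
    f_child = normal_pdf(weight, mu_child, sigma)
    f_adult = normal_pdf(weight, mu_adult, sigma)
    return "child" if f_child > f_adult else "adult"

print(classify(45.2))  # the left (child) density is higher at 45.2
print(classify(47.0))  # the right (adult) density is higher at 47.0
```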
The value on the X-axis that divides the two territories (in binary classification) is called the Bayesian Decision Boundary.
The decision boundary is shown as an orange vertical line in figure 2. Contrary to the belief that a boundary has to be at least a line, in a one-dimensional setting the boundary is actually a point (a single value). The orange vertical line is drawn instead of a point only to draw attention to it.
How do we obtain this decision boundary in a one predictor and two response classes setup?
Let’s denote the outcome of this classification problem as Y. Needless to say, Y can take the values 0 and 1 (child or adult) in a binary classification setting. The problem our classifier solves is finding the value of Y for a given data instance, i.e. the classifier is tasked with finding the density value from the distribution of Y conditioned on X = x, which is neither intuitive nor practical to find directly. Here comes Bayes’ theorem for a continuous predictor, which says that f(y|x) can be expressed in terms of f(x|y). Figure 3 shows Bayes’ theorem in the context of a continuous variable.
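Since figure 3 is not reproduced here, the relation it conveys, written in standard notation, is:

```latex
f(y \mid x) \;=\; \frac{f(x \mid y)\, f(y)}{f(x)}, \qquad f(x) > 0
```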
Here f(x) in the denominator is the marginal density of x, with f(x) > 0.
From figure 3, to obtain the decision boundary, we need to solve the following equation:
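In the notation above, the equation of figure 4 equates the two posterior densities at the boundary:

```latex
\frac{f(x \mid Y=0)\, f(Y=0)}{f(x)} \;=\; \frac{f(x \mid Y=1)\, f(Y=1)}{f(x)}
```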
The denominators in figure 4 cancel out, and if the number of data points in the two classes (0 and 1) is the same, the prior probabilities on either side are equal and drop out as well. The equality that remains is shown in figure 5.
The Normal density function is expressed mathematically as shown in figure 6, where k indexes the response class and sigma stands for the standard deviation.
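Written out in standard notation, the class-conditional Normal density of figure 6 is:

```latex
f(x \mid Y = k) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma_k}\, \exp\!\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right)
```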
Since the equation in figure 5 involves the conditional distribution of X given Y, we can substitute the density from figure 6 into figure 5. Further, if we assume that both response classes (0 and 1) share the same variance, the resulting equation and its simplifications are given in figures 7 to 9.
Taking the log on both sides of the equation in figure 7, we get:
On further simplifying the equation in figure 8,
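Since figures 7 to 9 are not reproduced here, the chain of simplifications, written out in standard notation under the shared-variance assumption (σ₀ = σ₁ = σ), is:

```latex
\begin{align*}
\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu_0)^2}{2\sigma^2}\right)
  &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right)\\[4pt]
-\frac{(x-\mu_0)^2}{2\sigma^2} &= -\frac{(x-\mu_1)^2}{2\sigma^2}\\[4pt]
x^2 - 2\mu_0 x + \mu_0^2 &= x^2 - 2\mu_1 x + \mu_1^2\\[4pt]
2(\mu_1 - \mu_0)\,x &= \mu_1^2 - \mu_0^2\\[4pt]
x &= \frac{\mu_0 + \mu_1}{2}
\end{align*}
```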
The Bayesian decision boundary in a uni-variate system with a continuous predictor is given in figure 10. Thus, the decision boundary is the midpoint of the two class means, shown as the orange vertical line in figure 2.
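A minimal numerical check of this result, using class means and a shared standard deviation assumed purely for illustration: just below the midpoint the child density is higher, just above it the adult density is higher.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the Normal(mu, sigma) distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Assumed class means and shared standard deviation (illustrative values)
mu_child, mu_adult, sigma = 45.0, 47.0, 1.5

# The derived decision boundary: the midpoint of the two class means
boundary = (mu_child + mu_adult) / 2

# Just on either side of the boundary, the higher density flips class
eps = 0.01
assert normal_pdf(boundary - eps, mu_child, sigma) > normal_pdf(boundary - eps, mu_adult, sigma)
assert normal_pdf(boundary + eps, mu_adult, sigma) > normal_pdf(boundary + eps, mu_child, sigma)
print(boundary)  # 46.0
```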
Why is the Bayesian classifier considered an unattainable gold standard?
There are several reasons why the Bayesian classifier is unattainable:
(1) Since X is a continuous random variable, which theoretically has infinitely many possible values, finding the conditional distribution of the response given the infinite possibilities of X is impossible.
(2) We assumed that X is drawn from a Gaussian (Normal) distribution within each class. There was no basis for making such an assumption. Even if the assumption were correct, the mean, prior, and standard deviation would still have to be estimated. Yes, estimated! Because the data we generally have is just a sample from an infinite population, the best we can do is estimate.
Appeal: If you like this article, don’t hesitate to clap and share it further. Happy Machine Learning!!!