Before starting, let's get some background on estimators. They are classified into two classes.
Parametric estimators make assumptions about the population from which a sample of data is drawn. Often the assumption is that the population is normally distributed, i.e. bell-shaped. This assumption allows the development of a theory that lets us draw inferences about the population from a sample taken from it.
The other family is non-parametric estimators. These make no distributional assumptions, impose no fixed structure, and depend on all the data points to reach an estimate. Kernel density estimators belong to this class.
So why kernel density estimation? Let us see why histograms are just not sufficient.
Histograms are not smooth, and they depend on the width of the bins and on the endpoints of the bins. This is where kernel density estimators alleviate the problem.
Let's see how histograms are affected by the choice of bins.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats.distributions import norm

# Sample 1000 points from a normal distribution
mu, sigma = 0, 0.1  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)

# Plot the same sample with different bin counts
fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=80)
for ax, bins in zip(axes.ravel(), [5, 20, 50, 100]):
    ax.hist(s, bins=bins)
    ax.set_title(f"bins = {bins}")
plt.show()
So we see in the above visualization how the bin count changes the look of the same distribution.
So how do we overcome this?
To remove the dependence on the endpoints of the bins, kernel estimators center a kernel function at each data point. We place a kernel function on every data point and sum the contributions to get the density estimate, just like evaluating a function at a given point x:
y = f(x)
A kernel function K typically has the following properties:
- Non-negative everywhere: K(x) ≥ 0 ∀ x ∈ X
- Symmetric: K(x) = K(-x) ∀ x ∈ X
- Decreasing: K′(x) ≤ 0 ∀ x > 0
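As a minimal sketch of the idea (the function names and the toy data here are illustrative, not from the original post), a KDE can be built by hand by centring a Gaussian kernel at each data point and averaging:

```python
import numpy as np

def gaussian_kernel(u):
    # Gaussian kernel: non-negative, symmetric, and decreasing for u > 0
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, h):
    # Average a kernel centred at every data point, scaled by bandwidth h
    return np.array([gaussian_kernel((x - data) / h).mean() / h for x in x_grid])

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])  # toy sample
x_grid = np.linspace(-7, 11, 400)
density = kde(x_grid, data, h=1.0)  # density estimate on the grid
```

Because each kernel is a density, the resulting estimate is non-negative everywhere and integrates to one.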
The quality of a kernel estimate depends less on the shape of K than on the value of its bandwidth h. It is important to choose the most appropriate bandwidth, as a value that is too small or too large is not useful.
x = np.concatenate([norm(-1, 1.).rvs(400), norm(1, 0.3).rvs(100)])
sns.kdeplot(x, bw_method=1, color="red", label="bw: 1")
sns.kdeplot(x, bw_method=0.5, color="blue", label="bw: 0.5")
sns.kdeplot(x, bw_method=0.3, color="green", label="bw: 0.3")
sns.kdeplot(x, bw_method=0.1, color="orange", label="bw: 0.1")
sns.kdeplot(x, bw_method=0.05, color="grey", label="bw: 0.05")
plt.legend()
plt.show()
The smoothing bandwidth h plays a key role in the quality of a KDE. Here is an example of applying different values of h to the same dataset. When h is too small (the grey curve), there are many wiggly structures on the density curve; this is undersmoothing. On the other hand, when h is too large (the red curve), the two bumps are smoothed out. This is called oversmoothing: important structures are obscured by the large amount of smoothing.
Bandwidth selection methods (univariate case)
The natural way of choosing h is to plot out several curves and choose the estimate that best matches one's prior (subjective) ideas. However, this method is not practical for high-dimensional data. Two common data-driven approaches are:
- Maximum likelihood cross-validation
- Reference to a standard distribution
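A well-known example of the "reference to a standard distribution" approach is Silverman's rule of thumb, which picks h by assuming the data are roughly normal. A sketch (the helper name is mine, not a library function):

```python
import numpy as np

def silverman_bandwidth(data):
    # Silverman's rule of thumb: h = 0.9 * min(std, IQR / 1.34) * n^(-1/5),
    # obtained by referencing a normal (standard) distribution
    n = len(data)
    std = np.std(data, ddof=1)
    iqr = np.subtract(*np.percentile(data, [75, 25]))  # interquartile range
    return 0.9 * min(std, iqr / 1.34) * n ** (-1 / 5)

rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 1000)
h = silverman_bandwidth(sample)  # a reasonable default bandwidth for roughly normal data
```

For strongly multimodal data the normality assumption breaks down and the rule tends to oversmooth, which is where cross-validation approaches become useful.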
The idea of kernel density estimators is to give you a smooth estimate of the underlying distribution of your data.
I am an Artificial Intelligence Developer at Wavelabs.ai. We at Wavelabs help you leverage Artificial Intelligence (AI) to revolutionize user experiences and reduce costs. We uniquely enhance your products using AI to reach your full market potential. We try to bring cutting-edge research into your applications. Have a look at us.
You can reach me on LinkedIn.