# Introduction to Kernel Density Estimation

Before we start, let's get some background on estimators. They are classified into two classes:

1. Parametric
2. Non-Parametric

Parametric estimators make assumptions about the population from which a sample of data is drawn. Often the assumption is that the population is normally distributed, i.e. bell-shaped. That assumption supports a theory for drawing inferences about the population from a sample taken from it.

The other family of estimators is non-parametric. These estimators make no distributional assumptions and impose no fixed structure; the estimate depends on all the data points. Kernel density estimators belong to this class.

So why kernel density estimation? Let us first see why histograms are not sufficient.

Histograms are not smooth, and their shape depends on both the width of the bins and the endpoints of the bins. Kernel density estimators alleviate these problems.

Let's see how the choice of bins affects a histogram:

```python
# Importing libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats.distributions import norm

# Draw a sample from a normal distribution
mu, sigma = 0, 0.1  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)

# Plot the same sample with different numbers of bins
plt.figure(num=None, figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
plt.hist(s, bins=10, label="10")
plt.hist(s, bins=50, label="50", color="green")
plt.hist(s, bins=300, label="300", color="orange")
plt.hist(s, bins=500, label="500", color="white")
plt.legend()  # show which color corresponds to which bin count
plt.show()
```

The visualization above shows how the number of bins changes the look of the same normally distributed sample.
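The number of bins is only half of the story: the bin endpoints matter too. As a minimal sketch (the seed, the bin width of 0.1 and the names `edges_a` and `edges_b` are illustrative choices, not part of the original example), the same sample can be counted twice with identical bin widths and only the bin origin shifted by half a bin:

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, only for reproducibility
s = rng.normal(0, 0.1, 1000)        # same mu and sigma as the histogram example

# Ten bins of width 0.1; the second set of edges is shifted by half a bin.
edges_a = np.linspace(-0.5, 0.5, 11)
edges_b = edges_a + 0.05

counts_a, _ = np.histogram(s, bins=edges_a)
counts_b, _ = np.histogram(s, bins=edges_b)

print("origin at -0.50:", counts_a)
print("origin at -0.45:", counts_b)
```

The two count vectors describe the same data, yet they differ, which is exactly the dependence on endpoints that a kernel estimate avoids.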

So how do we overcome this?

To remove the dependence on the endpoints of the bins, kernel estimators center a kernel function at each data point. The density estimate at a point x is then obtained by adding up the contributions of all these kernels at x. This is just like high school, where you get the value of a function at a given point x:

y = f(x)
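Concretely, the usual kernel density estimate is f̂(x) = (1 / (n·h)) · Σᵢ K((x − xᵢ) / h), where h is the bandwidth. The following is a minimal sketch of that formula with NumPy, assuming a Gaussian kernel; the function names `gaussian_kernel` and `kde` and the value `h=0.03` are hypothetical choices for illustration only.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_eval, data, h):
    """Evaluate (1 / (n*h)) * sum_i K((x - x_i) / h) at every point of x_eval."""
    x_eval = np.asarray(x_eval, dtype=float)
    data = np.asarray(data, dtype=float)
    # One row per evaluation point, one column per data point.
    u = (x_eval[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)

rng = np.random.default_rng(0)
data = rng.normal(0, 0.1, 1000)          # same mu and sigma as the histogram example
grid = np.linspace(-1, 1, 2001)
density = kde(grid, data, h=0.03)        # h chosen by hand for this sketch

# A density should integrate to (approximately) one.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

Because every data point contributes a smooth bump, the result is a smooth curve that does not depend on any bin endpoints.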

## Kernel function

A kernel function typically has the following properties:

1. Everywhere non-negative: K(x)≥0 ∀ x∈X
2. Symmetric : K(x) = K(-x) ∀ x∈X
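3. Decreasing : K′(x) ≤ 0 ∀ x > 0

```python
# Note: `bw`, `kernel` and `data2` are arguments of the older seaborn API
# (before 0.11); newer releases deprecate them.
# Same two-component sample as in the bandwidth example below.
x = np.concatenate([norm(-1, 1.).rvs(400), norm(1, 0.3).rvs(100)])
sns.kdeplot(x, data2=None, bw=.4, color="yellow", label="gaussian", kernel="gau")
sns.kdeplot(x, data2=None, bw=.4, color="black", label="biw", kernel="biw")
sns.kdeplot(x, data2=None, bw=.4, color="red", label="cos", kernel="cos")
sns.kdeplot(x, data2=None, bw=.4, color="green", label="epa", kernel="epa")
sns.kdeplot(x, data2=None, bw=.4, color="blue", label="tri", kernel="tri")
sns.kdeplot(x, data2=None, bw=.4, color="purple", label="triw", kernel="triw")
plt.legend()
```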
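The properties listed above can be checked numerically. This is a small sketch, assuming the standard textbook forms of the Gaussian, Epanechnikov and triangular kernels; the helper `check_kernel` is a hypothetical name, and the area check reflects the additional standard requirement that a kernel integrates to one, which the list above does not state.

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def triangular(u):
    return np.where(np.abs(u) <= 1, 1 - np.abs(u), 0.0)

# Keys reuse the short kernel codes from the plotting example.
kernels = {"gau": gaussian, "epa": epanechnikov, "tri": triangular}

u = np.linspace(-5, 5, 10001)
du = u[1] - u[0]

def check_kernel(K):
    """Numerically test the kernel properties on the grid u."""
    values = K(u)
    positive_side = values[u >= 0]
    return {
        "non_negative": bool(np.all(values >= 0)),
        "symmetric": bool(np.allclose(values, K(-u))),
        "decreasing": bool(np.all(np.diff(positive_side) <= 0)),
        "area": float(np.sum(values) * du),   # Riemann-sum approximation
    }

for name, K in kernels.items():
    print(name, check_kernel(K))
```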

The quality of a kernel estimate depends less on the shape of the kernel K than on the value of its bandwidth h. It is important to choose an appropriate bandwidth, because a value that is too small or too large gives an estimate that is not useful.

```python
# Mixture of two normal samples, so the true density has two bumps
x = np.concatenate([norm(-1, 1.).rvs(400), norm(1, 0.3).rvs(100)])

# `bw` is the bandwidth argument of the older seaborn API (before 0.11)
sns.kdeplot(x, data2=None, bw=2, color="yellow", label="bw: 2")
sns.kdeplot(x, data2=None, bw=1, color="red", label="bw: 1")  # label now matches bw
sns.kdeplot(x, data2=None, bw=.5, color="blue", label="bw: 0.5")
sns.kdeplot(x, data2=None, bw=.3, color="green", label="bw: 0.3")
sns.kdeplot(x, data2=None, bw=.1, color="grey", label="bw: 0.1")
sns.kdeplot(x, data2=None, bw=.05, color="grey", label="bw: 0.05")
plt.legend()
```

The smoothing bandwidth h plays a key role in the quality of a KDE. The example above applies different values of h to the same dataset. When h is too small (the gray curves), the density curve has many wiggly structures; this is undersmoothing. On the other hand, when h is too large (the yellow curve), the two bumps are smoothed out. This situation is called oversmoothing: some important structures are obscured by the huge amount of smoothing.
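One way to put a number on "wiggly" versus "smoothed out" is to count the local maxima of the estimated curve. The sketch below uses `scipy.stats.gaussian_kde` instead of seaborn; note that a scalar `bw_method` in SciPy is a factor that multiplies the sample standard deviation, not the absolute bandwidth h, so the values 0.02, 0.3 and 1.5 are illustrative and not comparable to the `bw` values in the plot. The helper `count_modes` is a hypothetical name.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)     # fixed seed, only for reproducibility
x = np.concatenate([rng.normal(-1, 1.0, 400), rng.normal(1, 0.3, 100)])
grid = np.linspace(-6, 4, 2000)

def count_modes(density):
    """Number of interior local maxima of a curve sampled on a grid."""
    rising = np.diff(density) > 0
    # A mode is a place where the curve switches from rising to falling.
    return int(np.sum(rising[:-1] & ~rising[1:]))

modes = {}
for factor in (0.02, 0.3, 1.5):
    kde = gaussian_kde(x, bw_method=factor)   # scalar = multiplier of the data std
    modes[factor] = count_modes(kde(grid))

print(modes)
```

A very small factor should produce many spurious modes (undersmoothing), while a very large one collapses the two bumps into a single mode (oversmoothing).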

## Bandwidth selection methods, univariate case

### Subjective choice

The natural way to choose h is to plot several curves and pick the estimate that best matches one's prior (subjective) ideas. However, this method is not practical for high-dimensional data.

### Maximum likelihood cross-validation
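The article does not spell this method out, so the following is only a sketch under the assumption that it refers to the common leave-one-out formulation: for each candidate h, every point is scored by the log of a density estimate built from all the other points, and the h with the highest average score wins. The function name `loo_log_likelihood`, the Gaussian kernel and the candidate grid are hypothetical choices.

```python
import numpy as np

def loo_log_likelihood(data, h):
    """Mean leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    data = np.asarray(data, dtype=float)
    n = data.size
    u = (data[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)                 # leave each point out of its own estimate
    loo_density = k.sum(axis=1) / ((n - 1) * h)
    with np.errstate(divide="ignore"):       # isolated points give log(0) = -inf
        return float(np.mean(np.log(loo_density)))

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-1, 1.0, 400), rng.normal(1, 0.3, 100)])

candidates = np.logspace(-2, 0.5, 40)        # hypothetical search grid for h
scores = np.array([loo_log_likelihood(data, h) for h in candidates])
best_h = float(candidates[np.argmax(scores)])

print(best_h)
```

Leaving each point out matters: without it, the likelihood would keep increasing as h shrinks toward zero.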

### Reference to a standard distribution
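The article leaves this method undescribed. Assuming the standard distribution meant here is the normal, the usual rule of thumb for a Gaussian kernel is h = 1.06 · σ · n^(−1/5), and Silverman's more robust variant replaces 1.06 · σ with 0.9 · min(σ, IQR / 1.34). A minimal sketch, with hypothetical function names:

```python
import numpy as np

def normal_reference_bandwidth(data):
    """h = 1.06 * sigma * n**(-1/5), derived for normal data and a Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * data.size ** (-1 / 5)

def silverman_bandwidth(data):
    """Robust variant: 0.9 * min(sigma, IQR / 1.34) * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    spread = min(data.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * data.size ** (-1 / 5)

rng = np.random.default_rng(0)
sample = rng.normal(0, 0.1, 1000)   # same mu and sigma as the histogram example

h_normal = normal_reference_bandwidth(sample)
h_silverman = silverman_bandwidth(sample)
print(h_normal, h_silverman)
```

These rules are cheap to compute but tend to oversmooth when the data are far from normal, for example when the density has several bumps.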

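## Conclusion

Kernel density estimators give you a smooth picture of the distribution behind a sample without assuming a parametric form. How useful that picture is depends mostly on the choice of bandwidth.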

I am an Artificial Intelligence Developer at Wavelabs.ai.