How Kernel Density Estimation Works, Part 1 (Machine Learning)

Monodeep Mukherjee
2 min read · Jan 5, 2023
Photo by Victor on Unsplash
1. Fully Data-driven Normalized and Exponentiated Kernel Density Estimator with Hyvärinen Score (arXiv)

Authors: Shunsuke Imai, Takuya Koriyama, Shouto Yonekura, Shonosuke Sugasawa, Yoshihiko Nishiyama

Abstract: We introduce a new approach to kernel density estimation based on an exponentiated form of kernel density estimators. The density estimator has two hyperparameters that flexibly control the smoothness of the resulting density. We tune them in a data-driven manner by minimizing an objective function based on the Hyvärinen score, which avoids optimizing over the normalizing constant that the exponentiation renders intractable. We establish the asymptotic properties of the proposed estimator and emphasize the importance of including both hyperparameters for flexible density estimation. Our simulation studies and an application to income data show that the proposed estimator is appealing when the underlying density is multi-modal or the observations contain outliers.
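
To make the idea concrete, here is a minimal 1-D sketch, not the authors' implementation, of an exponentiated Gaussian KDE, p(x) ∝ kde_h(x)^α, where the bandwidth h and exponent α are tuned by minimizing an empirical Hyvärinen score. Because the score depends only on derivatives of log p, the normalizing constant introduced by the exponentiation cancels. All variable names, the leave-one-out handling of the self-term, and the optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch under assumed details: density model p(x) ∝ kde_h(x)**alpha with a
# Gaussian kernel. The Hyvärinen score H = mean[(log p)'' + 0.5*((log p)')**2]
# needs only derivatives of log p, so the intractable normalizing constant
# from the exponentiation drops out, and (h, alpha) can be tuned directly.

def hyvarinen_score(params, sample):
    h, alpha = np.exp(params)                    # log-parameterize for positivity
    d = (sample[:, None] - sample[None, :]) / h  # pairwise scaled differences
    w = np.exp(-0.5 * d**2)                      # unnormalized Gaussian kernel
    np.fill_diagonal(w, 0.0)                     # leave-one-out: drop self-terms
    s0 = w.sum(axis=1)                           # proportional to kde at each point
    s1 = (w * (-d / h)).sum(axis=1)              # first derivative of s0
    s2 = (w * ((d**2 - 1) / h**2)).sum(axis=1)   # second derivative of s0
    g = s1 / s0                                  # (log kde)'
    gg = s2 / s0 - g**2                          # (log kde)''
    # log p = alpha * log kde - log Z, so log Z never appears below
    return np.mean(alpha * gg + 0.5 * (alpha * g) ** 2)

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 1.0, 150)])
res = minimize(hyvarinen_score, x0=np.zeros(2), args=(sample,), method="Nelder-Mead")
h_opt, alpha_opt = np.exp(res.x)
print(f"tuned bandwidth h = {h_opt:.3f}, exponent alpha = {alpha_opt:.3f}")
```

The bimodal sample here mirrors the multi-modal setting the abstract highlights: a single fixed bandwidth tends to over- or under-smooth one of the modes, while the extra exponent gives the fit another degree of freedom.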

2. Robustify Transformers with Robust Kernel Density Estimation (arXiv)

Authors: Xing Han, Tongzheng Ren, Tan Minh Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho

Abstract: Recent advances in the Transformer architecture have driven its empirical success on a variety of tasks across different domains. However, existing work mainly focuses on improving standard accuracy and computational cost, without considering robustness against contaminated samples. Prior work has shown that the self-attention mechanism, the core of the Transformer architecture, can be viewed as a non-parametric estimator based on the well-known kernel density estimation (KDE). This motivates leveraging robust kernel density estimation (RKDE) in the self-attention mechanism to alleviate data contamination by down-weighting outlying samples in the estimation process. The modified self-attention mechanism can be incorporated into different Transformer variants. Empirical results on language modeling and image classification tasks demonstrate the effectiveness of this approach.
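
As a rough illustration of the KDE view of attention, the snippet below treats single-head softmax attention as a Nadaraya-Watson-style kernel estimator and adds per-key robustness weights that shrink the contribution of outlying keys. The Huber-style reweighting used here is a stand-in assumption for exposition; the paper's actual RKDE construction differs.

```python
import numpy as np

# Softmax attention as a kernel-smoothed estimate of the values: each query's
# output is a weighted average of V, with kernel weights exp(q·k / sqrt(d)).
# Here we multiply in per-key robustness weights w_i (a simple, assumed
# Huber-style scheme) so that contaminated keys contribute less.

def robust_attention(Q, K, V, c=1.5):
    d = Q.shape[-1]
    # robustness weights: down-weight keys far from the key centroid
    dist = np.linalg.norm(K - K.mean(axis=0), axis=1)
    r = dist / (np.median(dist) + 1e-8)
    w = np.where(r <= c, 1.0, c / r)              # Huber-style down-weighting

    scores = Q @ K.T / np.sqrt(d)                 # kernel evaluations
    weights = np.exp(scores - scores.max(axis=1, keepdims=True)) * w
    weights /= weights.sum(axis=1, keepdims=True) # weighted softmax
    return weights @ V                            # kernel-smoothed values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
K[0] += 10.0                                      # inject a contaminated key
out = robust_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Because the change is confined to how the attention weights are formed, a reweighting of this kind can in principle be dropped into different Transformer variants, which is the property the abstract emphasizes.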
