Log Plots, Kernel Density Estimation and Experimental Data

Praveen Jayasuriya
Published in Analytics Vidhya · 6 min read · Jul 6, 2020


Data was right. Data is always right.

I’ve been pretty busy working with some data from an experiment. I’m trying to fit a subset of the data to one or more model distributions, where at least one of the component distributions follows a normal distribution (in linear space). Sounds pretty simple, right?

Based on domain knowledge of this problem, I also know that the data can probably be fitted by a mixture model, more specifically a Gaussian mixture model. Brilliant, you say! Why not try something like,

from sklearn.mixture import GaussianMixture

model = GaussianMixture(*my arguments/params*)
model.fit(*my arguments/params*)
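For concreteness, here is a minimal runnable sketch of that approach. The synthetic two-component data, the choice of n_components=2, and the random seed are illustrative assumptions, not my actual experimental setup:

import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative synthetic data: two normal components (not the real experiment)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 500),
                       rng.normal(5.0, 0.5, 300)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture; n_components=2 is an assumption here
model = GaussianMixture(n_components=2, random_state=0)
model.fit(data)

print(model.means_.ravel())                 # estimated component means
print(np.sqrt(model.covariances_).ravel())  # estimated standard deviations

On well-behaved data like this, the estimated means and standard deviations land close to the generating parameters. On my experimental data, they did not, which is what the rest of this post is about.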

But try as I might, I couldn’t find parameters that modelled the underlying processes that generated the data. I had all sorts of issues, from overfitting to nonsensical standard deviation values. Finally, after a lot of munging, reading, and advice from my supervisor, I figured out how to make this problem work for me and move on to the next step. In this post I want to focus on why the log domain can be useful for understanding the underlying structure of data, and how, used in conjunction with kernel density estimation (KDE) and KDE plots, it can aid data exploration.
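As a preview, here is a hedged sketch of the kind of exploratory comparison I mean: a KDE of the raw data next to a KDE of the log-transformed data. The log-normal sample and SciPy’s default bandwidth are assumptions for illustration, not my experimental data:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Illustrative log-normal sample; stands in for the experimental data
rng = np.random.default_rng(1)
data = rng.lognormal(mean=1.0, sigma=0.8, size=1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# KDE in linear space: the skewed tail dominates the picture
xs = np.linspace(data.min(), data.max(), 500)
ax1.plot(xs, gaussian_kde(data)(xs))
ax1.set_title("KDE, linear space")

# KDE in log space: the same data looks approximately normal
log_data = np.log(data)
xs_log = np.linspace(log_data.min(), log_data.max(), 500)
ax2.plot(xs_log, gaussian_kde(log_data)(xs_log))
ax2.set_title("KDE, log space")

plt.show()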
