What we can learn from the empirical distribution

We want to use data to better understand the phenomenon captured in some variable of interest (the dependent, outcome, or predicted variable). A natural first step is to plot the variable as a histogram or density plot, which shows the variable’s empirical distribution.

Classical data analysis advice suggests examining your variable’s distribution to identify which analysis technique would be a good match for it. A technique is a good match when the assumptions behind its mathematics appear to hold in the variable’s empirical distribution. We are advised to use well-matched techniques so that the conclusions from our analysis are theoretically valid. But we can learn more from the histogram or density plot than is traditionally taught.

We can use the distribution to infer properties of the phenomenon captured in our variable of interest. If the variable has a Power Law distribution, the phenomenon has amplification (positive) feedback. If it has a Normal distribution, the phenomenon has counteraction (negative) feedback. And if it has a Uniform distribution, the phenomenon has no feedback. Let me expand.

If the variable has a distribution that looks like a Power Law, we are dealing with a phenomenon that has amplification feedback. Increases raise the chance of further increases. The variable captures a rich-get-richer dynamic, so we know that having more helps with getting more. Take income as an example. More income leads to more income through compound interest, through investment in further income-generating things like education, and through greater access to people and ideas that help you make more money. Phenomena with network characteristics also have amplification feedback. Book reads follow a Power Law because when one person reads a book, that person may discuss it with others, which increases the chance that someone else reads it. Learning effects can serve as amplification feedback too, so a variable with a Power Law distribution can reflect learning: gaining exposure leads to more exposure because lessons from the first exposure help with attracting more.
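To make the rich-get-richer idea concrete, here is a minimal simulation sketch of the book example. It is a toy model, not a claim about how real readership works: the function name `simulate_book_reads`, the 5% chance that a read goes to a brand-new book, and the number of reads are all assumptions chosen for illustration. Each new read goes to an existing book with probability proportional to the reads that book already has.

```python
import random

def simulate_book_reads(n_reads=50_000, p_new_book=0.05, seed=0):
    """Rich-get-richer toy model (all parameters are illustrative assumptions).

    Each new read either goes to a brand-new book (probability p_new_book)
    or to an existing book chosen in proportion to its current reads.
    Returns a list with the number of reads per book.
    """
    rng = random.Random(seed)
    reads_per_book = [1]  # book 0 starts with a single read
    read_log = [0]        # one entry per read, holding that read's book index
    for _ in range(n_reads - 1):
        if rng.random() < p_new_book:
            reads_per_book.append(1)
            read_log.append(len(reads_per_book) - 1)
        else:
            # Picking a past read uniformly at random picks a book with
            # probability proportional to how many reads it already has.
            book = rng.choice(read_log)
            reads_per_book[book] += 1
            read_log.append(book)
    return reads_per_book

reads = sorted(simulate_book_reads(), reverse=True)
top_1_percent = reads[: max(1, len(reads) // 100)]
print(f"books: {len(reads)}, median reads per book: {reads[len(reads) // 2]}")
print(f"share of all reads held by the top 1% of books: {sum(top_1_percent) / sum(reads):.0%}")
```

In a run of this sketch we should see the signature of amplification feedback: the typical book has only a read or two, while a handful of early winners hold a large share of all reads.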

If the variable has a distribution that looks Gaussian/Normal, we are dealing with a phenomenon that has counteracting feedback. Deviations up or down are pushed back towards a set point (the distribution’s mean). The variable captures a phenomenon with forces pushing towards equality across observations, so a deviation does not push the variable farther in the same direction (as in Power Laws) but instead pushes it in the opposite direction. Daily calorie consumption has a Normal distribution. If I pig out at a BBQ today, I will eat less tomorrow. Phenomena like this are subject to some external constraint. The constraint may be biologically derived (as with our daily caloric intake), come from the laws of physics, or reflect resource limitations or system design.
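The set-point idea above can be sketched with a small simulation. This is a toy model under stated assumptions: the function name `simulate_set_point`, the 2000-calorie set point, the strength of the pull, and the size of the daily shocks are all made up for illustration. The daily shocks are deliberately drawn from a flat (non-Normal) range, so any bell shape in the result comes from the feedback process adding up many damped shocks, not from the shocks themselves.

```python
import random
import statistics

def simulate_set_point(n_days=100_000, set_point=2000.0, pull=0.1,
                       shock=300.0, seed=0):
    """Counteracting-feedback toy model (all parameters are illustrative).

    Each day the value is pulled part of the way back toward the set point,
    then nudged by a bounded, uniformly distributed shock.
    """
    rng = random.Random(seed)
    value = set_point
    values = []
    for _ in range(n_days):
        # Negative feedback: the correction opposes the current deviation.
        value += -pull * (value - set_point) + rng.uniform(-shock, shock)
        values.append(value)
    return values

calories = simulate_set_point()
mean = statistics.fmean(calories)
sd = statistics.pstdev(calories)
within_1sd = sum(abs(v - mean) <= sd for v in calories) / len(calories)
print(f"mean: {mean:.0f}, sd: {sd:.0f}, share within 1 sd: {within_1sd:.1%}")
```

For a Normal distribution roughly 68% of observations fall within one standard deviation of the mean, so a share near that figure, together with a mean close to the set point, is what we would expect from this sketch. Setting `pull` to 0 removes the feedback and turns the process into a random walk whose spread keeps growing.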

If the variable has a distribution that looks Uniform, we are dealing with a phenomenon that has no feedback. Each realized value of our variable (each observation) has no correlation with the next. The phenomenon is completely random, so no feedback informs how it changes. What occurred today has no correlation with what occurred yesterday or with what will occur tomorrow.
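As a rough check on the no-feedback case, the sketch below draws independent values and measures two things: how flat the histogram is, and how correlated each observation is with the next. The helper names (`simulate_no_feedback`, `bin_shares`, `lag1_autocorrelation`) and the choice of 10 bins are assumptions for illustration. Note that the flat shape here comes from drawing uniformly on purpose; the part that stands in for "no feedback" is that each draw ignores the one before it.

```python
import random
import statistics

def simulate_no_feedback(n_draws=50_000, seed=0):
    """Independent draws from a flat range: no value depends on any other."""
    rng = random.Random(seed)
    return [rng.uniform(0, 1) for _ in range(n_draws)]

def bin_shares(values, n_bins=10):
    """Share of values (assumed to lie in [0, 1]) in each equal-width bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return [c / len(values) for c in counts]

def lag1_autocorrelation(values):
    """Correlation between each observation and the one that follows it."""
    mean = statistics.fmean(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

draws = simulate_no_feedback()
print("share per bin:", [f"{s:.1%}" for s in bin_shares(draws)])
print(f"lag-1 autocorrelation: {lag1_autocorrelation(draws):.3f}")
```

We would expect every bin to hold close to 10% of the draws and the lag-1 autocorrelation to sit near zero, meaning today’s value tells us nothing about tomorrow’s.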


Originally published at ablifeing.blogspot.com.