Member-only story
Statistical Tests Won’t Help You to Compare Distributions
Forget p-values, and come to know “Standardized Wasserstein Distance”: a practically useful measure of the difference between distributions.
One of the most recurring questions in the world of data, from machine learning to business analysis, from finance to medical research is:
“How different are these variables among these groups?”
I bet you have been there before. You have a dataset and two (or more) groups of customers, say customers from US and customers from Germany, and you have recorded some variables, such as age, purchase frequency, and average purchase amount.
You would like to know which variable is better able to set apart American customers from German customers. Or, equivalently, which variable is more different between American and German customers.
To do that, you need a way to measure the difference between distributions, e.g. how different is the age of German customers from the age of American customers, and so on.