Why Understanding the Data-Generation Process Is More Important Than the Data Itself
“The Book of Why” Chapters 5 & 6, a Read with Me series
From early infancy, our brains learn to read causation into correlation and try to find an explanation for everything happening around us. If a car behind us takes the same turns we do for a long time, we assume it is following us, which is a causal assumption. Once we snap out of the movie mood, however, we realize we are probably just heading to the same destination: a confounder, that is, a common cause that introduces a correlation between the two cars' movements. This vivid, relatable example from Pearl illustrates how readily the human brain reaches for causal explanations.
What about correlations for which we cannot fathom a reasonable explanation, such as two diseases that are uncorrelated in the whole population but correlated among hospitalized patients? If you recall my last article, which discussed different causal structures, conditioning on a collider (here, hospitalization) produces an explain-away effect that makes two otherwise uncorrelated variables spuriously correlated. In other words, the hospitalized population is not a representative sample of the general population, and associations observed in this sample cannot be generalized.
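To make the hospitalized-patients example concrete, here is a minimal simulation sketch. Everything numeric in it is an assumption made up for illustration, not a figure from the book: each disease has a 10% prevalence, the two diseases are drawn independently, and each one adds 40 percentage points to a 2% baseline chance of being hospitalized. The helper names `pearson` and `simulate` are likewise hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=100_000, seed=0):
    """Draw (disease_a, disease_b, hospitalized) for n hypothetical people."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # The two diseases are generated independently of each other.
        disease_a = int(rng.random() < 0.10)  # assumed prevalence
        disease_b = int(rng.random() < 0.10)  # assumed prevalence
        # Hospitalization is the collider: either disease raises its chance.
        # The 0.02 baseline and 0.40 increments are assumed values.
        p_hospital = 0.02 + 0.40 * disease_a + 0.40 * disease_b
        hospitalized = rng.random() < p_hospital
        people.append((disease_a, disease_b, hospitalized))
    return people


people = simulate()
hospital = [p for p in people if p[2]]  # condition on the collider

r_all = pearson([p[0] for p in people], [p[1] for p in people])
r_hosp = pearson([p[0] for p in hospital], [p[1] for p in hospital])

print(f"whole population:  r = {r_all:+.3f}")
print(f"hospitalized only: r = {r_hosp:+.3f}")
```

Under these assumed numbers, the correlation in the whole population should come out close to zero, while the correlation among hospitalized patients should be clearly negative: once we know a patient was admitted, learning that they have one disease makes the other less necessary as an explanation. The sign and size of the spurious correlation depend entirely on how hospitalization is generated, which is why the data-generation process matters more than the sample itself.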