TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Photo by Ryoji Iwata on Unsplash

Why Understanding the Data-Generation Process Is More Important Than the Data Itself

“The Book of Why” Chapters 5 & 6, a “Read with Me” series

Zijing Zhu, PhD
Published in TDS Archive
15 min read · Dec 7, 2023

From early infancy, our brains learn to read correlation as causation and to look for an explanation for everything happening around us. If a car behind us takes the same turns we do for a long time, we assume it is following us, which is a causal assumption. Once we snap out of the movie mood, however, we realize that both cars are probably just heading to the same destination, which is a confounder: a common cause that induces a correlation between the two cars' movements. This vivid and relatable example from Pearl illustrates how the human brain works.
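
To make the two-car example concrete, here is a minimal simulation sketch. The destinations "A" and "B", the single junction per trip, and the 0.9 / 0.1 turn probabilities are all illustrative assumptions of mine, not numbers from the book. The only thing linking the two cars is the shared destination.

```python
import random

def simulate_trips(n=100_000, seed=0):
    """Simulate one junction per trip for two cars that share a destination."""
    rng = random.Random(seed)
    trips = []
    for _ in range(n):
        # Hypothetical common cause: both cars are heading to the same place.
        destination = rng.choice(["A", "B"])
        # Assumed turn probabilities: drivers going to A mostly turn left.
        p_left = 0.9 if destination == "A" else 0.1
        # Given the destination, each driver decides independently.
        car1_left = rng.random() < p_left
        car2_left = rng.random() < p_left
        trips.append((destination, car1_left, car2_left))
    return trips

def share_car2_left(trips):
    """Fraction of the given trips in which car 2 turns left."""
    return sum(car2 for _, _, car2 in trips) / len(trips)

trips = simulate_trips()
after_car1_left = [t for t in trips if t[1]]
to_a = [t for t in trips if t[0] == "A"]
to_a_after_car1_left = [t for t in to_a if t[1]]

# Ignoring the destination, car 1's turn "predicts" car 2's turn.
p_overall = share_car2_left(trips)
p_given_car1 = share_car2_left(after_car1_left)

# Within a single destination, car 1's turn carries no extra information.
p_a = share_car2_left(to_a)
p_a_given_car1 = share_car2_left(to_a_after_car1_left)

print(f"P(car 2 left)                   = {p_overall:.2f}")
print(f"P(car 2 left | car 1 left)      = {p_given_car1:.2f}")
print(f"P(car 2 left | to A)            = {p_a:.2f}")
print(f"P(car 2 left | to A, car 1 left) = {p_a_given_car1:.2f}")
```

Under these assumptions, the first two numbers should differ markedly while the last two should be nearly identical: once the destination is known, the apparent "following" disappears.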

What about correlations for which we cannot find a reasonable explanation, such as two diseases that are uncorrelated in the whole population but correlated among hospitalized patients? If you recall my last article, which discussed different causal structures, conditioning on a collider (here, hospitalization) produces an explain-away effect that makes two uncorrelated variables spuriously correlated. In other words, the hospitalized population is not an accurate representation of the general population, and observations made from this sample cannot be generalized.
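
The hospital example can be checked with a small simulation. This is a sketch under made-up assumptions: two independent diseases that each affect 10% of people, and a hypothetical admission rule in which each disease adds 0.4 to a 0.02 baseline chance of hospitalization. None of these numbers come from the book.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_population(n=100_000, seed=0):
    """Draw two independent diseases and a hospitalization flag per person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # The diseases are generated independently of each other.
        disease_a = 1 if rng.random() < 0.1 else 0
        disease_b = 1 if rng.random() < 0.1 else 0
        # Assumed admission rule: hospitalization is a collider,
        # caused by either disease.
        p_admit = 0.02 + 0.4 * disease_a + 0.4 * disease_b
        hospitalized = rng.random() < p_admit
        people.append((disease_a, disease_b, hospitalized))
    return people

people = simulate_population()
patients = [p for p in people if p[2]]  # condition on the collider

corr_population = pearson([p[0] for p in people], [p[1] for p in people])
corr_hospitalized = pearson([p[0] for p in patients], [p[1] for p in patients])

print(f"whole population : {corr_population:+.3f}")
print(f"hospitalized only: {corr_hospitalized:+.3f}")
```

With these particular numbers, the population correlation should land close to zero, while the correlation among hospitalized patients should be clearly negative (around -0.5). Among patients, one disease already explains the admission, so the other becomes less likely. That is the explain-away effect in action.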

Written by Zijing Zhu, PhD

Ph.D. in Economics | Data Scientist @Cisco | Top 1000 Writer in Medium | Lifetime Learner | https://www.linkedin.com/in/zijingzhu/