Fairness and Bias, Notes from Industry
Race and Ethnicity in Health Data Science
Why it’s important and how we should approach it
It’s undeniable that considering race or ethnicity (abbreviated as R/E and treated as a singular noun in this article, although the statements here refer to race and ethnicity collectively) is important when quantitatively studying healthcare outcomes. Ask any respectable statistician, epidemiologist, or data scientist and they’ll tell you at least this much! While the importance of R/E is widely understood, I think we can always strive to build a stronger fundamental vocabulary for why it’s important. This article aims to summarize conclusions from the (fairly mature) literature about R/E in model-building. Specifically, I briefly cover R/E in explanatory and predictive contexts (for more on the difference, check out my previous article on the topic!).
Below is an outline of this article. I’ve ordered the topics to follow my own progression in understanding R/E: (1) what this variable represents, (2) how we record its measurement, (3) why it is important, and (4) how we draw statistical conclusions about it. That said, please feel free to skip to whichever topic interests you most!