Combatting Data Bias: Goal, Data, Feature and Model Bias

SingleStone
Jul 23, 2019 · 5 min read

Combatting data bias requires more than an understanding of where it can exist; we must also be able to identify its many forms. In my last post, Data is Rarely Neutral: Bias in the Practice of Data, I stressed the ubiquitous nature of data, offering the analogy: data is to information as oxygen is to air. With that framework, we should strive to create clean air, not air that is toxic or polluted. And we do that by fighting four kinds of bias: goal bias, data bias, feature bias, and model bias.

Goal Bias.

Goal bias can be the very first misstep in human management. For example, a goal such as “automate the pre-selection criteria of candidates for a job posting” may unwittingly overlook the diamonds in the rough that we would have uncovered through personal interaction, such as a phone screen or in-person interview. Another example: a goal such as measuring teacher efficacy might inaccurately tag a teacher with low-scoring kids in a high-poverty district as ineffective, without considering the socioeconomic factors that also contribute to low test scores. We mitigate goal bias by ensuring the goal is well-formed and well-intentioned along the lines of human pursuit, rather than human management. It’s important to note that in some cases this will run counter to efficiency and, arguably, profitability. This is not a one-time effort; goal bias requires us to constantly examine the questions we’re asking of our data and the outcomes that can be derived from the answers.


Data Bias.

This one falls in the category of “you don’t know what you don’t know.” However, I find this to be a bit of a cop-out, especially when data scientists advocate “letting the math do what math does.” Together, these two penchants can yield catastrophic results, such as zero Black/African faces being identified as beautiful by a model designed to determine archetypes of beauty. How does that happen, you ask? Easily. If we train a model on data that is already biased (i.e., the training data includes no Black/African faces), the resulting algorithm will be biased, too. So, if mathematics is our superpower, we must demonstrate dominion over data modeling (instead of the other way around).
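A practical first step is simply auditing the training data for representation before any modeling begins. The sketch below is a minimal illustration, assuming a pandas DataFrame with a hypothetical group column; the column name and threshold are placeholders, not a prescription.

```python
import pandas as pd

def audit_representation(df: pd.DataFrame, column: str, min_share: float = 0.05) -> pd.Series:
    """Report each group's share of the training data and flag under-represented groups."""
    shares = df[column].value_counts(normalize=True)
    under_represented = shares[shares < min_share]
    if not under_represented.empty:
        print(f"Warning: groups below {min_share:.0%} of training data: "
              f"{', '.join(under_represented.index.astype(str))}")
    return shares

# Example usage with a hypothetical training set of labeled face images
# train_df = pd.read_csv("faces_train.csv")
# audit_representation(train_df, "skin_tone")
```

A check this simple would have flagged the “archetypes of beauty” training set before a single model was fit.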

Feature Bias.

Feature bias offers more cautionary tales about the danger of “letting the math do what math does.” Take, for example, credit underwriters Zest Finance, who identified a new risk signal: people who fill out loan applications in all caps are much higher risk. I would like to suppose they also took into account that an entire generation now receives little instruction in the etiquette and meaning of handwriting past the 3rd grade, but I suspect there was no mitigation of bias on this particular feature (not even a check to declare it irrelevant). I suspect some very smart designer used antiquated psychological handwriting analysis to tilt the scales toward including writing style as a feature in determining creditworthiness, without considering that the youngest two adult generations are no longer taught handwriting beyond letter formation.

At the last conference I attended, someone asked how we should deal with a feature that we know has a greater or lesser impact. The facilitator answered, “Allow the math…” Inherent in this answer is an implicit bias that reproduces and doubles down on systemic and institutional biases. In other words: bias in, bias out. Humans naturally employ bias as a means of survival, but it is when these biases coalesce into a systemic and/or institutional bias that we need to address and mitigate them. It will take sincere effort in systems thinking and observation of societal constructs to ferret out dangerous feature bias. And when we do discover bias, how will we deal with it?
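One way to deal with it is to test whether a candidate feature is quietly acting as a proxy for a protected attribute before it ever reaches the model. The sketch below is a minimal illustration using a chi-square test of association; the DataFrame, the all_caps_application flag, and the age_group column are hypothetical stand-ins, not Zest Finance’s actual data or method.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def proxy_check(df: pd.DataFrame, feature: str, protected_attr: str, alpha: float = 0.01) -> bool:
    """Return True if `feature` is significantly associated with `protected_attr`."""
    table = pd.crosstab(df[feature], df[protected_attr])
    chi2, p_value, dof, _ = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, p={p_value:.4f}")
    return p_value < alpha

# Example usage with a hypothetical loan-application dataset
# apps = pd.read_csv("applications.csv")
# if proxy_check(apps, "all_caps_application", "age_group"):
#     print("Feature tracks a protected attribute; review it before letting the math decide.")
```

A significant association does not automatically disqualify a feature, but it does mean a human has to make (and document) the call, rather than letting the math do what math does.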

Model Bias.

We all know that not all models tell the story we want to tell, nor do they all yield the answer to the actual question asked. Furthermore, in some cases we do not have a clear line of sight into how the model factors the data. It would be great if these “black box” models could behave responsibly, like humans, but that’s not going to happen. In many regulated industries, such as insurance and finance, this is an issue. And as more and more industries adopt machine learning, we will be asked to explain how and why a model works the way it does, regardless of regulation.

Demystifying the black box, cracking it open and revealing its innards, is one of the biggest challenges we face. And it’s important for two reasons: we want to understand how and why that particular model was designed, both for 1) efficacy and 2) improvement.
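Concrete tooling for this first look does exist. Permutation importance is one simple, model-agnostic way to see which features a black-box model is leaning on: shuffle each feature in turn and measure how much the score drops. The sketch below is a minimal illustration with scikit-learn on synthetic data; it is a starting point for inspection, not a full explanation of any real model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data standing in for a real, possibly sensitive dataset
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Features the model relies on heavily show a large score drop when shuffled
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```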

While I applaud these efforts to open up the black box, I would like to see us put more emphasis on avoiding the bias in the first place. That’s a tough sell because, at this stage in the game, we incentivize speed to market over quality of outcomes. Instead, I’d like to see an incentive structure that allows us to profit from the positive impact of the models we produce. But for now, we must consider the quality of the outcome to the best of our ability within the time constraints we’re given.

So, we ask ourselves: does the model include variables with explicit or implicit bias? Are we extending a self-reinforcing feedback loop that exacerbates a systemic or institutional bias? And, if so, how are we mitigating those biases? Are we creating opposing models? For example, if we are using data to model criminality, are we also modeling for policing on behalf of the citizenry? Are we even testing for that? Do we know HOW to test for that?
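As one concrete (if simplified) answer to that last question: we can compare error rates across groups, for example false positive rates, since a large gap is a red flag that the feedback loop punishes one group more than another. The sketch below is a minimal illustration under assumed column names (y_true, y_pred, group); a real audit, especially in a regulated industry, requires far more than one metric.

```python
import pandas as pd

def false_positive_rate_by_group(df: pd.DataFrame, y_true: str, y_pred: str, group: str) -> pd.Series:
    """Compute the false positive rate per group: P(pred = 1 | true = 0, group)."""
    negatives = df[df[y_true] == 0]
    return negatives.groupby(group)[y_pred].mean()

# Example usage with hypothetical model output
results = pd.DataFrame({
    "y_true": [0, 0, 1, 0, 1, 0],
    "y_pred": [1, 0, 1, 0, 1, 1],
    "group":  ["a", "a", "a", "b", "b", "b"],
})
fpr = false_positive_rate_by_group(results, "y_true", "y_pred", "group")
print(fpr)                    # per-group false positive rates
print(fpr.max() - fpr.min())  # a large gap between groups warrants investigation
```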


I’ve given us a lot to think about. We, the data geeks and Air Benders, will shape the societal norms of tomorrow. I believe we have a responsibility to “bend” the data to facilitate innovation that will foster societal good. We are not modeling for the masses; we are the masses who are modeling our future.

Vida Williams is the Data & Machine Learning Solution Lead at SingleStone. She is an accomplished data scientist who, in addition to leading data solutions at SingleStone, utilizes her expertise to drive social and economic change in communities.

Written by SingleStone. We’re a technology consulting company. We help businesses keep up with tech so they can keep up with their customers.
