Thoughts on Bias in Algorithmic Decision Making in Light of the Apple Card and Goldman Sachs Story

Stas Cherkassky
4 min read · Nov 14, 2019


By Stas Cherkassky and Tobias Schaefer

Summary

Detecting and eliminating bias is one of the most pressing challenges for algorithmic decision making. The problem is not new, but it is not easy to address, and it has therefore often been ignored. The world, however, is changing, and numerous recent cases show that failing to properly address bias can be dangerous and costly. The public is becoming increasingly aware of the potential unfairness of AI-powered decision systems; this week’s news about Goldman Sachs and the Apple Card is just another demonstration of that fact. It is fair to assume that Goldman Sachs’s experts did not want to introduce any bias. But simply not looking at gender is not enough: modern ML systems can develop bias even when the protected variable is absent from their inputs.

Companies need to be able to provide satisfying answers when consumers and regulators raise claims of potential bias. And we can expect that regulators will soon focus on bias and require algorithmic accountability.

Introduction

There is no single, agreed-upon definition of bias. While bias in decision systems can be defined in a variety of ways, the following illustrates the basic idea: a decision system is biased if equally qualified candidates do not get equal opportunity. For example, in the context of credit cards, we expect female and male applicants with equal qualifications to be offered the same credit (limit, interest rate, etc.).
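To make this concrete, here is a minimal sketch (in Python, with made-up applicant fields and a toy decision function) of the check the definition implies: flipping only the gender field of an otherwise identical applicant should not change the offer.

```python
# Minimal illustration of the definition above. The applicant fields and the
# decision function are hypothetical placeholders, not any real scoring model.

def offers_match(decision_fn, applicant: dict) -> bool:
    """True if the decision does not change when only gender changes."""
    as_female = {**applicant, "gender": "F"}
    as_male = {**applicant, "gender": "M"}
    return decision_fn(as_female) == decision_fn(as_male)

def toy_decision(applicant: dict) -> int:
    # Toy policy: credit limit is 20% of income, rounded to the nearest $100.
    return round(applicant["income"] * 0.2 / 100) * 100

print(offers_match(toy_decision, {"income": 85_000, "credit_score": 720}))  # True
```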

Anyone who cares about unfair biases should be able to answer these questions:

1. How do I detect unfair bias in my decision system?

2. If bias is there, how can I mitigate its effects?

3. Can I explain any single decision and demonstrate my model’s fairness in general?

While the first question can be answered within the framework of any model (including even black-box algorithms), answering the other two is much more complicated.

Detection

If Goldman Sachs’s model indeed suffers from gender bias, we are pretty sure it was not intentional. Most likely, their decision system did not take gender as input, and applicants are probably not even asked for their gender when applying for the card. While this might seem appropriate at first glance, the surprising realization is that ignoring gender is not sufficient to eliminate bias. Many other variables can be correlated with gender, and the resulting decision system still might not offer equal opportunity to women and men.
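To see how this can happen, here is a hedged sketch on purely synthetic data (all feature names and numbers are invented): the model below never sees gender, yet a feature correlated with gender lets a gap emerge between equally qualified men and women.

```python
# Synthetic demonstration: a model trained WITHOUT gender can still treat
# equally qualified applicants differently via a gender-correlated proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 50_000
gender = rng.integers(0, 2, n)              # 0 = male, 1 = female; never given to the model
income = rng.normal(70_000, 15_000, n)      # true qualification, same distribution for both groups

# Hypothetical proxy: card utilization tracks income but is also shifted by
# gender for reasons unrelated to creditworthiness.
utilization = 0.9 - income / 200_000 + 0.15 * gender + rng.normal(0, 0.05, n)
stated_income = income + rng.normal(0, 20_000, n)    # noisy self-reported income

qualified = income > 70_000                          # training labels depend on income only
X = np.column_stack([stated_income, utilization])    # note: gender is NOT an input
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, qualified)
approved = model.predict(X)

# Compare equally qualified applicants (true income near the threshold).
band = (income > 68_000) & (income < 72_000)
for g, name in [(0, "men"), (1, "women")]:
    print(f"approval rate for equally qualified {name}: "
          f"{approved[band & (gender == g)].mean():.2f}")
```

On this toy data the approval rate for women near the threshold comes out lower than for men, even though gender never enters the model.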

In addition, anyone who wants to make sure that their decision system has no gender bias actually needs gender information. Without it, it is hard to compare model performance across genders, which is necessary to measure the bias. This very story started when a few married couples noticed that husband and wife with apparently similar financial qualifications were offered different credit limits. That is the kind of “validation by customer” one generally wants to avoid.

It is still to be determined whether Goldman Sachs’s model really has a gender bias. Some people blame the black-box nature of the underlying algorithm both for the bias itself and for the difficulty of measuring it. However, we do not know (yet) whether Goldman Sachs’s model is indeed a black box. Also, contrary to what some people think, bias can be measured even for black-box models: many bias definitions look only at the inputs, the model’s decisions, and (sometimes) the actual outcomes, not at the decision process itself. We wrote about this in more detail in our previous post.
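As a concrete illustration, the sketch below audits a model purely from the outside: it only needs the model’s decisions, the protected attribute, and (optionally) the actual outcomes. The function and metric names are ours, not a standard API.

```python
# Black-box audit sketch: `predict_fn` can be any callable that returns
# approve/decline decisions, including a wrapper around a vendor API.
import numpy as np

def audit_gender_bias(predict_fn, X, gender, actual_good=None):
    """Compare approval rates (and, if outcomes are known, error rates) by gender."""
    decisions = np.asarray(predict_fn(X)).astype(bool)
    gender = np.asarray(gender)
    report = {}
    for g in np.unique(gender):
        mask = gender == g
        report[g] = {"approval_rate": float(decisions[mask].mean())}
        if actual_good is not None:
            good = np.asarray(actual_good)[mask].astype(bool)
            # "Equal opportunity" view: approval rate among applicants who are in fact creditworthy.
            report[g]["tpr"] = float(decisions[mask][good].mean())
    return report
```

Run against the synthetic model from the previous sketch, `audit_gender_bias(model.predict, X, gender, qualified)` surfaces the gap without ever opening the model.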

Mitigation

While detecting bias is critical, mitigating it is a much harder challenge, especially when using black-box models. There are advanced tools for approaching the problem; many of them involve complex generation of synthetic data for model training. Other methods essentially use different models for different classes. While that can fix the bias metric, it leads to different decisions for applicants who are identical apart from gender, which can itself look like unfair discrimination.
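A simplified variant of that second approach is to use group-specific thresholds on a single score. The sketch below (all numbers invented) can equalize the headline metric, but it also shows the side effect described above: two applicants who differ only in gender receive different decisions.

```python
# Group-specific thresholds: a crude stand-in for "different models per class".
# The thresholds are hypothetical and chosen only to illustrate the trade-off.

def approve(score: float, gender: str) -> bool:
    threshold = {"M": 0.60, "F": 0.55}[gender]   # lower bar to offset a measured gap
    return score >= threshold

print(approve(0.57, "M"))  # False
print(approve(0.57, "F"))  # True -> identical score, different decision
```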

Over recent years, classes of transparent algorithms have been developed that offer the same level of accuracy as black-box approaches and yet provide explanations that are human-interpretable. One example of such a class of transparent (or white-box) algorithms is a probabilistic rules engine, which is used in Stratyfy’s approach to building decision systems. Thanks to the global transparency of the decision engine, it becomes a simple task not only to identify bias but also to eliminate it from the decision system, so that all applicants are provided with equal opportunity.
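To give a flavor of the idea, the toy sketch below pictures a rules engine as a small set of human-readable rules, each with a weight, whose contributions add up to a score. This is only an illustration of the concept, not Stratyfy’s engine; the rules, weights, and threshold are invented.

```python
# Toy rules engine: every rule is readable, so an analyst can spot a rule that
# acts as a proxy for gender and adjust or remove it.
RULES = [
    ("income above 60k",      lambda a: a["income"] > 60_000,     +2.0),
    ("utilization above 80%", lambda a: a["utilization"] > 0.80,  -1.5),
    ("recent delinquency",    lambda a: a["delinquencies"] > 0,   -2.5),
]
APPROVE_AT = 1.0

def score(applicant: dict):
    fired = [(name, weight) for name, cond, weight in RULES if cond(applicant)]
    return sum(w for _, w in fired), fired

total, fired = score({"income": 75_000, "utilization": 0.30, "delinquencies": 0})
print("approve" if total >= APPROVE_AT else "decline", total, fired)
```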

Decision explanation and model fairness demonstration

If Goldman Sachs had used an interpretable ML model for its Apple Card decisions, it would have been able not only to demonstrate that the model is not biased in general (i.e., statistically, on historical data), but also to explain any individual decision by showing which variables or rules contributed to the final score, and how.
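Continuing the toy rules engine from the previous sketch (again, purely illustrative), an individual explanation is simply the list of rules that fired for that applicant, with their signed contributions:

```python
# Builds on the toy RULES / score / APPROVE_AT defined in the earlier sketch.
def explain(applicant: dict) -> str:
    total, fired = score(applicant)
    verdict = "approved" if total >= APPROVE_AT else "declined"
    lines = [f"{w:+.1f}  {name}" for name, w in fired]
    return f"Decision: {verdict} (score {total:+.1f})\n" + "\n".join(lines)

print(explain({"income": 45_000, "utilization": 0.85, "delinquencies": 1}))
```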

Today it is Goldman Sachs; tomorrow it could be any bank, lender, or insurance company. For many of them, having a proper solution for bias will soon be not a nice-to-have addition but a business necessity.
