Automating Financial Services: The Danger of Machine Humanity

Sergio Isidoro
Tech at Holvi
Dec 18, 2017

We’ve all heard about the pending AI apocalypse by now, either from Elon Musk[1] or through countless news articles with very tacky stock images (usually in shades of blue, featuring CGI robots).

I do not wish to disprove such claims and worries, but I do want to point out that some negative effects of artificial intelligence systems might already be here, and that those flaws are very much human: discrimination.

Let’s start with the basics:

Discriminate

1- Recognise a distinction; differentiate.

2- Make an unjust or prejudicial distinction in the treatment of different categories of people, especially on the grounds of race, sex, or age.

The first definition of discriminate is a fundamental concept in data science and machine learning: you want to find what distinguishes the instances in your dataset, in order to generalise to new elements (e.g. classification) or to better understand your population (e.g. clustering).

The second is a flawed, misguided, or ill-intended execution of the first.

It is an innate human capability to discriminate. Most people can distinguish that red is different from blue, rough is different from smooth, and a bright place is different from a dark place. This is the application of the first definition of discriminate mentioned above.

Labelling the things we know how to discriminate, and attaching meanings to those labels, is, on the other hand, a much deeper and more complicated problem.

We know red is different from blue, but what do we call red[2]? We learn early on, according to our cultural environment, the label “red” and that the colour is associated with passion, love, and aggressiveness. We can also argue that some of these connotations are evolutionary: we associate positive feelings with smooth and light, and negative connotations with dark and rough, because at night you can’t see predators and approaching dangers, and rough surfaces hurt.

This complexity of labels and their connotations leads to the second definition of discriminate. We see that light skin is different from dark skin, and we label people into races. Worse than that, we extrapolate meanings that have caused great damage to people and communities for centuries — connotations that are mostly taught and cultural!

I would argue that machines are just the same. We give machine learning algorithms a set of features (values we think are relevant for a problem) and they will learn to discriminate: a cat is different from a dog, light skin is different from dark skin, long hair is different from short hair, male is different from female. But as we discussed before, these labels can be very subjective. If we instead try to discriminate between good and bad, superior and inferior, or friend and foe, things get complicated very fast, especially if the training data of these systems is riddled with human biases.

Let’s look at a very specific historical case from financial services.

Redlining

Redlining is the practice of businesses denying services (historically banking, lending, and insurance) based solely on the customer’s area of residence. The name comes from the practice of drawing, in red on a map, the boundary between the “good” and “bad” areas.

[Image: an actual map redlining areas of Philadelphia]

The first major problem with this practice, if not obvious by now, is that an address, by itself, should not be a predictor for life- and business-critical decision making.

Consider this: if it were found that living in an apartment building with an elevator was correlated with deaths from heart attacks, should your insurance company be able to raise your health insurance premium based on your address? Clearly it is not the elevator that causes heart failure, but the fact that sedentary people tend to choose a building with an elevator.
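A quick way to convince yourself of this is a simulation. The sketch below (Python with NumPy, entirely made-up numbers) generates a population in which a sedentary lifestyle drives both the choice of an elevator building and the risk of a heart attack; the elevator ends up correlated with heart attacks despite having no causal effect, and the correlation vanishes once you condition on the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: is the person sedentary?
sedentary = rng.random(n) < 0.4

# Sedentary people are more likely to pick a building with an elevator...
elevator = rng.random(n) < np.where(sedentary, 0.8, 0.3)

# ...and more likely to suffer a heart attack. The elevator itself does nothing.
heart_attack = rng.random(n) < np.where(sedentary, 0.10, 0.02)

print("P(heart attack | elevator)    =", heart_attack[elevator].mean())
print("P(heart attack | no elevator) =", heart_attack[~elevator].mean())

# Condition on the confounder and the apparent "elevator effect" disappears:
print("P(heart attack | elevator, sedentary)    =", heart_attack[elevator & sedentary].mean())
print("P(heart attack | no elevator, sedentary) =", heart_attack[~elevator & sedentary].mean())
```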

Let’s imagine that a certain area has lower incomes and lower property values, making it impossible for any individual or business there to access credit. Thriving businesses would be forced to move away, building maintenance would become impossible, driving the area’s value even lower and making access to credit even less likely: a chicken-and-egg problem.

This becomes an ethical problem when we consider that many of these areas were populated by racial and cultural minorities. And it raises the question: were these areas profiled as low property value because of their demographic and cultural environment?

So, from an economic perspective, if two people have exactly the same factors for success (e.g. the same probability of credit default, the same risk-adjusted return), they should get the same access to financial services that are sometimes essential in life. So how did this happen?

Heuristics

Let’s, for the sake of this thought experiment, assume that even though this practice started in a time marked by racial inequality, the reasons for classifying a neighbourhood as “Hazardous” were not racial or cultural.

[Image: redlining (left) vs. “ideal discrimination” (right)]
  • Imagine that factors X and Y are the influencing factors for success in repaying a personal or commercial loan.
  • In some cases, the address is correlated with the values of X, Y and Z.
  • Z is a sensitive parameter that it is unethical to discriminate on (height, gender, race).

What redlining does is wrongly assume that the address has a causal effect on X and Y, and hence that it is a predictor of success. It is a heuristic, reducing the complexity of analysing both X and Y.

This has the effect of deciding against certain values of Z whenever they are also correlated with the address (e.g. a neighbourhood populated by a certain ethnic group), even if those groups have very good values of X and Y.
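To make this concrete, here is a minimal, entirely synthetic sketch (NumPy, invented numbers) of the situation above: X and Y are the real drivers of repayment, Z is the sensitive attribute, and the address is merely correlated with all three. Rejecting everyone behind the red line throws away plenty of people with perfectly good X and Y, and the rejections fall disproportionately on one value of Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Z: sensitive attribute (True/False). Purely hypothetical split.
z = rng.random(n) < 0.3

# The address ("redlined" area or not) is correlated with Z, not caused by X or Y.
redlined_area = rng.random(n) < np.where(z, 0.7, 0.2)

# X, Y: the actual drivers of repayment. Slightly lower on average inside the
# redlined area, but with a large overlap between the two areas.
x = rng.normal(np.where(redlined_area, -0.3, 0.3), 1.0)
y = rng.normal(np.where(redlined_area, -0.3, 0.3), 1.0)
repays = (x + y + rng.normal(0, 0.5, n)) > 0          # ground truth

# The redlining heuristic: reject everyone in the redlined area.
accepted = ~redlined_area

print("Good customers rejected by the heuristic:", (repays & ~accepted).mean().round(3))
print("Rejection rate for Z=1:", (~accepted[z]).mean().round(3))
print("Rejection rate for Z=0:", (~accepted[~z]).mean().round(3))
```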

So, back to plain English: besides not using the actual predictors of success in its decision making, this decision system destroys any possibility of development for the neighbourhood, even for the people who would have had high chances of success. It means no access to commercial lending, low chances of job creation, and lower investment in infrastructure, effectively dooming the area to a spiral of poverty.

It is understandable that, at the time, available information was scarce and computational methods were non-existent, hence the use of heuristics to simplify the decision process.

But in the age of open and accessible data, cheap computation, and machine learning, that should no longer be a problem. Or is it?

Fast forward to today…

Let’s say that today a company is aiming to automate a decision process through machine learning.

A large dataset is available to train a model, and the first results are promising. Training methods were correctly applied to prevent common problems (e.g. overfitting), and the model performs reasonably well according to precision and recall scores. The model gets deployed into production. But one problem was overlooked: the annotation of the data.

Data is annotated by humans, and humans are well known for using the heuristics we discussed just before (like the address) to make decisions.

If the address is included in the training set, the model might use it as a predictor of success, just like in the example before. The trained model can go as far as finding correlations with the sensitive variable Z (e.g. race, gender) and using it as a major feature for classification.
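Here is a hedged sketch of that failure mode, using scikit-learn on the same kind of synthetic data as above (all names and numbers are invented). The human annotations penalise the redlined address itself, Z is never given to the model, and yet the trained model ends up accepting the two groups at very different rates because the address does the work of Z.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

# Synthetic world: the sensitive attribute Z correlates with a redlined address.
z = rng.random(n) < 0.3
redlined = rng.random(n) < np.where(z, 0.7, 0.2)
x = rng.normal(np.where(redlined, -0.3, 0.3), 1.0)
y = rng.normal(np.where(redlined, -0.3, 0.3), 1.0)

# Biased human annotations: annotators penalise the redlined address itself.
label = (x + y - 2.0 * redlined + rng.normal(0, 0.5, n)) > 0

# Train on x, y and the address flag. Z itself is *not* a feature.
features = np.column_stack([x, y, redlined.astype(float)])
model = LogisticRegression().fit(features, label)
pred = model.predict(features)

print("Acceptance rate for Z=1:", pred[z].mean().round(3))
print("Acceptance rate for Z=0:", pred[~z].mean().round(3))
# The gap appears even though Z never entered the model: the address is a proxy.
```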

Is it really that common to have biased datasets?

Yes. Here are some examples:

Imagine that you are working on a government project that aims to predict crime and fines based on historical data — Walking while black

Imagine that you are working on an automatic system to accept / reject Airbnb requests based on past data — Airbnb racism claim: African-Americans ‘less likely to get rooms’

Imagine that you are working on a dating matching algorithm for OKCupid — Race and Attraction, 2009–2014

Can we remove these sensitive parameters?

No. Removing sensitive variables does not fix the problem. There can be other variables that are correlated with the sensitive parameters (imagine removing race as a parameter while keeping correlated parameters such as skin colour). Removing the parameters from the dataset will only make you blind to the problem.
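One practical check: try to predict the sensitive variable from the features you intend to keep. If that works well above chance, dropping the sensitive column has not removed the information, only hidden it. A minimal sketch with scikit-learn and placeholder data (the column names are invented):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder feature table: swap in your real one. Here a postcode-based
# feature leaks the sensitive attribute z even though z itself is not a column.
rng = np.random.default_rng(3)
n = 10_000
z = rng.random(n) < 0.3
df = pd.DataFrame({
    "postcode_risk_band": (rng.random(n) < np.where(z, 0.7, 0.2)).astype(int),
    "income": rng.normal(np.where(z, 2.4, 2.8), 0.5, n),
})

# AUC near 0.5 means no leakage; well above 0.5 means proxies are present.
auc = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=0),
                      df, z, cv=5, scoring="roc_auc").mean()
print("AUC for predicting the 'removed' sensitive attribute:", round(auc, 3))
```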

Can I do something?

Yes!

1 — Know your data!
Who made these annotations? What was the cultural environment of the annotators? When were the annotations made? 20 years ago? 10 years ago? In which country?

2 — Preflight checks
It’s a good idea to have a list of variables that you don’t want interfering with your decision making. Gender? Race? Run some quick data checks to see, for example, whether 100% of the samples in a sensitive group have the same label.
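As a minimal example of such a check with pandas (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical annotated training data; replace with your own.
df = pd.DataFrame({
    "gender":   ["F", "F", "M", "M", "M", "F"],
    "approved": [0,    0,   1,   1,   0,   0],
})

# Label distribution per sensitive group: a group stuck at 0% (or 100%)
# approvals before any model is trained is a red flag about the annotations.
print(df.groupby("gender")["approved"].agg(["mean", "count"]))
```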

3 — Consider fitness metrics that account for sensitive groups
Evaluate your model’s outcome not only on precision and recall, but also on metrics of unwanted discrimination. Does your model reject 70% of women while accepting only men, or vice versa? In the end it is you who decides whether a model is adequate for deployment.
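One simple metric of this kind is the acceptance rate per group, reported next to precision and recall, together with the ratio between the lowest and highest rates. A sketch with invented arrays:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical evaluation arrays; replace with your model's real output.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 0, 1])
group  = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))

# Acceptance rate per sensitive group, plus the ratio between the extremes.
rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
print("acceptance rates:", rates)
print("ratio (min/max): ", min(rates.values()) / max(rates.values()))
```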

4 — Consider splitting the population and creating a different classification system for each group
Splitting your decision model into separate models for each value of a sensitive parameter can reduce discrimination, and can even increase your precision and recall scores: it helps greatly with imbalanced datasets and with minority groups in your data, assuming your labels are unbiased.
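A rough sketch of the per-group idea with scikit-learn and placeholder data: one model per value of the sensitive attribute, so the majority group cannot dominate the decision boundary learned for a small minority group.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 6_000

# Placeholder features, sensitive attribute (imbalanced) and assumed-unbiased labels.
X = rng.normal(size=(n, 3))
group = rng.choice(["A", "B"], size=n, p=[0.9, 0.1])
y = (X[:, 0] + np.where(group == "B", X[:, 1], -X[:, 1]) > 0).astype(int)

# One model per group instead of a single global one.
models = {g: LogisticRegression().fit(X[group == g], y[group == g])
          for g in np.unique(group)}

def predict_one(features, g):
    """Route each case to the model trained on its own group."""
    return models[g].predict(features.reshape(1, -1))[0]

print(predict_one(X[0], group[0]))
```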

5 — Consider manual adjustment techniques that force independence of variables
If you are sure that the data is biased, and it has been decided that a certain group should not be discriminated against, methods like naive Bayes, decision trees, and decision rules allow you to tune support parameters for certain variables [3]. This is a controversial practice, as it implies that you are effectively tampering with the data to ensure balance in the decision.
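The methods in [3] modify the classifier itself; as a much simpler illustration of the same intent (and not the authors’ actual algorithm), the sketch below post-processes a fitted model by picking a separate decision threshold per group so that acceptance rates come out roughly equal. All data and the target rate are invented, and this is exactly the kind of deliberate intervention the paragraph above calls controversial.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 10_000

# Placeholder data in which group B systematically receives lower scores.
group = rng.choice(["A", "B"], size=n, p=[0.7, 0.3])
X = rng.normal(np.where(group == "B", -0.5, 0.5), 1.0, n).reshape(-1, 1)
y = (X[:, 0] + rng.normal(0, 1, n) > 0).astype(int)

scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

target_rate = 0.5   # desired acceptance rate, a business / policy choice

# Per-group threshold: accept the top `target_rate` fraction of each group.
thresholds = {g: np.quantile(scores[group == g], 1 - target_rate)
              for g in np.unique(group)}
accepted = scores >= np.vectorize(thresholds.get)(group)

for g in np.unique(group):
    print(g, "acceptance rate:", accepted[group == g].mean().round(3))
```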

6 — Remove minority groups from the automated flow and process them manually
We need to acknowledge that machine learning is not perfect, and that it’s OK not to have 100% automation every time. Funnel cases that you know will be hard for machine learning methods to evaluate to human decision making. But make sure those humans are aware of their own biases.
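A small sketch of that funnel with placeholder data: auto-decide only the cases the model is confident about, and send the uncertain middle band to a human review queue.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def route(scores, low=0.3, high=0.7):
    """Auto-decide confident cases; send the uncertain middle band to a human."""
    return np.where(scores >= high, "accept",
                    np.where(scores <= low, "reject", "manual_review"))

# Tiny demo: any fitted binary classifier exposing predict_proba would do.
rng = np.random.default_rng(6)
X = rng.normal(size=(1_000, 2))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

decisions = route(model.predict_proba(X)[:, 1])
print({d: int((decisions == d).sum()) for d in np.unique(decisions)})
```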

Fighting the legal fear

Financial institutions have been afraid of machine learning, and what we have just explained could be a major factor: ignoring these issues leads to legal risk exposure.

There is an increasing number of anti-discrimination laws [4] that prevent institutions from discriminating against individuals based on certain protected attributes. And if we look at established techniques (such as neural networks) and newer ones (such as deep learning), we see that the decision process of these methods is not transparent. Latent biases in these black-box methods tend to be found only when someone files a claim or a lawsuit, too late to avoid fines and sanctions.

Taking these issues into consideration when developing automated decision making will not only benefit the community in general, but also help developers and decision makers propose and introduce machine learning methods in traditional, highly regulated industries such as health, finance, and insurance.

References:

[1] Elon Musk: regulate AI to combat ‘existential threat’ before it’s too late

[2] The surprising pattern behind color names around the world

[3] Toon Calders, Sicco Verwer (2010). Three naive Bayes approaches for discrimination-free classification.

[4] List of anti-discrimination acts
