Data Bias and Low Expectations
A couple of weeks ago a story broke (and was subsequently forgotten in the maelstrom that is Brexit) that a healthcare app gave two people with exactly the same symptoms and histories entirely different advice.
The British Heart Foundation (BHF) tested NHS-backed healthcare platform Babylon using two fictitious 60-year-old smokers with a history of panic attacks, presenting with sudden-onset chest pain and nausea. One was advised that they might be having a heart attack and should seek immediate medical assistance, the other that they were probably having a panic attack.
The only difference? One was a man, the other a woman.
Now. I should preface this by saying that this is not a Babylon-bashing piece (though I’m not ruling out writing one at some point). This is a quiet chat about how we all need to be a bit more demanding of our technology and of the people in charge of it.
So. What happened?
Basically, the AI ‘brain’ of the app did what all AI does: it took the symptoms the BHF testers entered and, using the data it had available, returned the statistically most probable answer.
And there is nothing wrong with that. It did exactly what it was supposed to do, and what it was designed to do. A+ to the Babylon developers who made it.
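To make the mechanism concrete, here’s a deliberately tiny, hypothetical sketch (my invention, not Babylon’s code, with made-up records): a symptom checker that just returns the most common diagnosis among matching historical cases will faithfully reproduce whatever skew went into those cases.

```python
# Hypothetical sketch only: a lookup-style 'symptom checker' trained on
# invented records in which women's heart attacks were under-recorded.
from collections import Counter

# (sex, symptoms, recorded diagnosis) -- the skew here is deliberate.
records = [
    ("male",   "chest pain + nausea", "heart attack"),
    ("male",   "chest pain + nausea", "heart attack"),
    ("male",   "chest pain + nausea", "panic attack"),
    ("female", "chest pain + nausea", "panic attack"),
    ("female", "chest pain + nausea", "panic attack"),
    ("female", "chest pain + nausea", "heart attack"),
]

def most_likely_diagnosis(sex, symptoms):
    """Return the most frequent label among matching historical cases."""
    matches = Counter(label for s, sym, label in records
                      if s == sex and sym == symptoms)
    return matches.most_common(1)[0][0]

print(most_likely_diagnosis("male", "chest pain + nausea"))    # heart attack
print(most_likely_diagnosis("female", "chest pain + nausea"))  # panic attack
```

Nothing in that lookup is broken. The bias is baked into the records it was handed.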
Where it all went wrong was long before we even got to the app: in the collection of the data that its decisions were based on.
Women have been getting the rough end of the deal for generations when it comes to diagnostics and healthcare, especially cardiac care. They are not only less likely to be diagnosed correctly, but are also less likely to receive the correct treatment and are more likely to die as a result.*
The reasons for this come down to the historic minimization of women’s pain and well-being, and the fact that many illnesses and medical conditions can present with different symptoms in men than in women (though not always).
I accept that women are reported to have a lower risk of heart attacks than men, but I think there’s a gap between the records and the reality. Even if women are genuinely less at risk, an AI ‘Symptom Checker’ should, I believe, err on the side of over-diagnosis rather than under-diagnosis. Basically, I would rather be told I might be having a heart attack and then be told by a doctor that I wasn’t, than find out too late that I was having a heart attack that wasn’t diagnosed because it was statistically unlikely.
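To put rough numbers on that preference, here’s a hedged sketch of how asymmetric costs push an alert threshold down. The costs and probability are invented for illustration; I have no idea how Babylon actually weighs its answers.

```python
# Illustrative only: when missing a heart attack is far costlier than a
# false alarm, the rational alert threshold becomes very low.
MISSED_HEART_ATTACK_COST = 100.0   # catastrophic outcome (invented figure)
FALSE_ALARM_COST = 1.0             # an unnecessary GP or A&E visit (invented)

# Alert whenever the expected cost of staying quiet exceeds the expected
# cost of raising the alarm: p * 100 > (1 - p) * 1, i.e. p > 1/101.
ALERT_THRESHOLD = FALSE_ALARM_COST / (FALSE_ALARM_COST + MISSED_HEART_ATTACK_COST)

def advice(p_heart_attack: float) -> str:
    if p_heart_attack > ALERT_THRESHOLD:   # roughly 0.0099 with the costs above
        return "possible heart attack: seek immediate medical assistance"
    return "symptoms consistent with a panic attack: see your GP"

# Even a 'statistically unlikely' heart attack should trigger the safe advice.
print(advice(0.05))
```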
There have been reams of paper and a gazillion pixels devoted to this kind of institutional sexism, and that isn’t what I’m writing about here. Safe to say that it exists, that it has done for generations, and that it is getting more and more dangerous as we start basing technology on biased data.
Simply put, we’re basing AI on faulty data, and then we’re surprised that it produces results like this.
I guess that if AI is meant to imitate human decision-making then diagnosing women as being hysterical rather than in need of critical medical attention is a sign that it’s working perfectly. It even did it faster. And from the comfort of your sofa.
Babylon actually wrote an article addressing the accusations of gender bias in its system and, kudos to them, they spoke about the historic gender bias in medical care and medical data collection, said they are working hard to address both, and touted their proportion of female employees.** Pat on the back. Good for you.
My issue with it? If they can address it now, they should have addressed it before.
Anyone who works in medicine, tech or data or any combination of them knows that AI and automated systems are only as good as the data you base them on, and that gender bias in data is a huge problem. There is absolutely no way that Babylon were unable to interrogate their data and collection methods before going live with this system. Which means that they were lazy.
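By ‘interrogate’ I don’t mean anything exotic. Something like the check below, run before launch, would have caught exactly what the BHF found. This is my own sketch: get_advice is an assumed stand-in for whatever internal interface the real system exposes, and the test case is the BHF’s own scenario.

```python
# Hypothetical pre-launch audit: run identical fictitious cases through the
# system, varying only the recorded sex, and flag any divergence in advice.
test_cases = [
    {"age": 60, "smoker": True, "history": "panic attacks",
     "symptoms": ["sudden chest pain", "nausea"]},
    # ...more fictitious patients covering other known bias-prone conditions
]

def audit_gender_parity(get_advice):
    """Return every case where the same inputs produce different advice by sex."""
    failures = []
    for case in test_cases:
        answers = {sex: get_advice({**case, "sex": sex})
                   for sex in ("male", "female")}
        if len(set(answers.values())) > 1:   # same inputs, different advice
            failures.append((case, answers))
    return failures

# Example: a stub that ignores sex passes; a real system that keys on it might not.
print(audit_gender_parity(lambda case: "see your GP"))   # -> []
```

A non-empty result means the system needs explaining, or fixing, before it goes anywhere near the public.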
What can we do?
I’m going to say this loudly and often, about this and about anything else that doesn’t work for you or doesn’t work full stop: Get. Loud.
Make a fuss in any way you want, be that through social media, complaints to the regulator or feedback on the tool itself. Make it clear to the people who control the systems we use that laziness is not acceptable.
We should be insisting on transparency from people like Babylon, and accessible transparency at that.
Accessible transparency means not hiding things in legalese, small print or technical and specialist language. It means setting out the boundaries, aims and (if needs be) flaws in your system in a way that anyone could read and understand, and having an easy way for people to provide feedback.
To me, transparency also means knowing who is behind and benefiting from these businesses and my data. I want to know if someone is using my data to get rich, and who they are.
Data can be a minefield, and too often it is easier and less stressful to just accept that our data is being collected, used and profited off by people who understand it better. And I entirely sympathise and accept that a lot of people have more important things in their lives than privacy settings and data management.
But when the use of data collected from us starts to mean that some of us are in real danger, it suddenly gets a lot more frightening and a lot more urgent.
We need to make a fuss and we need to make it now — this is (relatively speaking) only the start of automated AI systems being an integral part of our lives and national infrastructure. If you think that a system is unfair or problematic then you have a duty to make a fuss and hold people to account.
There may be a simple and perfectly reasonable explanation, and if there is, great. Make them put it on the website.
I hope I’ve managed to make the case for making a fuss, and that if you see something hinky you will question it, tweet it and be part of a global effort to make sure that we are not helplessly at the mercy of data bias created by lazy companies who should, and do, know better.
*References for gender bias in cardiac/critical care:
- Alspach, J. (2012). Is There Gender Bias in Critical Care? Critical Care Nurse, 32(6), pp.8–14.
- Alabas, O., Gale, C., Hall, M., Rutherford, M., Szummer, K., Lawesson, S., Alfredsson, J., Lindahl, B. and Jernberg, T. (2017). Sex Differences in Treatments, Relative Survival, and Excess Mortality Following Acute Myocardial Infarction: National Cohort Study Using the SWEDEHEART Registry. Journal of the American Heart Association, 6(12).
** They also said that the tool is a ‘Symptom Checker’ and not a diagnosis tool. I have trouble seeing the difference, especially when ‘GP’ is in the name, and I think they knew that people would use it and view it as a diagnosis tool. I suspect that this is more a way to cover themselves legally in case of misdiagnosis (as in their article) than a true distinction.
*** Not to get too much into politics, but I find it concerning that Dominic Cummings, Downing Street darling, was a consultant for Babylon until July 2018 and in August 2018 the company received significant government funding and backing from the then Health Secretary.