The consequences of dodgy data decisions

The Australian government just announced its intention to begin drug testing trials of certain welfare recipients, as part of its 2017 federal budget.

Whether you agree with the policy or not, how welfare recipients will be identified for drug testing has implications for us all.

The 2017 Australian federal government budget proposes a data-driven approach to deciding who gets drug tested:

“Job seekers will be selected for the trial on a random basis, based on a data-driven profiling tool developed for the trial to identify relevant characteristics that indicate a higher risk of substance abuse issues.”

There is a crucial word in that sentence that shouldn’t be there. Can you guess which one it is?

Image: Blood Test Kit by Lee Haywood (CC-BY-SA)

A world without randomness

Increasingly, algorithms fuelled by a range of data sources are being used to make decisions about us — the services we can access, the products we’re likely to be interested in, the kinds of treatment we should receive.

But we don’t have robust ways of judging the fairness of the data and algorithms that underpin these decisions. We have laws that govern how organisations access and use personal data. Beyond that, though, how data is used for decision making is poorly understood and largely unregulated.

There’s no broader governance of data for decision making. No ethics frameworks, principles or rules. No oversight. No accountability.

If a person feels they have been unfairly profiled by an algorithm, or that the data sources it uses contain inaccuracies or bias, there are few meaningful mechanisms for appeal.

Automated processes don’t and won’t just affect welfare recipients. They’ll increasingly shape many of our interactions with government and businesses, in good and bad ways. When these processes and decisions feel unfair and outside our control, trust in the institutions we engage with breaks down, and suspicion grows of data uses that might actually be intended to help us.

In other words, the more a government is seen to use data for punitive or unfair purposes, the less likely people are to trust and support government use of data for potentially beneficial purposes — like individual digital health records (also proposed in the same federal government budget). These things are connected.

Demonstrating strong governance of data-driven decision making is essential to building trust. In what ways do we expect organisations to be accountable for errors, breaches, bias, limitations and inconsistencies in data models, particularly where they have the potential to shape people’s lives or society? What ethics frameworks and oversight do we need?

Data is rarely perfect and neither are we

Take the proposed data-driven approach to drug testing of welfare recipients, for example. It’s understood the government will use data sources like wastewater analysis and statistical surveys to help identify locations and participants in its trial.

Recognising the limitations of data sources is essential. Wastewater (i.e. sewage) testing is increasingly being used as a tool to measure and interpret drug use in national populations. But while wastewater analysis can detect trace amounts of substances like methylamphetamine, cocaine, tobacco and alcohol in a city’s sewage system, it can’t tell you anything about the people the sewage comes from.

It doesn’t reveal whether different kinds of drug users are young or old, poor or rich, employed or unemployed. We might lead different lives but our poo stinks just the same.

Wastewater analysis might be used to identify drug trial locations, but can’t be used to infer characteristics about the people who might be taking drugs — for example, that they are young or that they’re looking for a job. Understanding the limitations of data sources is part of responsible data model design.

Algorithms, data models, and the data sources they rely on can exhibit bias. Limiting the Australian government’s drug testing trial to Newstart Allowance and Youth Allowance recipients reflects an assumption that substance abusers are likely to be young and/or unemployed.

This isn’t always the case. The last National Drug Strategy Household Survey in Australia (conducted in 2013) found that rates of illicit drug use have risen most significantly among people over 50. Cannabis use increased significantly from 8.8% to 11.1% among people aged 50–59. Across most drug categories, the average age of drug users is trending older.

How should we expect those running the drug testing trial to adjust for this bias, if the intention is to select participants randomly? Bias in the data sources and models we use reinforces the biases we already exhibit as a society.
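To make the problem concrete, here is a minimal sketch in Python — with entirely invented numbers and a made-up “risk score”, not anything from the actual trial design — of the difference between truly random selection and selection weighted by a profiling tool that assumes younger people are higher risk. The weighted version predictably over-samples the young, regardless of how drug use is actually distributed.

```python
import random

random.seed(1)

# Hypothetical pool of job seekers, described only by age.
pool = [{"id": i, "age": random.randint(18, 64)} for i in range(10_000)]

def truly_random_sample(people, n):
    """Every person has the same chance of being selected."""
    return random.sample(people, n)

def profiled_sample(people, n):
    """Selection weighted by an assumed risk score that treats under-30s
    as twice as likely to have substance abuse issues. Sampling is with
    replacement here, purely for illustration — the point is that bias in
    the score becomes bias in who gets tested."""
    weights = [2.0 if p["age"] < 30 else 1.0 for p in people]
    return random.choices(people, weights=weights, k=n)

def share_under_30(people):
    return sum(p["age"] < 30 for p in people) / len(people)

print("Under-30 share of pool:          ", round(share_under_30(pool), 2))
print("Under-30 share, random sample:   ", round(share_under_30(truly_random_sample(pool, 500)), 2))
print("Under-30 share, profiled sample: ", round(share_under_30(profiled_sample(pool, 500)), 2))
```

The point of the sketch is simple: once a profiling score enters the selection process, the sample is no longer random, and whatever assumptions the score encodes are reproduced in the people chosen for testing.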

There are so many ways in which better use of data can and will improve our lives. Personalised healthcare. Tailored learning. Cheaper and more efficient transport, energy and administrative services. But it’s going to be harder to make these opportunities a reality if there’s a deficit of trust in how organisations — particularly but not limited to government — use data.

Any data-driven decision, service or proposal with the potential to shape Australians’ lives, or the society we live in, should have ethical, accountable processes at its core.