Getting to an Ethics of Data

Paul Hunt
The Principled Algorithm
3 min read · Apr 15, 2017

Ok, the title is a bit misleading. Data in itself is not a problem; it is just stuff. But collect enough of it and link it together in meaningful ways, and there is potentially a problem with what organisations do with it … it isn’t data that kills people, it is always people who kill people (to be dramatic about it).

Not long ago I was enthusiastic about the possibilities of machine learning. And I still believe there is a lot of promise (in particular for how it can change medicine and healthcare). But now I am more cautious.

Kudos to Will Knight over at MIT Technology Review. He has put together a great selection of articles about AI and its implications. A recent article goes right to the heart of my concerns about the rise of the algorithms. The main point is that while these models produce some pretty impressive results, it is too hard to work out what is going on inside them. And that has implications for the decisions the models make. Who is accountable? Things will go wrong at some point, and when they do there needs to be a way to lay out the facts, find the fault, and implement controls.

Thankfully there are people out there working on shining light into these black boxes. DARPA is sponsoring a program (Explainable AI, or XAI) to try to ‘explain’ AI. It will be interesting to see what comes of it, assuming the results become public knowledge.
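Outside DARPA’s program, simple techniques for probing an opaque model already exist. The sketch below is not anyone’s official method; it is a minimal illustration, on synthetic data, of permutation importance: shuffle one input at a time and see how much the model’s accuracy suffers.

```python
# A minimal sketch of one common "explainability" technique: permutation
# importance. This is not DARPA's method, just an illustration of probing
# a black-box model from the outside.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real decision-making dataset.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train an opaque model; we can't easily read its internals directly.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much accuracy drops:
# a large drop means the model leans heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```

A technique like this won’t tell you why the model leans on a feature, but it at least shows where to look, which is a start towards laying out the facts when something goes wrong.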

Last year, ProPublica drew attention to “[a] computer program [that] spat out a score predicting the likelihood of … committing a future crime.” The examples they highlighted were intended to point out the bias in the model. Interestingly, Abe Gong obtained the same dataset that ProPublica used and found that the model wasn’t statistically biased. So perhaps the ProPublica story cherry-picked examples that suited its point. Even so, I think the point is valid: there is bias baked into the crime prediction model. But the bias comes from the surrounding environment in which the data is collected (in this case probably due to the way the survey questions were phrased).

You may have heard about the Microsoft Twitter bot, Tay? Not long after going live she started going off the rails and joined the ranks of online trolls! Why? Because the algorithm learnt from the way it saw other people commenting online, but it didn’t have the ability to contextualise that data.
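How can ProPublica and Abe Gong both be right about the same model? Because “biased” depends on which statistic you check. The toy example below is neither party’s actual analysis, and the numbers are made up rather than drawn from the real dataset; it just shows how a model can flag people in two groups with equal precision while producing very different false positive rates, which is roughly the shape of the disagreement.

```python
# A minimal sketch (with made-up numbers, not the real COMPAS data) of why
# "is the model biased?" depends on the statistic you check. Both groups
# below get flagged with the same precision, yet group A's false positive
# rate is four times group B's.
import pandas as pd

df = pd.DataFrame({
    "group":     ["A"] * 10 + ["B"] * 10,
    # 1 = flagged as high risk by the model
    "predicted": [1,1,0,0,0,0,1,1,0,0,  1,0,1,0,0,0,0,0,0,0],
    # 1 = actually reoffended
    "actual":    [1,1,1,1,1,1,0,0,0,0,  1,1,0,0,0,0,0,0,0,0],
})

for group, sub in df.groupby("group"):
    flagged = sub[sub["predicted"] == 1]
    negatives = sub[sub["actual"] == 0]
    ppv = (flagged["actual"] == 1).mean()       # precision of the flag
    fpr = (negatives["predicted"] == 1).mean()  # non-reoffenders flagged
    print(f"group {group}: precision {ppv:.2f}, "
          f"false positive rate {fpr:.2f}")
```

Both measures are legitimate notions of fairness, and a model generally cannot satisfy both at once when the groups’ underlying rates differ, so someone has to decide which one matters.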

So what?

My view is that with great power comes great responsibility (sorry!). There now exist massive datasets that contain enough information for algorithms to predict all manner of sensitive things. You’ve heard about how Target worked out a young woman was pregnant and sent her targeted coupons, outing her pregnancy to her father before she had a chance to explain the situation on her own terms?! And what about the Facebook voter-turnout experiment during the 2010 US midterm elections?

There is power in being able to extract so much meaning from data. The volume of data only grows, the algorithms only get better, and the range of information people are willing to disclose (whether or not they believe it stays private) only increases.

Another example? The ongoing investigation into numerous British MPs over alleged spending abuses during the 2015 general election. The Conservative campaign used data to target specific constituencies, which is believed to have influenced the outcome of the election in their favour.

There are more examples! Cambridge Analytica anyone?

I am only at the beginning of this journey. But I think it is a fundamental requirement that we use these algorithms with our eyes wide open, and responsibly. And to assist with this process I think there is a place for a Data Ethicist to advise individuals and companies (and hopefully also governments) on ensuring there are robust checks and balances from the time data is collected through to the decisions made on the basis of models built from that data.

A later article will explore a potential methodology for a Data Ethicist.

[Also published on LinkedIn]
