Aurelie Pols
Jul 22, 2017

Interesting read, thanks Alistair Croll!
Related to your last paragraph, that’s where accountability comes in: who decides how these algorithms work, today and in the future, and who answers for the potential consequences of their use?
As these ethical issues only bubble up with usage, it’s extremely difficult to foresee what might “harm” individuals down the road.

What I’m currently seeing is that, in light of the GDPR, certain data science endeavors stay away from “sensitive data” yet recognize there might be proxies: someone buying halal food, for example, is a potential proxy for religious beliefs.
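For readers wondering how such proxies get flagged in practice, here is a minimal sketch (the column names, toy data, and threshold are invented for illustration): it measures the association between an innocuous-looking feature and a sensitive attribute via Cramér’s V over a contingency table.

```python
# Hypothetical proxy check: does a "harmless" feature predict a sensitive one?
# Column names, toy data, and the 0.5 threshold are illustrative assumptions.
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(df: pd.DataFrame, feature: str, sensitive: str) -> float:
    """Strength of association (0..1) between two categorical columns."""
    table = pd.crosstab(df[feature], df[sensitive])
    chi2, _, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return (chi2 / (n * min_dim)) ** 0.5

# Toy data: purchase category vs. (self-declared) religious affiliation.
df = pd.DataFrame({
    "purchase_category": ["halal", "halal", "kosher", "other", "halal", "other"],
    "religion":          ["muslim", "muslim", "jewish", "none", "muslim", "none"],
})

score = cramers_v(df, "purchase_category", "religion")
if score > 0.5:  # the cut-off is a policy decision, not a statistical given
    print(f"Warning: 'purchase_category' may act as a proxy (V = {score:.2f})")
```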
On top of that, the GDPR requires Data Protection Impact Assessments (DPIAs), as stipulated in Article 35, “Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons”.
Problem is, and that’s what I’m currently seeing with my clients:
1. When is risk “high”?
2. How do you define that?
3. How do you make sure this includes some form of feedback loop, to possibly revisit the algorithm and make an accountable, conscientious choice about whether it should be tweaked or not? (A sketch of such a loop follows.)
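To make point 3 concrete, here is a minimal sketch of what such a feedback loop could look like, with all names and thresholds invented: every automated decision is logged, and the model is flagged for a documented human review once outcomes drift beyond a bound that would be agreed upon in the DPIA.

```python
# Minimal sketch of a feedback loop; all names and numbers are invented.
# Decisions are logged, and the model is flagged for an accountable human
# review once outcomes drift past a bound agreed upon in the DPIA.
from collections import deque

class AlgorithmReviewLoop:
    def __init__(self, expected_positive_rate: float, tolerance: float,
                 window: int = 1000):
        self.expected = expected_positive_rate  # rate accepted in the DPIA
        self.tolerance = tolerance              # allowed drift before review
        self.recent = deque(maxlen=window)      # sliding window of outcomes

    def record(self, decision_is_positive: bool) -> None:
        self.recent.append(decision_is_positive)

    def needs_review(self) -> bool:
        if not self.recent:
            return False
        rate = sum(self.recent) / len(self.recent)
        return abs(rate - self.expected) > self.tolerance

loop = AlgorithmReviewLoop(expected_positive_rate=0.10, tolerance=0.05)
for outcome in [True, False, True, True, False, True]:
    loop.record(outcome)
if loop.needs_review():
    print("Drift detected: schedule a documented review of the algorithm.")
```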

In How can we stop algorithms telling lies?, Cathy O’Neil starts classifying “bad algorithms”:
1. Gone bad because of neglect
2. Gone bad because of lack of verification on a wide variety of test cases before releasing the code
3. Bad because unethical yet compliant
4. Unethical and illegal
I’m not saying the classification is perfect, yet it’s a start to build upon, imho.
Negligence will hopefully be partially addressed through the increased accountability brought about by the GDPR (I’m a naive dreamer, I know! And an EU fan.)

The second could be addressed by transparent thresholds of acceptability, think a maximum percentage of false positives for example, and some form of classification/rating mechanism if/when algorithms are re-used; the sketch below shows what such a threshold could look like in practice.
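A hypothetical release gate (the threshold and data are invented): deployment is blocked whenever the false positive rate on a held-out test set exceeds the published threshold of acceptability.

```python
# Hypothetical release gate: block deployment if the false positive rate
# on a held-out test set exceeds a transparent, published threshold.
def false_positive_rate(y_true: list[bool], y_pred: list[bool]) -> float:
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not p and not t)
    return fp / (fp + tn) if (fp + tn) else 0.0

MAX_FPR = 0.05  # the published threshold of acceptability (illustrative)

# Toy held-out labels and model predictions.
y_true = [False, False, True, False, True, False]
y_pred = [True,  False, True, False, True, False]

fpr = false_positive_rate(y_true, y_pred)
if fpr > MAX_FPR:
    raise SystemExit(f"Release blocked: FPR {fpr:.2%} exceeds {MAX_FPR:.0%}")
print(f"FPR {fpr:.2%} within threshold; release may proceed.")
```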

The third and fourth are a matter of building cases, fishing out how these algorithms work: like Jonathan Mayer did with the Verizon zombie cookies, and before him Chris Hoofnagle with Flash cookies, Max Schrems with Safe Harbor, and Paul-Olivier Dehaye today with Cambridge Analytica. On top of that, whistleblowing legislation is also evolving, so companies will need to be extra careful moving forward with any data science ventures.
Obviously, between points 3 and 4, it’s the authorities that will have to up their game: the current data protection authorities (DPAs) in Europe, soon to become supervisory authorities (SAs) under the GDPR, and possibly (still) the likes of the FTC, on top of state-level Attorneys General like Eric Schneiderman in NY or, formerly, Kamala Harris, who did some excellent work!
An example of such lack of understanding and enforcement is the ICO’s issue with Google DeepMind’s work for the NHS, where data initiatives deemed category 3 can be relabeled category 4.
Risk, as in fines of up to 4% of global turnover or 20 million euros, whichever is higher, sits exactly there under the GDPR. Better have that documentation in place in case an SA knocks on the door, because you’re addressing EU citizens and are supposed to respect their rights!
