GDPR and its impacts on machine learning applications

Pomin Wu
Published in TrustableAI
Nov 7, 2017 · 6 min read

This post also appears on our Trustable AI blog.

The General Data Protection Regulation (GDPR) was adopted by the European Parliament in April 2016 and will be enforceable throughout the EU by May 2018. Compared to the Data Protection Directive (DPD) it is expected to supersede, this new set of rules adds many provisions regarding algorithmic decision-making. In what follows we give an overview of the technical challenges this implies for algorithmic fairness and explainable AI, following the outline given in Goodman et al. (2016)[1].

First, we would like to say that we are not experts in the EU legal system. The opinions laid out in this article represent a summary of the research papers we have read on the subject, and we do our best to present them accurately. We believe that a mutually beneficial, trust-based relationship between humans and algorithms is not a purely technical problem, so we are also concerned about the legal and social aspects of algorithms. With this article we hope to encourage more discussion of these important issues.

Highlights in GDPR

According to Goodman et al., much of the GDPR is “clearly aimed at perceived gaps and inconsistencies in the EU’s current approach to data protection.” This includes, for example, a clear specification of the right to be forgotten and rules on the collection of EU citizens’ data by foreign companies.

There are three major differences between the GDPR and the previous DPD:

  1. The GDPR is a Regulation, while the DPD is a Directive. A Directive is a set of general rules that becomes enforceable only after each EU country transposes it into national law. A Regulation, on the other hand, is similar to a national law, except that it covers the whole of the EU. Therefore the GDPR will be enforceable in May 2018 without any additional legislative process.
  2. The GDPR explicitly states that companies violating the Regulation are subject to a penalty of up to 20 million euro or 4% of their global revenue, whichever is higher. (Article 83, Paragraph 5)
  3. The GDPR applies not only to companies headquartered in the EU, but to all companies holding data of EU citizens. (Article 3, Paragraph 1)

For the rest of the article we will focus on Article 22 of the GDPR, regarding automated individual decision-making:

Full text of GDPR Article 22

Non-discrimination

The GDPR states in Article 22 Paragraph 4 that decisions “which produces legal effects concerning him or her” or of similar importance shall not be based on the following categories of personal data specified in Article 9 Paragraph 1:

…personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.

Under a minimal interpretation, directly using the above categories of sensitive data in algorithms is prohibited. However, as Goodman et al. point out (and as we have argued (in Chinese) in our review of algorithmic fairness research), discrimination cannot be eliminated solely by excluding sensitive data from the decision-making process, because of redundant encoding, redlining, and other problems.
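To make the redundant encoding problem concrete, here is a minimal sketch, our own illustration rather than anything from Goodman et al.: if the remaining features can still predict the sensitive attribute well, dropping the sensitive column has not removed that information from the data. The dataset and the column names (“ethnicity”, “applications.csv”) are hypothetical.

```python
# A minimal sketch of a "redundant encoding" check: can the remaining,
# apparently neutral features still predict the sensitive attribute?
# The dataset and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def sensitive_attribute_leakage(df: pd.DataFrame, sensitive_col: str) -> float:
    """Cross-validated accuracy of predicting the sensitive column from all
    the other columns. A value far above chance means the sensitive
    attribute is still encoded in the remaining features."""
    X = pd.get_dummies(df.drop(columns=[sensitive_col]))
    y = df[sensitive_col]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, cv=5).mean()

# Hypothetical usage:
# df = pd.read_csv("applications.csv")
# print(sensitive_attribute_leakage(df, "ethnicity"))
```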

Under a maximal interpretation, it is possible that all data correlated with the above categories would have to be excluded. Goodman et al. argue that this would introduce technical challenges of its own, and that it could still be insufficient to eliminate all bias from an algorithm, because:

  1. It could be difficult to completely remove the influence of sensitive data from an algorithm and still keep the algorithm useful.
  2. Uncertainty bias can arise, where the algorithm has different levels of confidence in its predictions for different groups of people because one or more of the groups is underrepresented in the data. (This, however, might be addressed by the “equal opportunity” notion of fairness proposed by Hardt et al.[2]; a sketch of that criterion follows this list.)
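As a rough illustration of what “equal opportunity” asks for, the sketch below compares true positive rates across groups. It is our own simplified reading of the criterion, not code from Hardt et al., and all variable names and data are hypothetical.

```python
# A minimal sketch of the "equal opportunity" criterion: among people whose
# true outcome is positive, the classifier's true positive rate should be
# (roughly) the same for every group. Names and data are hypothetical.
import numpy as np

def true_positive_rates(y_true, y_pred, groups):
    """True positive rate of a binary classifier, computed per group."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        rates[g] = y_pred[positives].mean() if positives.any() else float("nan")
    return rates

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups;
    zero would mean the classifier satisfies equal opportunity exactly."""
    rates = list(true_positive_rates(y_true, y_pred, groups).values())
    return max(rates) - min(rates)

# Toy example: group "b" gets all its true positives right, group "a" only half.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]
print(equal_opportunity_gap(y_true, y_pred, groups))  # 0.5
```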

Please refer to our previous article “Are algorithms fair?” (in Chinese) for an overview of current algorithmic fairness research.

Because of these difficulties, Goodman et al. argue that it could be hard to find an interpretation of the Regulation that suits every situation. It is therefore likely that each case will require examining how the algorithm in question is actually used and how it works, which makes the explainability of algorithms important.

Right to explanation

It has been said that the GDPR introduces a “right to explanation”, but exactly what the Regulation grants is still under debate.

GDPR Article 22 Paragraph 1 gives a person “the right not to be subject to a decision based solely on automated processing”. Where such a decision is nevertheless allowed, Paragraph 3 requires the data controller to “implement suitable measures to safeguard…at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.”

As Wachter et al. (2017)[3] point out, a right to explanation is not mentioned in the text above. Even in Recital 71, where an explanation is explicitly mentioned, it is doubtful whether the right is legally binding. In summary, Wachter et al. believe that what the GDPR introduces is a “right to be informed”, which is not equivalent to a right to demand an explanation of every decision made by an algorithm.

On the other hand, Goodman et al. believe that Articles 13 to 15 give a person the right to access the data that has been collected about them and the right to know the purpose of collecting it, which includes the right to receive “meaningful information about the logic (algorithm) and possible impact.” For example, Article 13 Paragraph 2 (f) states that data controllers must inform the user of the following before collecting data:

Full text of GDPR Article 13 Paragraph 2 (f)

It is therefore worth asking to what extent one can demand an explanation of an algorithm. Goodman et al. cite Burrell (2016)[4] on the following barriers to understanding an algorithm:

  1. Intentional concealment on the part of corporations or other institutions, where decision making procedures are kept from public scrutiny.
  2. Gaps in technical literacy which mean that, for most people, simply having access to underlying code is insufficient.
  3. A “mismatch between the mathematical optimization in high-dimensionality characteristic of machine learning and the demands of human-scale reasoning and styles of interpretation”. (See the sketch after this list for one common way of narrowing this gap.)
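One common way researchers try to narrow that third gap is to approximate a complex model with a small, human-readable one. The sketch below is our own illustration of that idea, using a shallow decision tree as a global surrogate; the data and models are synthetic and purely for demonstration.

```python
# A minimal sketch of a "global surrogate": fit a small, human-readable tree
# to the predictions of a complex model, so a person can inspect a few rules
# that approximate its behaviour. Data and models here are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# The high-dimensional model a person cannot easily reason about.
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# A shallow tree trained to imitate the black box's predictions, not the labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box on this data.
print("fidelity:", surrogate.score(X, black_box.predict(X)))
# A handful of human-scale rules approximating the model.
print(export_text(surrogate))
```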

Research in explainable machine learning and explainable AI has made substantial progress recently. DARPA’s Explainable Artificial Intelligence program, the 2016 ICML Workshop on Human Interpretability in Machine Learning, and the proceedings of the IJCAI-17 Workshop on Explainable AI (XAI) contain a wealth of information, which we will cover in future articles.

When GDPR is enforced

Apart from the provisions of the GDPR, which will be enforceable soon, there are ongoing discussions about further protections of human rights in light of advances in machine learning applications. Thelisson et al. (2016)[5] draw a comparison between the regulation of algorithms and EU food safety regulations, and point out several possible measures for further protection:

  • Code of conduct
  • Quality label
  • Data chain transparency
  • Discrimination-aware machine learning research

These are directions worth following. Furthermore, as Wachter et al. point out (see the previous section), the protective measures granted by the Regulation might not be as effective as expected, so we suspect further amendments to the GDPR may also be in the works.

  1. Goodman, B., & Flaxman, S. (2016). European Union regulations on algorithmic decision-making and a “right to explanation”. Retrieved from http://arxiv.org/abs/1606.08813
  2. Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems 29 (pp. 3315–3323). Barcelona, Spain: Curran Associates, Inc.
  3. Wachter, S., Mittelstadt, B., & Floridi, L. (2017). Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation. International Data Privacy Law, 7(2), 76–99. Retrieved from https://academic.oup.com/idpl/article/3860948
  4. Burrell, J. (2016). How the machine “thinks”: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1).
  5. Thelisson, E., Padh, K., & Celis, L. E. (2016). Regulatory Mechanisms and Algorithms towards Trust in AI/ML. Retrieved from http://home.earthlink.net/~dwaha/research/meetings/ijcai17-xai/9. (Thelisson, Padh, & Celis XAI-17) Regulatory Mechanisms and Algorithms towards Trust in AIML.pdf
