The EU’s regulatory limitations on AI

Wouter Zirkzee
Sogeti Data | Netherlands
Aug 2, 2022

Not everyone might fully realize it yet, but Artificial Intelligence (AI) is starting to play a larger role in many aspects of our lives. And as we all know, AI needs data, preferably lots of it.

Due to big scandals over the years in which personal data has been leaked or misused, and the (rightfully deserved) growing awareness of data privacy and ownership, society has been paying more attention to who its data is shared with.

To help protect the data rights of European citizens, the European Union introduced the General Data Protection Regulation (GDPR)[1] on May 25, 2018. This regulation introduced and defined rules to protect the fundamental rights of (natural) persons and their data in the EU. Obviously, this also heavily impacts the use and development of AI, especially where sensitive or personally identifiable information is concerned. Moreover, the GDPR even makes some remarks about automated decision-making, directly targeting AI solutions.

Many companies have high expectations of what AI can do for them, given its unrivalled performance in comparison to ‘traditional’ systems. However, this increased performance comes at a price: the model’s transparency. More complex `black-box` models (e.g. deep neural networks (DNNs)) often outperform simpler `white-box` models (e.g. decision trees).

This accuracy-interpretability trade-off of AI techniques is something that must be carefully considered[2]. Answers such as `computer says no` are rarely considered to be of any value. Opacity can even be dangerous with regard to under-represented groups, as AI can favor certain outcomes based on characteristics such as race, sex, age, or any other factor.
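
To make the trade-off concrete, here is a minimal sketch in Python (the dataset, models, and hyperparameters are illustrative choices of mine, not taken from this article) comparing a shallow decision tree, whose splits a human can read directly, with a random-forest ensemble that behaves as a black box:

```python
# Minimal sketch of the accuracy-interpretability trade-off:
# a shallow, readable decision tree vs. an opaque ensemble.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# White-box: every prediction can be traced through at most 3 splits.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Black-box: 300 deep trees vote; there is no single readable decision path.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

print(f"decision tree accuracy: {tree.score(X_test, y_test):.3f}")
print(f"random forest accuracy: {forest.score(X_test, y_test):.3f}")
```

On most tabular datasets the ensemble will score somewhat higher, but only the single tree can tell you exactly why a given prediction was made.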

Broadly speaking, the GDPR is based on seven principles[3]:

  1. Lawfulness, fairness and transparency — Processing must be lawful, fair, and transparent to the data subject.
  2. Purpose limitation — You must process data for the legitimate purposes specified explicitly to the data subject when you collected it.
  3. Data minimization — You should collect and process only as much data as absolutely necessary for the purposes specified.
  4. Accuracy — You must keep personal data accurate and up to date.
  5. Storage limitation — You may only store personally identifying data for as long as necessary for the specified purpose.
  6. Integrity and confidentiality — Processing must be done in such a way as to ensure appropriate security, integrity, and confidentiality (e.g. by using encryption).
  7. Accountability — The data controller is responsible for being able to demonstrate GDPR compliance with all these principles.

It is not difficult to see that all these principles have a significant impact on the way AI models can be developed and used. They can be limiting to AI development, especially in relation to the aforementioned high-performing black-box models: how do we make them adhere to the first principle of being lawful, fair, and transparent to the data subject?

Automated Decision-Making under the GDPR

The requirement that automated decision-making be lawful, fair, and transparent is often referred to as the `right to explanation`. Unlike other outlined rights, such as the relatively well-known `right to erasure` (also known as the `right to be forgotten`), the right to explanation is not explicitly defined in the GDPR. However, Articles 13[4] through 15 of Chapter 3, which outlines the rights of the data subject, all state:

“the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.”

This is what is usually meant by the right to explanation: the right to obtain an explanation of any automated decision. It applies to all automated decision-making processes that are allowed under Article 22 of the GDPR[5]:

“The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”

Explaining AI models

Early AI systems were easily understandable, but over time research has produced more complex, better-performing models. Deep learning models have taken advantage of the rise in scalable computing power, advanced algorithms, and large data sets. Every so often a new model is released, dwarfing the previous largest model. In June 2022 Google published a paper on one of its latest models, a language model with 1.6 trillion (1.6 × 10¹²) parameters[6]. As the size and complexity of deep learning models rise, it becomes more challenging to explain and interpret their outcomes.

Growth of model sizes over time; note the logarithmic scale[7]

Interpretable vs Explainable

Interpretable machine learning models comprise most of the traditional `white-box` models and are inherently interpretable with regard to how they arrive at the outcomes or decisions they produce. Thus, we can understand the reasoning behind the model’s decisions.

The easiest example is one of the most well-known models, the decision tree, which repeatedly splits the dataset on a feature according to an optimality criterion (e.g. Gini impurity or entropy). By following the path from the root node down to a leaf node, we obtain the list of decisions that lead to the resulting conclusion.
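
As a small illustrative sketch (scikit-learn and the iris dataset are my choice of example, not the article’s), the learned splits can be printed directly, so every root-to-leaf decision path is readable:

```python
# Fit a small decision tree and print its splits: each root-to-leaf
# path is a human-readable chain of decisions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, criterion="gini", random_state=0)
tree.fit(iris.data, iris.target)

# Prints rules such as:
# |--- petal width (cm) <= 0.80
# |   |--- class: 0
# |--- petal width (cm) >  0.80
# ...
print(export_text(tree, feature_names=iris.feature_names))
```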

When it comes to ‘black-box’ models, eXplainable AI (XAI) offers a set of techniques that add a more interpretable layer on top of the model, helping us understand its outcomes.

Illustration displaying the XAI concept by DARPA [8]

In XAI, two approaches mostly dominate the field: SHapley Additive exPlanations (SHAP)[9] and Local Interpretable Model-agnostic Explanations (LIME)[10].

Both approaches usually target a local explanation, meaning an explanation for one specific outcome. SHAP, however, is also capable of creating a global explanation: by aggregating the explanations over all outcomes, it explains the behavior of the model as a whole.
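
As a hedged sketch of typical SHAP usage (the model, dataset, and parameters below are placeholders of mine, not from this article): the values computed for a single row form a local explanation, and aggregating them over all rows yields the global one.

```python
# Hedged sketch of SHAP; assumes the `shap` package is installed.
# Model and data are illustrative placeholders.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Local explanation: each feature's contribution to one prediction.
print(dict(zip(X.columns, shap_values[0])))

# Global explanation: aggregate per-feature contributions over all
# samples to summarize the behavior of the model as a whole.
shap.summary_plot(shap_values, X)
```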

At a high level, these techniques make simplifying assumptions about the model (e.g. feature independence or local linearity) and evaluate how the outcome changes as a result of perturbations of the input data.
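
LIME makes this perturbation idea especially visible. The hedged sketch below (again with placeholder data and model of my choosing) perturbs a single instance, queries the black-box model on the perturbed samples, and fits a simple linear surrogate that is only valid locally:

```python
# Hedged LIME sketch; assumes the `lime` package is installed.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# LIME perturbs this row, asks the black-box model to predict on the
# perturbations, and fits a local linear approximation around it.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # top features with their local linear weights
```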

But how do these techniques relate to an explanation? Exactly what kind of explanation the GDPR requires is unclear, but most agree that neither an interpretable model nor a list of feature importances is understandable to most of the data subjects the GDPR defines.

What qualifies as an explanation?

There is no clear definition of what qualifies as an explanation under the GDPR, nor of to whom it should be addressed. Ideally, it should provide human-like explanations, but in order to define what we as humans consider an explanation, a sociological approach is required.

In essence, an explanation can be defined as a conversation with the goal of transferring knowledge, in order to close or reduce the knowledge gap between the questioner and the answerer.

Imagine someone asks the question: Why does it snow?

The answer depends both on the one who asks the question and on the one who answers it.

If a 5-year-old asks his dad, the answer will be along the lines of:

“if it is raining and it is cold, the rain freezes and becomes snow.”

If an adult were to ask a knowledgeable meteorologist, the answer could be far more detailed and complex:

“Snow is most common in the high-altitude and high-latitude areas of the world, so the closer you are to the poles or the top of a mountain the more likely you are to find snow. When temperatures reach or dip below freezing with minimal moisture in the air, snow crystals begin to form in the atmosphere and grow into snowflakes. If temperatures are warm enough on the ground, around 41° Fahrenheit (5° Celsius), the snow will land and begin to accumulate, but if it’s any warmer it will simply melt. It can never be too cold to snow, but heavy snowfalls are more common when the air on the ground is relatively warm.”

An answer can be followed up by any related question that tries to close the knowledge gap even further, for example: “But where does rain come from?”. Through such a conversation, the knowledge gap decreases incrementally. Although generating explanations tailored to any two knowledge levels is a difficult task, explanation interfaces should try to be as human-like as possible.

Impact of AI legislation (My Opinion)

By requiring lawfulness, fairness, and transparency in automated decision-making, the GDPR clearly hinders productionizing an AI solution in a business environment. But this is a good thing. We need the GDPR to protect us citizens from businesses chasing the value that the amazing performance of some models can offer, because mistakes will always be made. Many might argue that humans are (even more) susceptible to the same danger, and AI has indeed already shown it can be on par with, or even outperform, humans in some cases[11]. But we need only look at cases where the negative implications were quite clear: Amazon’s AI recruiting tool showed bias against women[12], and a man was even wrongfully imprisoned due to faulty facial recognition[13].

These days, big corporations are aware of these risks and of the societal impact of their models, and try to take appropriate measures. Although the example is not directly related to automated decision-making, Google recently presented two new text-to-image models, Imagen[14] and Parti[15], but chose not to release the code or a public demo because, with models this complicated, “there is a risk that Imagen has encoded harmful stereotypes and representations”.

But not all corporations take their responsibility. Clearview AI[16], for example, is a facial recognition company that has been getting a lot of attention over the last few years. Clearview has been scraping the internet, collecting over 20 billion images to build up its biometrics database. In reaction, regulators from countries such as Canada, the UK, and Germany have started procedures to condemn and halt Clearview’s data-hoarding and privacy-breaching practices.

In this fast-paced world where AI has accomplished some startling super-human achievements, we do need to be protected from becoming exactly what the GDPR describes us as: data subjects. As is often the case in law-making, the boundaries of these rights are still very vague and usually left to judicial interpretation. And this is for good reason: as this post has shown, it is hard to capture what an explanation should look like. In many business cases the explanation will be added as an afterthought, instead of as an integral part of the model.

But the limitations and regulations of AI will likely not end with the GDPR. On April 21, 2021, the EU published a proposal, overlapping with the GDPR but aimed directly at AI: the `Artificial Intelligence Act`[17]. In this act, the EU attempts to harmonize the rules for the development and use of all AI within the EU. Unlike the GDPR, the AI Act takes a risk-based approach and defines sets of rules based on four levels of risk: unacceptable, high, limited, and minimal. For example, AI systems deemed an unacceptable risk, such as social scoring, are to be prohibited, while specific high-risk systems (e.g. medical, judicial, critical infrastructure) are allowed but need to be assessed and validated before being marketed. According to the current timeline, the AI Act should be published and take effect somewhere in mid-2023.

[1] GDPR Archives — GDPR.eu

[2] Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI, arXiv:1910.10045 (arxiv.org)

[3] Art. 5 GDPR — Principles relating to processing of personal data — General Data Protection Regulation (GDPR) (gdpr-info.eu)

[4] Art. 13 GDPR — Information to be provided where personal data are collected from the data subject — General Data Protection Regulation (GDPR) (gdpr-info.eu)

[5] Art. 22 GDPR — Automated individual decision-making, including profiling — General Data Protection Regulation (GDPR) (gdpr-info.eu)

[6] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, arXiv:2101.03961 (arxiv.org)

[7] Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model — Microsoft Research

[8] https://www.darpa.mil/attachments/DARPA-BAA-16-53.pdf

[9] https://shap.readthedocs.io/en/latest/index.html

[10] https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf

[11] https://med.stanford.edu/news/all-news/2018/11/ai-outperformed-radiologists-in-screening-x-rays-for-certain-diseases.html

[12] https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G

[13] https://edition.cnn.com/2021/04/29/tech/nijeer-parks-facial-recognition-police-arrest/index.html

[14] https://imagen.research.google/

[15] https://parti.research.google/

[16] Clearview AI | Facial Recognition

[17] Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL LAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE


Data Scientist who gets excited about AI/ML in Automotive Technology, Engineering, Privacy(-by-Design), Human-Centered & Explainable AI.