AI and GDPR: Top recommendations for small organizations

Published in MyData Journal, May 29, 2018

AI systems are evolving at a fast pace, thanks to advances in algorithms and to the large amount of data currently available to train them. Only a few weeks ago Google presented and demonstrated the advanced capabilities of Google Duplex, a new generation of virtual assistant able to carry out “natural” conversations with other humans on our behalf.

While this future is exciting, building consumer trust in such systems is still a challenge that might slow down the pace of their adoption, especially after May 25th.

Don’t leave trust outside of your business

It is indeed the intention of the General Data Protection Regulation (GDPR) to create and re-engender this trust in AI through a number of requirements that service providers will have to comply with, or go beyond by creating new innovative solutions. While compliance is required of businesses of any size, Small and Medium Enterprises (SMEs) certainly need more guidance to understand the problem and identify solutions.

I highlight here what could be the most important recommendations for SMEs to do AI right after May 25th.

Summary of recommendations (for the quick readers):

Consider why you are using AI;
Know your data;
Know your users;
Make it clear;
Consider your options;
Listen to your users;
Know your system;
Be ready for changes.

If you want to understand better, and you are a careful reader, keep reading and see you at the bottom of this article.

First of all we need to understand the problem.

Are you using AI under GDPR terms?

YES, when you are using automated profiling. But what is automated profiling? Under GDPR, it is the processing of an individual’s data through automated means (e.g., computer software) with the aim of inferring habits and traits that the individual has not explicitly stated. Such inferred traits are then most likely used to take decisions about that specific individual, including the provision of a personalised service. Examples of these services are marketing campaigns with targeted advertising, or the assignment of a credit score.

If you are only collecting an individual’s Personally Identifiable Information (e.g., name, date of birth, address or user preferences) and using it to assign customers to different demographic categories, this does not constitute automated profiling.

Is that enough?

To constitute use of AI regulated under GDPR terms, automated profiling needs to be performed in relation to automated decision-making. A (solely) automated decision-making process uses profiling information to take decisions about individuals, with those decisions made by machines and their algorithms without any human intervention. If, for instance, your AI powers a website that automatically grants or refuses a loan to any applying customer, based only on the customer profile it has created, then YES, you are relying on automated decision-making.

Is there something more?

YES. GDPR requirements for the use of AI in profiling and automated decision-making become more stringent if the automated decision might have a legal or similarly significant effect on the individual who is the subject of the decision.

If your service uses automated profiling to support solely automated decision-making that could have a positive or negative effect on an individual, you should carefully consider the following on your journey to GDPR compliance.
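The three questions above can be read as a short checklist. The sketch below is one hypothetical way to encode it; the class, field and label names are illustrative assumptions rather than terms taken from the regulation, and the result is a triage aid, not legal advice.

```python
from dataclasses import dataclass


@dataclass
class ProcessingActivity:
    """Hypothetical description of one data-processing activity."""
    infers_traits: bool                # automated profiling: habits or traits are inferred
    decides_without_human: bool        # solely automated decision-making
    legal_or_significant_effect: bool  # the decision seriously affects the individual


def gdpr_ai_level(activity: ProcessingActivity) -> str:
    """Classify an activity by walking through the three questions in order."""
    if not activity.infers_traits:
        return "no automated profiling"
    if not activity.decides_without_human:
        return "profiling only"
    if activity.legal_or_significant_effect:
        return "automated decision with legal or significant effect"
    return "automated decision"


# The loan website from the text: profiling, no human involved, significant effect.
loan_site = ProcessingActivity(True, True, True)
print(gdpr_ai_level(loan_site))
```

An activity that only sorts customers into demographic categories from data they supplied would be described with `infers_traits=False` and fall into the first branch.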

Consider why you are using AI

If the use of AI for the above profiling is not required by law (for example, you are not providing a financial service in which profiling helps to identify fraud), and you cannot demonstrate that profiling is strictly necessary for entering into or performing a contract because no less intrusive way of providing the same service exists, then the individual’s consent should be collected.

Recommendation: your request for consent (GDPR Article 6) should be transparent, clearly communicated and specific to the purpose. You should avoid collecting data first and deciding later how to use them; you should continuously review how you use your customers’ data and build an infrastructure to request consent at scale as new data are collected and used for profiling.
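One way to keep consent specific to its purpose is to record it per user and per purpose, and to check that record before any profiling step runs. The following is a minimal in-memory sketch under that assumption; `ConsentLedger` and the purpose names are hypothetical, and a real service would need persistent, auditable storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentLedger:
    """Hypothetical record of which user consented to which purpose, and when."""
    _grants: dict = field(default_factory=dict)

    def grant(self, user_id: str, purpose: str) -> None:
        # Store the time of consent so it can be shown later if needed.
        self._grants[(user_id, purpose)] = datetime.now(timezone.utc)

    def withdraw(self, user_id: str, purpose: str) -> None:
        # Withdrawing a consent that was never given is a no-op.
        self._grants.pop((user_id, purpose), None)

    def allows(self, user_id: str, purpose: str) -> bool:
        # Consent for one purpose never implies consent for another.
        return (user_id, purpose) in self._grants


ledger = ConsentLedger()
ledger.grant("user-1", "credit_scoring")
print(ledger.allows("user-1", "credit_scoring"))  # True
print(ledger.allows("user-1", "targeted_ads"))    # False
```

Keying the ledger on the pair (user, purpose) means that a new use of already collected data requires a new consent request.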

Know your data

Once you know your service is going to use profiling and automated decision-making, you will have to understand what data you need from your customers in order to create such profiles. Sensitive data (e.g., religion, sexual and political orientation) will generally need explicit consent to be processed. Moreover, GDPR restricts the transfer of European residents’ personal data outside the EU unless adequate safeguards are in place, which in practice pushes many businesses to store and process such data within the EU. This might affect the costs incurred by small and medium businesses that rely on external storage and computation power from large cloud providers, as such providers might charge more to geographically limit the use of their resources.

Recommendation: you should avoid profiling users based on their sensitive information, whether it is collected directly from them or inferred during automated profiling. The data minimization principle should drive your service design: you should be able to identify the minimum amount of data you need. The best way to do that is to apply Data Protection by Design and by Default, building services by reviewing, always and from the very beginning, what data are strictly needed, how they will be used and why. Do not experiment with algorithms and training models by collecting data first and deciding later how to use them; rather, deploy only well-tested models that you know will suffice for your purpose.
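Data minimization can be enforced at the point where records enter the profiling pipeline. The sketch below assumes a hypothetical loan-scoring service with an explicit allowlist of justified fields; the field names are invented for illustration, and it covers only directly collected attributes, not traits a model might infer.

```python
# Assumed example fields for a hypothetical loan-scoring service.
REQUIRED_FIELDS = frozenset({"income", "loan_amount", "employment_years"})
SENSITIVE_FIELDS = frozenset({"religion", "sexual_orientation", "political_orientation"})


def minimise(record: dict) -> dict:
    """Keep only the fields the service has justified; never keep sensitive ones."""
    allowed = REQUIRED_FIELDS - SENSITIVE_FIELDS
    return {key: value for key, value in record.items() if key in allowed}


raw = {"name": "Ada", "income": 42000, "loan_amount": 5000, "religion": "n/a"}
print(minimise(raw))  # {'income': 42000, 'loan_amount': 5000}
```

An allowlist is used instead of a blocklist so that any newly collected attribute is dropped by default until someone justifies why it is needed.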

Know your users

Consent becomes the only legal basis for collecting children’s data (i.e., data of those under the age of 16 in GDPR terms). If your service expects to involve children and to profile them for the purpose of automated decision-making (this might be pretty common in the case of targeted advertising on mobile gaming platforms), then it becomes of paramount importance to prove that your best effort has been made to verify the age of the users.

Recommendation: your customers are likely to lie in such situations. The GDPR accountability principle (Articles 5 and 24) requires you to follow best practice and document how age verification has been performed. You should consider involving verification by a trusted party (e.g., a parent) before registering new users, in particular if automated profiling of children is likely to happen. However, remember that asking the users themselves to indicate such a trusted party, or relying on only one, might not be the best choice. Alternatively, you could require customers to sign in through third-party verified identity services, which are expected to grow in number soon (e.g., ABnB, PayPal, etc).

If you have got this far and are now ready to roll out a service that performs automated profiling and decision-making, a few more aspects deserve some attention.

Make it clear

Your customers have the right to be informed and to receive an explanation of how their data are used by profiling and decision-making processes driven by artificial intelligence and machine learning, including what the benefits are and what the risks could be. For large as well as small businesses, providing such an explanation might raise some concerns for two reasons: the risk of revealing Intellectual Property (mostly associated with the models used) and the actual difficulty of explaining otherwise complex processes in simple terms.

Recommendation: You are not required to disclose the internal functioning of your algorithms, but rather to be clear about which categories of data you are using and how they affect your algorithm’s profiling or decisions. Focus on the link between inputs and outputs, providing examples that people can understand. Don’t rely only on the engineers in your team, but also work with UX experts to understand the best way to convey technical information to a non-technical audience. Use layered privacy policies, complementing these examples with more detailed technical information.

Consider your options

Individuals have the right to object to automated processing (Article 22) and should still be granted access to the service, unless providing it without profiling is too complex to be practicable. For instance, personalized advertising won’t be possible without profiling, while credit score assignment could still be performed manually.

Recommendation: Unless you are ready to lose your customers because you cannot provide any service without profiling, you should design your service to be semi-automated and to always have a human involved in the process. However, the role of this human intervention should be clear and well specified from the beginning; it cannot be left undefined until the need occurs. This should encourage you to always design your service with the end-user and her needs in mind.
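A semi-automated design can be sketched as a workflow in which the model only suggests an outcome and a named human confirms or overrides it before anything becomes final. The names below (`Case`, `review`, `is_final`) are hypothetical and only illustrate the idea of fixing the human's role up front.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """Hypothetical decision case: the model proposes, a person decides."""
    user_id: str
    model_suggestion: str                 # e.g. "approve" or "reject"
    reviewer: Optional[str] = None        # who confirmed or overrode the suggestion
    final_decision: Optional[str] = None  # stays empty until a human has acted


def review(case: Case, reviewer: str, decision: Optional[str] = None) -> Case:
    """Record the human decision; with no override, the suggestion is confirmed."""
    case.reviewer = reviewer
    case.final_decision = decision if decision is not None else case.model_suggestion
    return case


def is_final(case: Case) -> bool:
    """A case is final only once a named reviewer has signed it off."""
    return case.reviewer is not None and case.final_decision is not None


case = Case(user_id="user-1", model_suggestion="reject")
print(is_final(case))   # False: nothing is decided by the model alone
review(case, reviewer="credit-officer-7", decision="approve")
print(case.final_decision)  # approve
```

Recording the reviewer alongside the decision also leaves a trace of how human intervention operated, which can be useful when a decision is later challenged.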

Listen to your users

With respect to AI and automated profiling and decision-making, users have not only the right to object but also the right to challenge the algorithm’s decisions and request rectification, in particular if they perceive biases in the decisions. This requires having processes in place to deal with such requests and to support users with human-assisted decisions. While this can be challenging for large organizations because of the scale of such requests, it might be even more daunting for small ones, given the complexity of the process and the structure needed to deal with such requests, even if they are sporadic.

Recommendation: Keep your algorithms up to date and representative of the population of users subject to your profiling activities. Growing and continuously testing your training sets as your user population grows and changes can help you reduce the number of such challenges. Provide examples and tools that allow your users to see how similar users have been treated by your system before they start a complaint. If a challenge is raised, be sure your process is ready to respond to it. Keep learning from the challenges you receive, and update algorithms and their models so that they reflect how human intervention has operated in similar situations.

Know your system

In compliance with GDPR Article 22(3) on safeguards, organizations using AI should be able to maintain and provide evidence of how their algorithms have undergone a continuous review process to track and mitigate any unwanted or malicious deviation from their initially intended purpose. This can clearly be a difficult task for small and medium organizations, especially those that build their solutions not on proprietary frameworks but by extending existing open source ones (e.g., deep learning frameworks such as TensorFlow). This is due to the sometimes self-evolving nature of these algorithms and models, and to the fact that some ‘malicious’ algorithm behavior might be inherited from previous implementations.

Recommendation: This is perhaps the most challenging GDPR requirement in an era where AI algorithms are becoming a commodity. As a service provider, try to understand as much as possible of the underlying models and test algorithms before reusing them; compare different versions of the same algorithm, document their source and their previous developers, and maintain adequate records of this information. In the future, all this information could form the features of a standardized algorithm passport. On the one hand, this would make it possible to track the provenance of algorithms and the liability involved when they misbehave; on the other, it would increase the community’s responsibility to verify algorithms and engender trust in their reuse and evolution.
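No standardized algorithm passport exists yet, so the following is only a guess at what a minimal record could hold: where a model came from, who developed it, and a fingerprint of the exact artifact that was vetted. All names and the example URL are hypothetical; only the standard-library `hashlib` and `dataclasses` modules are used.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmPassport:
    """Hypothetical provenance record for a reused model or algorithm."""
    name: str
    version: str
    source: str        # where the artifact was obtained
    developers: tuple  # previous developers, as documented upstream
    sha256: str        # fingerprint of the artifact that was reviewed


def fingerprint(model_bytes: bytes) -> str:
    """Hash the serialized model so later copies can be compared with it."""
    return hashlib.sha256(model_bytes).hexdigest()


def issue(name, version, source, developers, model_bytes) -> AlgorithmPassport:
    """Create a passport at the moment the artifact is vetted."""
    return AlgorithmPassport(name, version, source, tuple(developers), fingerprint(model_bytes))


def matches(passport: AlgorithmPassport, model_bytes: bytes) -> bool:
    """Check that the artifact in use is the one that was vetted."""
    return passport.sha256 == fingerprint(model_bytes)


weights = b"serialized-model-v1"  # stand-in for a real model file's contents
passport = issue("credit-scorer", "1.0", "https://example.org/models", ["upstream-team"], weights)
print(matches(passport, weights))             # True
print(matches(passport, b"modified-model"))   # False
```

Comparing fingerprints between versions is a cheap first check before the deeper behavioural testing the recommendation calls for.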

Be ready for changes

Even if they are not directly related to AI and profiling, GDPR grants end-users other digital rights that can affect the use of AI and Machine Learning in your services. Users exercising their right to erasure (or Right to be forgotten, Article 17) might break the models your AI and Machine Learning algorithms are using. If requested, and under specific circumstances (e.g., data obtained using consent or legitimate interest), any data linked to a specific user should be removed from your system and, as such, also from any model or algorithm where they are being processed. The challenge will be to trace back at which point and in which way the data were used, in particular for Deep Learning models, and to act accordingly, considering how third-party models can be affected by such requests. This risk might be even higher for small and medium organizations that rely on data from a limited number of users to build their models and solutions.

Recommendation: To better mitigate the unexpected effects of this aspect of GDPR compliance, organizations developing AI solutions should start developing best practices by documenting the data used for their models, including the proper provenance of the data, and annotating them with their features, in order to make it easy to find alternative data to retrain the models. SMEs should rely as much as possible on existing anonymised data sets for which consent to use them for model building has previously been obtained.
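Documenting provenance can start as a simple index of which users contributed to which data sets and which data sets trained which models. The sketch below is a hypothetical in-memory version of such an index; it only identifies the models touched by an erasure request and does not perform the retraining itself.

```python
from collections import defaultdict


class ProvenanceIndex:
    """Hypothetical index linking users, data sets and the models trained on them."""

    def __init__(self) -> None:
        self._datasets_by_user = defaultdict(set)
        self._models_by_dataset = defaultdict(set)

    def record_contribution(self, user_id: str, dataset: str) -> None:
        self._datasets_by_user[user_id].add(dataset)

    def record_training(self, model: str, dataset: str) -> None:
        self._models_by_dataset[dataset].add(model)

    def models_affected_by_erasure(self, user_id: str) -> set:
        """Return every model trained on a data set containing this user's data."""
        affected = set()
        for dataset in self._datasets_by_user.get(user_id, ()):
            affected |= self._models_by_dataset.get(dataset, set())
        return affected


index = ProvenanceIndex()
index.record_contribution("user-1", "loans-2017")
index.record_training("credit-scorer-v1", "loans-2017")
index.record_training("ad-ranker-v3", "clicks-2018")
print(index.models_affected_by_erasure("user-1"))  # {'credit-scorer-v1'}
```

With such an index in place, an erasure request translates into a concrete list of models to retrain or re-evaluate, instead of a search through undocumented training runs.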

I hope this article has given you enough guidance on how to do AI right under GDPR. Please share your comments if you want to discuss this further, and join us at MyData 2018, August 29–31 in Helsinki.