Why GDPR Limits Data Processing
Few laws at EU level limit the processing of data as severely as the EU General Data Protection Regulation (GDPR) does, and few laws create more conflict between in-house lawyers and the business.
The GDPR does not apply to all kinds of data, but only to information relating to an identified or identifiable natural person (the data subject), so-called personal data. The protection of personal data within the EU is a human right, rooted in Art. 8 of the European Convention on Human Rights (ECHR), Art. 8 of the Charter of Fundamental Rights of the European Union (CFR) and, in Germany for example, Art. 1 and 2 (1) of the German Constitution. The GDPR takes a very strict approach: by default, the processing of personal data is forbidden unless (!) a legal basis within the GDPR permits the specific processing. Other laws, such as the California Consumer Privacy Act (CCPA), take a different approach and essentially tie the lawfulness of processing personal data to the company's obligation to inform the data subject, which is obviously easier to implement (the CCPA contains other hurdles which we will explain in another post).
The legal bases required by the GDPR are set out in Art. 6 GDPR: companies may only process the personal data of data subjects after ensuring the lawfulness of processing by identifying a legal basis. The most common ones are consent (Art. 6 (1) 1 lit. a GDPR), necessity for the performance of a contract with the data subject (Art. 6 (1) 1 lit. b GDPR) and the weighing of interests (Art. 6 (1) 1 lit. f GDPR).
Responsibility for the lawful processing of personal data is only one of many obligations which the GDPR imposes on companies and organisations. There are many more, such as obligations relating to transparency and the provision of information, information security and the deletion of personal data. All of these obligations are triggered by the processing of personal data, and complying with them requires ongoing due care. This helps to avoid fines, data subject requests, investigations and, of course, negative press.
Applying GDPR to Machine Learning
In order to further explain the limitations that the GDPR imposes on machine learning scenarios, let us have a look at an example:
An e-commerce company provides an app to its consumer customers which enables them to order clothes. Needless to say, usage of the app generates a great deal of data which is of high value to the company, such as data about ordered and returned clothes or data resulting from use of the app. The latter in particular is quite important, since it allows the company to suggest other clothes which are likely to fit and suit the respective customer.
What the company would now like to do is a) log all such data, b) transfer it to a central server, c) mix it with the data of other customers, and d) train a machine learning model on it. With the help of such an AI model, the company would then be able to suggest suitable products to each user.
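To make the centralized flow in steps a) to d) concrete, here is a minimal sketch of what it looks like in code. Everything here is an illustrative assumption (the function names, the toy "kept item" label and the simple logistic-regression model are invented for this post), but the GDPR-critical structure is accurate: raw per-user data leaves every device and is pooled on one server before training.

```python
# Sketch of the centralized pipeline: a) log, b) transfer, c) mix, d) train.
# All names and the toy model are illustrative assumptions, not real company code.
import numpy as np

rng = np.random.default_rng(0)

def collect_on_device(n_events=100):
    """a) Each app instance logs raw usage events (features plus a label)."""
    X = rng.normal(size=(n_events, 3))  # e.g. size, colour, price signals
    y = (X @ np.array([1.0, -0.5, 0.3]) > 0).astype(float)  # e.g. "item was kept"
    return X, y

# b) + c) Raw personal data from every user is sent to, and mixed on,
#         the company's central server -- the step the lawyers worry about.
devices = [collect_on_device() for _ in range(10)]
X_pool = np.vstack([X for X, _ in devices])
y_pool = np.concatenate([y for _, y in devices])

# d) Train a single model on the pooled raw data (toy logistic regression
#    fitted by plain gradient descent).
w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X_pool @ w)))
    w -= 0.1 * X_pool.T @ (p - y_pool) / len(y_pool)

print("model weights:", w)
```

Note that steps b) and c) are exactly the processing activities that require their own legal basis under the GDPR: the raw behavioural data of identifiable users travels to, and is combined on, the central server.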
Regardless of the technical issues the company might face, in particular the bandwidth limitations an internet connection imposes on transferring the raw data, the company's in-house lawyers will have to find a solution for GDPR compliance. Let us recall that any processing of personal data requires a legal basis. Several processing activities have to be considered here, in particular a) the collection of the data in the app, b) the transfer of the data to the company's central server and c) the training with the data.
None of these operations can be justified under Art. 6 (1) 1 lit. b GDPR, because none of them is necessary for the performance of a contract between the company and the user. This legal basis would, for example, cover the ordering process, but not the analysis of the user's behaviour. Consent could of course be sought, but apart from the fact that it must be freely given, specific and informed (Art. 4 No. 11 GDPR), it can be withdrawn at any time (Art. 7 (3) GDPR), and in practice the majority of users will not give it.
The remaining plausible legal basis is Art. 6 (1) 1 lit. f GDPR, which requires that the company’s interest in processing the data prevails over the interests of its customers. This requires an in-depth analysis which considers all relevant circumstances, in particular the quality and quantity of data, the purpose for which the company intends to use the data and also the environment in which the data is processed.
Considering that profiling is of particular significance (Recital 71 of the GDPR), the company's in-house lawyers would need to apply the utmost care to the legal assessment, and would in particular need to find reasons why the company's interest in transferring the data from the users' end devices to a central server and mixing it with the data of other users prevails. This legal reasoning is quite difficult in practice: even if the company has a legitimate interest, it must still show why that interest prevails over the interests of the users.
Federated Learning: Bringing the Algorithm to the Data
In-house lawyers get grey hair when asked to approve such operations. Their lives can be made much easier if we change one tiny aspect of the example above: instead of transferring the users' personal data and profiles to the company's central server, let us transfer the algorithm to the users' data! In that case, a user's data stays on the user's end device and trains a machine learning model there. After training, only the model is transferred centrally, not the raw data. Since a machine learning model is typically considered anonymous in terms of the GDPR (we will explain why in another post), no legal justification is necessary for transferring the model to the company. The GDPR simply does not apply!
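The "transfer the algorithm to the data" idea described above is known as federated learning, and its simplest variant, federated averaging, can be sketched in a few lines. As before, the function names and the toy model are illustrative assumptions; the essential point is structural: raw data never leaves a device, and only model weights are transferred and averaged centrally.

```python
# Minimal federated averaging sketch: each user's data stays on the device;
# only trained model weights leave it. Names and the toy model are
# illustrative assumptions, not XAIN's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

def local_data(n=100):
    """Raw usage data -- this never leaves the user's device."""
    X = rng.normal(size=(n, 3))
    y = (X @ np.array([1.0, -0.5, 0.3]) > 0).astype(float)
    return X, y

def local_train(w, X, y, steps=50, lr=0.1):
    """On-device training: refine the current global model on local data only."""
    w = w.copy()
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

devices = [local_data() for _ in range(10)]
w_global = np.zeros(3)
for _ in range(10):  # federated rounds
    # Each device trains locally on its own data...
    local_models = [local_train(w_global, X, y) for X, y in devices]
    # ...and only the weights are sent back and averaged centrally.
    w_global = np.mean(local_models, axis=0)

print("global model:", w_global)
```

Compare this with the centralized sketch earlier in the post: the training loop is essentially the same, but the server only ever sees weight vectors, never the raw behavioural data of any identifiable user.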
Creating anonymous machine learning models without losing the information contained in the users' data is one key aspect of XAIN's privacy-enhancing technologies. This way, your in-house lawyer is happy, while you pursue commercial AI innovations and products without limits.
© Photo by zoe pappas from Pexels