GDPR: Threat or Opportunity

Samuel Pouyt
Vaticle
Jan 18, 2018


In May 2018, the European Union’s General Data Protection Regulation (GDPR) will take effect. It brings sweeping changes to data privacy law and applies to any company dealing with the personal data of EU data subjects. GDPR contains a number of core data processing and retention mandates, which will generally require major overhauls of current enterprise data practices.

I will not cover every detail of GDPR in this post, mainly because I am not a lawyer. Instead, I want to discuss the general threat GDPR poses to enterprises and the urgent need to become compliant. In working towards GDPR compliance, I have found the Grakn database to be very useful.

My current role is full stack developer for the European Respiratory Society (ERS). The ERS is an international organisation, based in Switzerland, that brings together physicians, healthcare professionals, scientists and other experts working in respiratory medicine. In this capacity, I have begun to build a Proof of Concept of a system using Grakn, not only to achieve GDPR compliance for our data, but also to improve the user experience across our many websites by providing content personalisation and recommendation. In addition to content recommendations within our websites, I also intend to use Grakn to provide recommendations for our conferences. This system will build personalised congress itineraries, helping our delegates navigate our two hundred plus sessions over five days. Ultimately, the ERS will be able to provide an overview of pulmonary medicine with its “pulmonary knowledge base”.

Now that you’re aware of my background and the general use case I am building with Grakn, it would be helpful to understand a bit more about what GDPR is and why compliance is essential.

GDPR and Compliance

GDPR is a huge paradigm change. Before the GDPR era, data regulation was mostly optional, and one of its most visible impacts was on newsletters, where the unsubscribe link became mandatory. GDPR now requires “privacy by design and by default” (Art. 25). This means that any new application has to be designed around privacy; it cannot be an afterthought, and you have to be able to demonstrate that this is in fact the case. The software might even need to be certified, as GDPR encourages it:

The Member States, the supervisory authorities, the Board and the Commission shall encourage, in particular at Union level, the establishment of data protection certification mechanisms and of data protection seals and marks, for the purpose of demonstrating compliance with this Regulation of processing operations by controllers and processors. The specific needs of micro, small and medium-sized enterprises shall be taken into account. (Art 42.1)

Being certified and able to display a mark or seal for a product could be a key differentiator when it comes to winning new contracts, while the lack of one could mean losing current clients.

What happens when a small service provider works with big international companies, and one of those companies is inspected for data compliance because someone complained about unsolicited mail that the small service provider sent? The big company may be fined 4% of its annual turnover, but it will want its money back. Lawyers will soon be knocking on the door of the small company.

This means that big companies will only work with partners that are GDPR-compliant, as they will not want to risk 4% of their turnover. Companies that fail to provide proof of their compliance might find themselves out of business.

Of course, “privacy by design” does not exclude older software. It will need to be adapted in order to comply. That process also needs to be documented, as it is important that a company can prove it did everything possible to be compliant.

Given the size and importance of the EU market, GDPR has serious consequences for companies across the world. Nobody knows exactly what will happen when these regulations take effect. What we know right now are the penalties for non-compliance:

Under GDPR organisations in breach of GDPR can be fined up to 4% of annual global turnover or €20 Million (whichever is greater). This is the maximum fine that can be imposed for the most serious infringements e.g. not having sufficient customer consent to process data or violating the core of Privacy by Design concepts. There is a tiered approach to fines e.g. a company can be fined 2% for not having their records in order (article 28), not notifying the supervising authority and data subject about a breach or not conducting impact assessment. It is important to note that these rules apply to both controllers and processors — meaning ‘clouds’ will not be exempt from GDPR enforcement. (ref: Key Changes)

GDPR is clearly a threat if you do not comply: for the most serious infringements, the fine can reach €20 million or 4% of annual global turnover, whichever is greater. In other words, the 4% figure applies whenever four percent of global turnover exceeds twenty million euros.
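As a back-of-the-envelope illustration (and nothing more), the “whichever is greater” rule can be written down in a few lines of Python; the figures are the statutory caps, not a prediction of any actual fine:

```python
def max_gdpr_fine(annual_global_turnover_eur: float) -> float:
    """Upper bound of the fine for the most serious infringements:
    EUR 20 million or 4% of annual global turnover, whichever is greater."""
    return max(20_000_000.0, 0.04 * annual_global_turnover_eur)

# A company turning over EUR 1 billion is exposed to up to EUR 40 million,
# while a EUR 100 million company still faces the EUR 20 million floor.
print(max_gdpr_fine(1_000_000_000))  # 40000000.0
print(max_gdpr_fine(100_000_000))    # 20000000.0
```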

Given this, we should first ask what sort of data GDPR covers. The regulator defines personal data as:

Any information related to a natural person or ‘Data Subject’, that can be used to directly or indirectly identify the person. It can be anything from a name, a photo, an email address, bank details, posts on social networking websites, medical information, or a computer IP address. (ref: FAQ)

In order to collect such data under GDPR, a company has to ask for consent in a clear way and state what the data will be used for:

The conditions for consent have been strengthened, and companies will no longer be able to use long illegible terms and conditions full of legalese, as the request for consent must be given in an intelligible and easily accessible form, with the purpose for data processing attached to that consent. Consent must be clear and distinguishable from other matters and provided in an intelligible and easily accessible form, using clear and plain language. It must be as easy to withdraw consent as it is to give it.​ (ref: Key Changes)

Moreover, the user has to have an easy way to delete, or to request the deletion of, the data a company holds about them, and to revoke consent that was previously given. The data also needs to be portable, which means that a user can at any time request the data a company holds about them and transfer it to someone else.
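As a rough sketch of what portability can look like in practice, the export needs to be complete and machine-readable. Everything below (the store names, the record shapes, the idea of registering fetch functions) is hypothetical and only meant to illustrate the shape of the problem:

```python
import json
from datetime import date

def export_user_data(user_id: str, stores: dict) -> str:
    """Gather everything held about a user from each data store and return
    it as a machine-readable JSON document (the Art. 20 portability idea).
    `stores` maps a store name to a callable returning that user's records."""
    export = {
        "user_id": user_id,
        "exported_on": date.today().isoformat(),
        "records": {name: fetch(user_id) for name, fetch in stores.items()},
    }
    return json.dumps(export, indent=2)

# Hypothetical data stores; a real system would query the CRM, the
# newsletter tool, the event registration system, and so on.
stores = {
    "crm": lambda uid: {"name": "Jane Doe", "email": "jane@example.org"},
    "newsletter": lambda uid: {"subscribed": True, "topics": ["COPD", "asthma"]},
}
print(export_user_data("user-42", stores))
```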

Additionally, one of the core mandates of GDPR is privacy by design. All new systems need to have privacy at their core, not added later as an afterthought. All these changes have a huge impact on companies: they basically have to change every system that collects user data in order to be compliant, and they have to make sure that the data of a user who has requested deletion really is deleted.

What will happen when GDPR takes effect?

How far will regulators go to enforce GDPR? As noted in the previous section, no one really knows how GDPR compliance will be enforced until some companies are caught out for non-compliance and fined; only then will we know how the law is applied. At this juncture, many questions remain unclear.

Some of these questions are broadly political: for example, what sorts of companies will be the first to be fined? Will ‘examples’ be made to single out certain industries that haven’t been good data privacy practitioners? But beyond these political questions, there are just as many important technical questions.

To give just one example: what about backups? If I have two years of backups and somebody requests that their data be deleted, should I modify all the backups? If I instead keep a “difference table” somewhere that re-deletes the user whenever a backup is restored, is that user really deleted, given that I still hold a trace of them? These questions are endless.
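One pragmatic pattern is to record erasure requests in a separate deletion log and replay it after every restore, so restored backups do not silently resurrect erased users. Whether that log itself counts as keeping “a trace” is exactly the open question above. The sketch below uses an in-memory SQLite database and invented table names purely for illustration:

```python
import sqlite3

# Minimal schema for illustration: a users table plus an erasure log that
# would, in practice, be stored and backed up separately.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE erasure_log (user_id TEXT PRIMARY KEY)")

def record_erasure(user_id: str) -> None:
    """Log an erasure request and delete the user from the live table."""
    conn.execute("INSERT OR IGNORE INTO erasure_log VALUES (?)", (user_id,))
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()

def replay_erasures() -> None:
    """After restoring an old backup, re-apply every logged erasure so that
    restored rows do not bring back users who asked to be forgotten."""
    for (user_id,) in conn.execute("SELECT user_id FROM erasure_log").fetchall():
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
```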

Some may think that anonymising their data will solve all their issues, as it will prevent “indirect identification”. Most likely it will not. Consider the following:

De Montjoye and colleagues examined three months of credit card transactions for 1.1 million people, all of which had been scrubbed of any [personally identifiable information]. Still, 90% of the time he managed to identify individuals in the dataset using the date and location of just four of their transactions. By adding knowledge of the price of the transactions, he increased “reidentification” (the academic term for spotting an individual in anonymized data) to 94%. Additionally, women were easier to reidentify than men, and reidentification ability increased with income of the consumer.

Latanya Sweeney found that 87 percent of the population in the United States, 216 million of 248 million, could likely be uniquely identified by their five-digit ZIP code, combined with their gender and date of birth.

These examples show that it is extremely difficult to anonymise data. “Ultimately, the hallmark of both anonymization and pseudonymization is that the data should be nearly impossible to re-identify. This theory, however, has its practical and mathematical limits”. A data point on its own may be anonymous, but when many data points are put together, they might lead to re-identification. Unfortunately, there are no clear guidelines on anonymisation from the legislator.
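A crude way to see how quickly quasi-identifiers single people out is to count how many records share the same combination of attributes. The records below are invented, but the mechanism is the same one at work in the studies quoted above:

```python
from collections import Counter

# Hypothetical "anonymised" records: no names, just quasi-identifiers.
records = [
    {"zip": "1005", "gender": "F", "birth": "1980-03-12"},
    {"zip": "1005", "gender": "F", "birth": "1980-03-12"},
    {"zip": "1005", "gender": "M", "birth": "1975-07-02"},
    {"zip": "1203", "gender": "F", "birth": "1980-03-12"},
    {"zip": "1005", "gender": "F", "birth": "1991-11-30"},
]

# Count how many records share each (ZIP, gender, date of birth) combination.
combos = Counter((r["zip"], r["gender"], r["birth"]) for r in records)
unique = sum(1 for count in combos.values() if count == 1)

# Every record whose combination is unique is, in principle, re-identifiable
# by anyone who knows those three facts about a person.
print(f"{unique} of {len(records)} records are uniquely identifiable")  # 3 of 5
```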

Although no guidelines are available, data anonymisation or pseudonymisation is very important for compliance. GDPR defines pseudonymisation as follows:

‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person; (Art 4.5)

Pseudonymisation, according to the GDPR itself, gives the data controller some liberty to use or reuse data beyond the explicit consent given by the user. Article 6.4.e is very clear on that point:

the controller shall, in order to ascertain whether processing for another purpose is compatible with the purpose for which the personal data are initially collected, take into account, inter alia:

[…]

e) the existence of appropriate safeguards, which may include encryption or pseudonymisation.

Of course, the lack of clear guidelines is a problem: in our age of Big Data and machine learning it will be very difficult to guarantee pseudonymisation. But if a company can prove that it has taken all reasonable measures to make sure that the data is reused responsibly, it should be on the safe side. This, of course, is not legal advice.
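As a minimal sketch of what such a safeguard can look like, one common approach is keyed hashing: a direct identifier is replaced by an HMAC, and the key (the “additional information” of Art. 4.5) is kept separately from the pseudonymised dataset. The key handling below is deliberately simplified; in a real system the key would live in a secrets manager, not in the source code:

```python
import hmac
import hashlib

# Illustration only: in practice this key must be stored separately from the
# pseudonymised data (e.g. in a secrets manager), as Art. 4.5 requires.
PSEUDONYMISATION_KEY = b"keep-me-somewhere-else"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash. Without the key, the
    pseudonym cannot easily be linked back to the original identifier."""
    return hmac.new(PSEUDONYMISATION_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"subject": pseudonymise("jane.doe@example.org"), "diagnosis": "asthma"}
print(record)
```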

How can we view GDPR as an opportunity?

It seems that GDPR is clearly a threat, as companies have to change all their systems, and this costs time and money. Companies must also re-evaluate the parties with whom they work. Importantly, companies have to change the way the whole organisation handles data: the change is not only technical. GDPR is about a general approach towards data privacy and protection. In theory, an assistant keeping Excel sheets of personal data on a computer could leave the whole company facing a fine of up to €20 million.

I would argue, though, that despite this great threat, GDPR is also an opportunity. It is an opportunity to review all the systems a company holds and uses, to review the flow of data, and to pinpoint areas where it could be improved. It is also a legal opportunity, as you are given a unique chance to review or break contracts that go against the incoming law, allowing you to walk away from bad deals made earlier in the life of the company.

But mostly, I see it as a data opportunity. Indeed, to be GDPR-compliant, you have to provide an easy way to delete all of a user's data and to give the user an overview of what the company holds on them. Thus, arguably, the regulators are asking companies to create a user tracking system, as companies need to know everything they hold about their users. This has huge potential value. It certainly costs money and resources to put in place, so we should make the most of it!

So instead of viewing GDPR as an obstacle that must be tackled, we should embrace the opportunity to provide a dashboard that lets users deal with their data and also lets the company see how its data is scattered across systems, so that data can easily be tracked and deleted if required. That’s the GDPR side of things.
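A minimal sketch of what sits behind such a dashboard is a data inventory: a registry of which systems hold data about a user, with one entry point for listing it and one for erasing it. All the names below are hypothetical:

```python
from typing import Callable, Dict

class DataInventory:
    """Registry mapping each system to functions that list and erase one
    user's data, so a single request fans out to every system."""

    def __init__(self) -> None:
        self.systems: Dict[str, Dict[str, Callable[[str], object]]] = {}

    def register(self, name: str, lister: Callable, eraser: Callable) -> None:
        self.systems[name] = {"list": lister, "erase": eraser}

    def overview(self, user_id: str) -> dict:
        """What the dashboard shows the user: everything held, per system."""
        return {name: funcs["list"](user_id) for name, funcs in self.systems.items()}

    def erase(self, user_id: str) -> None:
        """Fan an erasure request out to every registered system."""
        for funcs in self.systems.values():
            funcs["erase"](user_id)

# Hypothetical systems: a CRM and a mailing list tool.
inventory = DataInventory()
inventory.register("crm", lambda uid: {"email": "jane@example.org"}, lambda uid: None)
inventory.register("mailing", lambda uid: {"subscribed": True}, lambda uid: None)
print(inventory.overview("user-42"))
```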

But companies should build on this system and also track their content, adding user behaviours: what they have read, which events they attended, what they bought, what they commented on, what they clicked on; imagination is the limit. When you have all this data in one system, you can start improving the user experience by personalising it for them. Therefore, as much as GDPR is a threat to enterprises, it also offers companies an opportunity to build knowledge bases from which to reason with data and extract value with recommender systems.
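To make the personalisation idea concrete, here is a deliberately naive sketch of item-to-item recommendations from a reading log. The article slugs and user IDs are invented, and a real system (in Grakn or anything else) would go well beyond simple co-occurrence counting:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction log: which articles each user has read.
reads = {
    "user-1": {"copd-guidelines", "asthma-review", "sleep-apnoea"},
    "user-2": {"copd-guidelines", "asthma-review"},
    "user-3": {"asthma-review", "sleep-apnoea"},
}

# Count how often two items are read by the same user (co-occurrence).
co_occurrence = defaultdict(int)
for items in reads.values():
    for a, b in combinations(sorted(items), 2):
        co_occurrence[(a, b)] += 1

def recommend(item: str, top_n: int = 3) -> list:
    """Recommend the items that most often co-occur with the given item."""
    scores = defaultdict(int)
    for (a, b), count in co_occurrence.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("copd-guidelines"))  # ['asthma-review', 'sleep-apnoea']
```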

Grakn is a wonderful tool to help companies get there. In the next post, we’ll see some of the specifics of using Grakn for GDPR compliance.
