Many articles explain GDPR as a concept, but few cover the technical challenges that accompany GDPR compliance. The aim of this post is to suggest a workable approach to one of those challenges.
The General Data Protection Regulation (GDPR) will come into effect on 25 May 2018 and will change the way companies collect, process and store user data.
Privacy by design and data protection by design are essential parts of GDPR. Privacy by design means that organizations need to consider privacy from the initial design stages and throughout the complete development process. From 2018, data protection will become an integral part of technological development and of how a product or service is delivered.
As developers and decision-makers, we need to design our systems carefully to meet privacy requirements. This topic is vast; you can gain a better understanding of what privacy by design involves (and its best practices) here. In this post, we focus on how to store user data while guaranteeing its privacy.
To start, let’s define our main objective as personal data defenders. Our highest priority is to make sure that an individual cannot be identified from the data, either on its own or when combined with other information. Most importantly, when dealing with sensitive information, that information must not be linkable to any living individual. So we are responsible for protecting more than just Personally Identifiable Information (PII).
Driver’s license numbers, credit/debit card account numbers and social security numbers are well known as sensitive data, but there are many other, less well-known kinds, such as racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, health data, genetic data, biometric data, sex life or sexual orientation, and past or spent criminal convictions, to name a few.
It is important to keep in mind that some information is not sensitive if it is not linked to an individual. Governments release reports about diseases in an area every year, for example. The same goes for personal identification: there is no problem in having a list of addresses in your database if they are not linked to a living individual (Google has every address mapped, with photos).
We can have information about employees in our database, such as salary, start date and holidays, but that data should not be linked to an identifiable person. The same goes for customer orders: there is no problem in storing data about transactions, but those records should not be attributable to a data subject.
“The removal of person-related information that could be used for backtracking from, say, patient data to the actual patient.”
When your data is leaked, it will be used along with other databases, and the combination can turn anonymous records into identified ones. Figuring out what to remove from your dataset is not an easy job, but keep in mind that 87% of the U.S. population is uniquely identified by the combination of ZIP code, gender and date of birth.
When anonymization is possible, you can remove data or use approaches such as k-anonymity and l-diversity (both deserve a Google search) to tackle privacy. However, when the data owner is essential to our application, we can’t use anonymization. For those cases we can use encryption, pseudonymization or both.
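To make k-anonymity and l-diversity a little more concrete, here is a minimal sketch of how the two properties could be measured on an already generalized dataset. The column names (`zip`, `age`, `disease`), the sample rows and both function names are assumptions made up for illustration; a real project would more likely rely on a dedicated anonymization tool.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any such group."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].add(row[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0


# Hypothetical records whose ZIP code and age were already generalized.
records = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "asthma"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "diabetes"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
]

print(k_anonymity(records, ["zip", "age"]))             # 2
print(l_diversity(records, ["zip", "age"], "disease"))  # 2
```

A dataset is k-anonymous when every combination of quasi-identifiers is shared by at least k rows, so a result of 2 means each person hides among at least one other record.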
These two kinds of privacy protection are a double-edged sword. Although encryption can enhance privacy, it requires encryption and decryption operations, which reduce the efficiency of database operations. If we encrypt the data before inserting it into the database, we lose the ability to query the database by the original value, and Transparent Data Encryption is not always available for our database.
An acceptable approach is to remove the relationships, thus preserving any privacy that these relationships may compromise. After removing all sensitive relationships, we add a pseudo reference that links the sensitive data to an individual. This approach is a kind of pseudonymization of the individual’s ID.
Pseudonymization is a central feature of “data protection by design”. The word appears several times in the regulation, whilst the word encryption appears only 4 times:
- “…implement measures to mitigate those risks, such as encryption.” (P51. (83))
- “…appropriate safeguards, which may include encryption” (P121 (4.e))
- “…including inter alia as appropriate: (a) the pseudonymisation and encryption of personal data.” (P160 (1a))
- “…unintelligible to any person who is not authorised to access it, such as encryption” (P163 (3a))
You can notice that encryption always comes as a suggestion; at no point do these passages indicate that encryption is mandated by GDPR. Besides, the regulation gives no real context (Encryption at rest? In transit? Where is it encrypted? What level of encryption?).
The pseudo reference is a code generated using techniques such as a hash function, tokenization or encryption. The pseudonym allows data to be tracked back to its origin, which distinguishes pseudonymization from anonymization. The additional information needed to get the data back must be kept separately to ensure non-attribution to an identified or identifiable person.
So keeping relationships private is an effective approach, along with column encryption, when we need to trace the data back to its origin. To make a relationship private, we can replace the usual foreign key with an encrypted identifier that only the domain knows how to use to link the entities.
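As a rough illustration of two of these techniques, the sketch below generates a pseudonym with a keyed hash (HMAC-SHA256) and with a simple in-memory token vault. The names `keyed_hash_pseudonym` and `TokenVault`, the sample identifier and the keys are hypothetical, and the vault here stands in for a separately stored and protected mapping; none of this is prescribed by the regulation.

```python
import hashlib
import hmac
import secrets


def keyed_hash_pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym; without the key it cannot be recomputed."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Tokenization: random tokens plus a mapping that must be stored separately."""

    def __init__(self):
        self._token_by_id = {}
        self._id_by_token = {}

    def tokenize(self, identifier: str) -> str:
        # Reuse the existing token so the same person always maps to the same code.
        if identifier not in self._token_by_id:
            token = secrets.token_hex(16)
            self._token_by_id[identifier] = token
            self._id_by_token[token] = identifier
        return self._token_by_id[identifier]

    def detokenize(self, token: str) -> str:
        # Only whoever holds the vault can track a token back to its origin.
        return self._id_by_token[token]


key = b"demo-key-not-for-production"  # assumption: kept outside the database
pseudonym = keyed_hash_pseudonym("person-42", key)

vault = TokenVault()
token = vault.tokenize("person-42")
```

With the keyed hash, the "additional information" is the secret key: its holder can recompute the pseudonym of a known individual and find the matching records. With tokenization, the additional information is the vault itself.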
Let’s look at the case where relationships between entities are to be kept private.
The application holds the strategy for encrypting the Person reference, based on an encryption key added to the application. The relationship can only be resolved from the Person domain, which is able to generate the encrypted code and search the other tables for that code.
In this post, we have focused on the problem of link identification. We have proposed an approach for anonymizing sensitive relationships by generating a pseudo reference based on an encryption mechanism, and shown how it can help a company enhance its users’ privacy.
Please join the discussion and leave your thoughts about the suggested approach and about how you are working to achieve GDPR compliance.
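The idea can be sketched with an in-memory SQLite database. Note the assumptions: the `person` and `salary` tables, the `PersonDomain` class and the key are invented for this example, and a keyed hash (HMAC) from the standard library is used in place of a full encryption scheme to derive the pseudo reference.

```python
import hashlib
import hmac
import sqlite3


class PersonDomain:
    """Only this domain knows how to turn a person ID into the pseudo reference."""

    def __init__(self, conn: sqlite3.Connection, secret_key: bytes):
        self._conn = conn
        self._key = secret_key  # assumption: loaded from application config, not the DB

    def pseudo_ref(self, person_id: int) -> str:
        message = str(person_id).encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def add_salary(self, person_id: int, amount: int) -> None:
        self._conn.execute(
            "INSERT INTO salary (person_ref, amount) VALUES (?, ?)",
            (self.pseudo_ref(person_id), amount),
        )

    def salaries_for(self, person_id: int) -> list:
        rows = self._conn.execute(
            "SELECT amount FROM salary WHERE person_ref = ?",
            (self.pseudo_ref(person_id),),
        )
        return [amount for (amount,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
# No foreign key to person: the salary table only stores the pseudo reference.
conn.execute("CREATE TABLE salary (person_ref TEXT, amount INTEGER)")
conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

domain = PersonDomain(conn, b"demo-key-not-for-production")
domain.add_salary(1, 5000)
domain.add_salary(2, 4200)

print(domain.salaries_for(1))  # [5000]
```

Someone who obtains only the database sees salary rows with opaque references and cannot join them to `person` without the key held by the application.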