GDPR for Engineers: What is Personal Data

After spending quite a lot of time working on GDPR compliance, I thought it would be useful to write a short post about what “Personal Data” means for engineers.

Before looking into the definitions, I would stress 2 things:

  • Personal Data is probably not what you would assume without carefully reading the GDPR definition
  • Having Personal Data on your servers and in your databases is not a sin (and you will probably not be able to avoid it). You just have to make sure you handle that data in the right way, in compliance with GDPR.

Definition of Personal Data

Let’s look at the law:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

Reference: https://gdpr-info.eu/art-4-gdpr/.

Let’s take it piece by piece:

any information relating to an identified or identifiable natural person

So, any data that can be traced or linked back to a natural person. This already covers two types of data: data that could be used to identify a natural person [1], and data that is related to that person [2].

What kind of data can be used to identify a natural person [1]?

The most obvious cases are full name, full address, national ID number, passport number and email address. Less obvious ones are IP addresses and cookie IDs. Another thing to consider is that a combination of different data fields may also be good enough (bad enough?) to identify an individual. Remember this game?

It is enough to know that his hair is blond and he is wearing a hat, and you will know it is Eric. In practice, this means that date of birth, ZIP code and gender together can be enough to make someone identifiable (source: https://iapp.org/news/a/top-10-operational-impacts-of-the-gdpr-part-8-pseudonymization/). GDPR calls this “singling out”.
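
To make the “singling out” idea concrete, here is a minimal sketch that checks which records become unique once a few harmless-looking fields are combined. The records, the field names and the singled_out helper are all made up for illustration; they do not come from any real dataset or library.

```python
from collections import Counter

# Hypothetical records: no names or emails, only "harmless" fields.
records = [
    {"dob": "1985-03-12", "zip": "10001", "gender": "F"},
    {"dob": "1985-03-12", "zip": "10001", "gender": "M"},
    {"dob": "1990-07-01", "zip": "10001", "gender": "F"},
    {"dob": "1990-07-01", "zip": "10001", "gender": "F"},
]


def singled_out(rows, fields):
    """Return the rows whose combination of `fields` is unique within `rows`."""
    def key(row):
        return tuple(row[f] for f in fields)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# ZIP code alone does not single anyone out in this toy dataset...
zip_only = singled_out(records, ["zip"])

# ...but the combination of all three fields isolates the first two records.
unique_rows = singled_out(records, ["dob", "zip", "gender"])
```

Running a check like this over your own tables is a cheap way to see how many rows a given combination of fields isolates, although a low count in one dataset does not prove that the fields are safe in general.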

What is data related to a natural person [2]?

Anything that could be traced back or joined to any of the data fields described in [1]. Think of it like this: load all of your system’s data into a SQL database. By all, I mean operational databases, data warehouses, queues, log files, HTTP access logs, secret keys, hash salts and data on employees’ workstations. Anything that you could then LEFT JOIN to a user’s identifying data [1] on unique data fields is related information. This may not be a perfect definition, but it is probably a good starting point.
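
The thought experiment above can be sketched with an in-memory SQLite database. The table names, columns and rows are assumptions made up for this example; the point is only that an access log containing nothing but a userID and a path becomes related data as soon as it joins to an identifying record.

```python
import sqlite3

# In-memory database standing in for "all of your system's data".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical identifying data [1]
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
    -- Hypothetical log table with no name, email or IP address
    CREATE TABLE access_log (user_id INTEGER, path TEXT);

    INSERT INTO users VALUES (426, 'eric@example.com');
    INSERT INTO access_log VALUES (426, '/settings'), (426, '/orders/17');
""")

# Every log row that joins to a user is information relating to that person [2].
rows = conn.execute("""
    SELECT u.email, l.path
    FROM users AS u
    LEFT JOIN access_log AS l ON l.user_id = u.user_id
    ORDER BY l.path
""").fetchall()

for email, path in rows:
    print(email, path)
```

In a real system the two tables would usually live in different stores (a user service and a log pipeline, for example), but the question to ask is the same: is there a key that lets you join them?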

It is worth mentioning that connections between data entries really count. A data entry may not be Personal Data in itself, but if you can link it to a personal record, then it is. For example, you might have a database with records in a <cityID, cityName> format, such as <426, “New York”>. In itself this is not personal data, since I cannot trace it back to a natural person. However, if the database schema is <userID, cityName> for the same data entry, then it is personal data, assuming that at least one record in the organization links userID 426 to a person’s identifiable data [1].

FAQ

Some common thoughts

I only store userIDs in my database, no name, email or IP address so this is not personal data

This is not true as long as anyone in the whole organization, or even another organization acting on your behalf, can link the userID to a person’s identifiable data.

I hash the userID in my database so it is not personal data

As long as the hashed ID can be relinked to the original userID, this is not true. So if the userID and the hash salt are stored somewhere within the organization (the latter may be in source code or in a deployed config file), it is still personal data.
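
A minimal sketch of why a salted hash does not help here, assuming SHA-256 and a salt that lives in a config file. The salt value, the pseudonymize function and the list of userIDs are hypothetical; any scheme where the organization holds both the salt and the original IDs behaves the same way.

```python
import hashlib

# Hypothetical salt, e.g. read from source code or a deployed config file.
SALT = b"hypothetical-salt-from-config"


def pseudonymize(user_id: int) -> str:
    """Return the salted SHA-256 hex digest that gets stored instead of the userID."""
    return hashlib.sha256(SALT + str(user_id).encode("utf-8")).hexdigest()


# What ends up in "my database": only the hash.
stored_hash = pseudonymize(426)

# What the organization still has elsewhere: the salt and the real userIDs.
known_user_ids = [17, 426, 9001]

# Re-hashing every known userID rebuilds the mapping from hash to userID.
relink = {pseudonymize(uid): uid for uid in known_user_ids}
recovered = relink.get(stored_hash)

print(recovered)
```

Because the lookup table can be rebuilt at any time from data the organization already holds, the hashed column is still linkable to a person and should be treated as personal data.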

What to do

Probably the best you can do is to make sure you handle personal data in compliance with GDPR. This means a couple of things, such as data lifecycle management, purpose mapping, consent management, handling Data Subject rights and, last but not least, properly securing the Personal Data you store and process.

Anonymization of the personal data may also be an option, but then you will lose valuable information, e.g. the ability to link any future data records of the same user to the anonymized data records.