Classification of data under LGPD

DP6 Team
DP6 US
Feb 7, 2023

In a scenario with a wide range of norms and standards, Brazil's General Data Protection Law (LGPD, Law No. 13,709/2018) brought together the basic concepts and principles of personal data management, making it easier for companies to know their obligations and for consumers to know their rights. Mastering the concepts and principles of the LGPD is not just the responsibility of a company's legal department. Technology teams should know them too, even if only at a general level, in order to mitigate risks, streamline work and generate value for the company. So let's explore one of the essential points of the LGPD: the types of data.

Personal data

The LGPD defines personal data as information related to an identified or identifiable natural person (article 5, item I). What qualifies certain data as personal, thus placing it under the protection of the LGPD? Around the world, laws that protect personal data adopt one of two strategies for this qualification: expansionist or reductionist.

The reductionist strategy brings a smaller amount of data within the scope of legal protection. Only information directly related to an identified natural person is considered personal data: the data must be directly, precisely, immediately and exactly linked to a specific person, with no doubt as to whom it belongs. Under the reductionist strategy, examples of personal data are national identity card numbers, social security numbers and biometrics. The LGPD, however, adopted an expansionist strategy by defining personal data as information related to an identified or identifiable natural person. As a result, a broader range of data can qualify as personal, for example profession, personal interests, IP address and corporate email.

What are the practical repercussions of adopting one strategy or the other? The expansionist approach becomes especially relevant once businesses start to use technologies that produce, aggregate, analyze and extract value from large sets of information, such as Big Data, Machine Learning and Artificial Intelligence, because these are precisely the technologies that can turn scattered, seemingly harmless data into information about an identifiable person.

An isolated piece of data may not have much value, but several pieces of information together can form a mosaic that leads to a specific person. In 2016, for example, British researchers used geographic profiling techniques to try to identify the anonymous artist known as Banksy. None of the information used in the research was linked to a name or face, so on its own it was not related to an identified person. However, once aggregated, combined, sorted and analyzed, such information can make a person identifiable.

To determine whether data is personal, we must also analyze the context in which it is processed. In 2016, in the Breyer case, the European Court of Justice ruled that a dynamic IP address can be personal data for a website operator, because, in conjunction with additional information held by the internet service provider, it can make it possible to identify a person.

So, when dealing with data, we must ask ourselves: does this data, if added to other information that I have, make a specific person identifiable? If so, we are dealing with personal data.
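To make the question concrete, here is a minimal Python sketch of how two datasets can be combined to single people out. All records, names and field names are invented for illustration; the only assumption is that someone holds a second dataset sharing a few attributes (zip code, birth year, gender) with a dataset that contains no names.

```python
# Hypothetical example data: none of these records or names are real.
# A dataset with no names, only a few attributes plus a purchase category.
deidentified = [
    {"zip": "30140-071", "birth_year": 1985, "gender": "F", "purchase": "pharmacy"},
    {"zip": "30140-071", "birth_year": 1990, "gender": "M", "purchase": "books"},
    {"zip": "01310-100", "birth_year": 1985, "gender": "F", "purchase": "travel"},
]

# Another dataset someone may hold, e.g. a hypothetical customer list with names.
customer_list = [
    {"name": "Ana", "zip": "30140-071", "birth_year": 1985, "gender": "F"},
    {"name": "Bruno", "zip": "30140-071", "birth_year": 1990, "gender": "M"},
    {"name": "Carla", "zip": "04538-132", "birth_year": 1978, "gender": "F"},
]

# Attributes that are not identifying alone but may be when combined.
SHARED_ATTRIBUTES = ("zip", "birth_year", "gender")


def link(records, reference, keys=SHARED_ATTRIBUTES):
    """Return (name, record) pairs where exactly one reference row matches on all keys."""
    matches = []
    for rec in records:
        candidates = [ref for ref in reference if all(ref[k] == rec[k] for k in keys)]
        if len(candidates) == 1:  # a unique match singles out one person
            matches.append((candidates[0]["name"], rec))
    return matches


for name, rec in link(deidentified, customer_list):
    print(f"{name} -> {rec['purchase']}")
```

In this toy data, the first two rows each match exactly one named customer, so they relate to identifiable people; the third row has no match in this particular list, which says nothing about other information an organization might hold.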

Anonymized data

What is the opposite of personal data? Anonymized data. The LGPD defines anonymized data as data relating to a data subject who cannot be identified using reasonable technical means available at the time of processing (article 5, item III). According to Bruno Ricardo Bioni, anonymous data can be understood as data that cannot reveal a person's identity after going through a process that breaks the link between the data and its subject.

Bruno Ricardo Bioni goes on to say that although this definition is theoretically perfect, it is problematic in a technological sense, especially given ever-increasing advances in data processing technologies. Today, there is no clear and rigid division between personal data and anonymized data. In fact, studies show that reversing the anonymization process can be simpler than you might think. Arvind Narayanan, Professor of Computer Science at Princeton, demonstrated how anonymization could be reversed using other data, as can be seen here.

The LGPD defines anonymization as "the use of reasonable technical means available at the time of processing, through which data loses the possibility of association, directly or indirectly, with an individual" (article 5, item XI). Anonymization relies on techniques such as suppression, generalization and randomization, sometimes combined with pseudonymization. When applied effectively, these techniques make data cease to be personal or sensitive and become anonymous; pseudonymization on its own, however, does not achieve this, since pseudonymized data can still be linked back to a person.

A social security number can be suppressed so that it does not appear in a database. A name can be generalized to the first name only, which still allows for personalization in email marketing. A zip code can be generalized by keeping only its first five digits (a Brazilian CEP has eight). An age can be generalized into an age group to avoid singling out individuals. These simple measures considerably reduce the risk for data subjects and for the entities processing the data, although, as the re-identification studies show, they do not on their own guarantee that the data cannot be re-identified.
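The measures above can be sketched in Python as a single function applied to each record. The field names (`name`, `ssn`, `zip`, `age`), the eight-digit zip format and the 10-year age bands are assumptions made for illustration, not requirements of the LGPD.

```python
def generalize_record(record):
    """Apply suppression and generalization to one record (hypothetical field names)."""
    out = dict(record)
    out.pop("ssn", None)                      # suppression: drop the identifier entirely
    out["name"] = record["name"].split()[0]   # generalization: keep the first name only
    digits = record["zip"].replace("-", "")
    out["zip"] = digits[:5]                   # generalization: keep the first five digits
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"           # generalization: 10-year age band
    return out


# Invented example record.
record = {"name": "Maria Silva Souza", "ssn": "123.456.789-00", "zip": "30140-071", "age": 37}
print(generalize_record(record))
# {'name': 'Maria', 'zip': '30140', 'age': '30-39'}
```

Whether the output is actually anonymized still depends on what other information is available to whoever receives it.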

As highlighted above, when the LGPD defines anonymized data, it speaks of "the use of reasonable technical means available at the time of processing". Article 12 of the LGPD provides that "anonymized data will not be considered personal data for the purposes of this Law, except when the anonymization process to which they were submitted is reversed, or when, with reasonable efforts, it can be reversed".

Therefore, if associating anonymized data with a person would require an unreasonable amount of effort, that data is not considered personal data. But what exactly are "reasonable efforts"? The notion is quite broad on its own, so the law (article 12, paragraph 1) points to objective factors that help determine what is reasonable, such as: 1) cost, 2) time and 3) the available technology (the state of the art).

If reversing a certain anonymization process involves high financial costs, it goes beyond the idea of reasonable effort. Likewise, if a large set of computers and processors must be used to process, decrypt and cross-reference information for one or two years, does that fall within the scope of reasonable effort? Evidently not.
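Cost and time can be made tangible with a small experiment. The Python sketch below assumes a hypothetical six-digit numeric identifier that was "protected" only by an unsalted SHA-256 hash. Because the number of possible values is small, trying all of them completes quickly on an ordinary computer, so this kind of protection would hardly require unreasonable effort to reverse.

```python
import hashlib
import time


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 of a string, as a hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def reverse_by_enumeration(target_hash: str, digits: int):
    """Try every numeric string of the given length until one hashes to target_hash."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # no numeric value of this length produces the hash


# Hypothetical six-digit identifier "protected" only by an unsalted hash.
secret_id = "482913"
leaked_hash = sha256_hex(secret_id)

start = time.perf_counter()
recovered = reverse_by_enumeration(leaked_hash, digits=6)
elapsed = time.perf_counter() - start
print(recovered, f"recovered in {elapsed:.2f} s")
```

Each additional digit multiplies the search space, and so the worst-case time, by ten, which is one way to reason about the cost and time factors; actual figures depend on the hardware used.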

Finally, we must consider the technologies available for reversing a given process, that is, the state of the art: what is considered the most advanced technology of its kind at the time. In two years, depending on advances in computing, some efforts that are unreasonable today may become quite feasible.

This notion of reasonableness is therefore circumstantial: it depends on the current stage of technological development.

Pseudonymized data

Pseudonymization replaces the original value of the data with another value, such as a token or a hash, while keeping a consistent relationship with the original value, so that whoever holds additional information kept separately (for example, a table mapping tokens to original values) can reverse it. Unlike anonymized data, which article 12 places outside the scope of the LGPD, pseudonymized data remains personal data; the law mentions pseudonymization only in article 13, in the context of public health studies. Even so, it is easy to see advantages in its use by companies.
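A minimal Python sketch of pseudonymization with a token table, assuming the organization keeps the mapping between tokens and original values in a separate, access-controlled store. The `Pseudonymizer` class is hypothetical: the same value always receives the same token, and only whoever holds the table can reverse it.

```python
import secrets


class Pseudonymizer:
    """Replaces values with random tokens; the token table is the separately kept key."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value (store apart, under strict access control)

    def pseudonymize(self, value: str) -> str:
        if value not in self._forward:
            token = secrets.token_hex(4).upper()  # 8 random hex characters
            while token in self._reverse:         # avoid the unlikely case of a repeated token
                token = secrets.token_hex(4).upper()
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reverse(self, token: str) -> str:
        """Only possible for whoever holds the token table."""
        return self._reverse[token]


p = Pseudonymizer()
t1 = p.pseudonymize("Maria Souza")
t2 = p.pseudonymize("Maria Souza")
t3 = p.pseudonymize("João Lima")
print(t1 == t2, t1 == t3, p.reverse(t1))
# True False Maria Souza
```

Random tokens are used here instead of a plain hash because a plain hash of a value with few possible variations, such as a document number, can be reversed by hashing every candidate.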

From an information security point of view, pseudonymization offers more security in data processing and can mitigate the damage caused by possible leaks, especially if the leaked data contains only pseudonyms and not the additional information needed to re-identify them.

The chart below, by Jessica Sombrio, exemplifies the distinction between anonymization and pseudonymization. In the words of the author, “(…) applying pseudo-anonymization, the original names of the buyers are hidden, however, it is possible to establish a relationship between them, as the same original values will have the same pseudonyms, so although you do not know who KLAJFB is, you know that this customer made two purchases on the same day”.
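The difference described in the quote can be reproduced in a few lines of Python. The purchase log is invented, and the pseudonyms come from a keyed hash (HMAC-SHA-256 from Python's standard library) with a hypothetical secret key assumed to be stored apart from the data; the digest is truncated to six characters only to resemble the short codes in the example.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical purchase log, invented for illustration.
purchases = [
    {"buyer": "Maria Souza", "date": "2023-02-07", "item": "notebook"},
    {"buyer": "João Lima", "date": "2023-02-07", "item": "pen"},
    {"buyer": "Maria Souza", "date": "2023-02-07", "item": "backpack"},
]

SECRET_KEY = b"hypothetical-key-stored-elsewhere"  # assumption: kept apart from the data


def pseudonym(name: str) -> str:
    # Keyed hash: the same name always yields the same pseudonym under the same key.
    return hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()[:6].upper()


# Anonymization by suppression: the buyer column is simply removed.
anonymized = [{"date": p["date"], "item": p["item"]} for p in purchases]

# Pseudonymization: the buyer column is replaced by a consistent pseudonym.
pseudonymized = [
    {"buyer": pseudonym(p["buyer"]), "date": p["date"], "item": p["item"]}
    for p in purchases
]

# Pseudonymized rows can still be grouped by (unnamed) buyer and day.
per_buyer_day = Counter((p["buyer"], p["date"]) for p in pseudonymized)
print(per_buyer_day.most_common(1))
```

The anonymized rows no longer say who bought what, while the pseudonymized rows still show that one unnamed buyer made two purchases on the same day.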

Sensitive data

Sensitive data, in turn, is a subset of personal data. According to article 5, item II, of the LGPD, it is personal data related to ethnic or racial origin, political opinions, philosophical or religious beliefs, union membership, biometric and genetic data, and information about health and sexuality, when linked to a natural person. This type of information cannot be processed freely: the LGPD allows its processing only in the situations listed in article 11, such as with the specific and highlighted consent of the data subject.
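When building a data inventory, teams often tag each column with what it is about. The sketch below is a hypothetical Python helper, not an official classification method: the topic tags paraphrase the categories of article 5, item II, and `relates_to_person` is a judgment the team must make by asking whether the column, alone or combined with other data, relates to an identifiable person.

```python
from enum import Enum
from typing import Optional


class DataClass(Enum):
    SENSITIVE = "sensitive personal data"
    PERSONAL = "personal data"
    NOT_PERSONAL = "not personal data"


# Hypothetical tags paraphrasing the categories in article 5, item II, of the LGPD.
SENSITIVE_TOPICS = {
    "racial_or_ethnic_origin", "religious_belief", "political_opinion",
    "union_membership", "religious_philosophical_or_political_organization",
    "health", "sex_life", "genetic", "biometric",
}


def classify(topic: Optional[str], relates_to_person: bool) -> DataClass:
    """Classify a column from a team-assigned topic tag and an identifiability judgment."""
    if not relates_to_person:
        return DataClass.NOT_PERSONAL  # e.g. properly anonymized or aggregate data
    if topic in SENSITIVE_TOPICS:
        return DataClass.SENSITIVE     # sensitive data is a subset of personal data
    return DataClass.PERSONAL


# Hypothetical catalogue: column -> (topic tag, relates to an identifiable person?)
catalogue = {
    "customer_email": (None, True),
    "blood_type": ("health", True),
    "city_sales_total": (None, False),
}
for column, (topic, linked) in catalogue.items():
    print(column, "->", classify(topic, linked).value)
```

A sensitive topic only makes data sensitive personal data when it is linked to a natural person; health statistics that identify no one fall outside both categories.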

Conclusion

Classifying the data we are dealing with is therefore the essential first step toward processing it correctly. From there, we need to keep in mind the principles that guide and legitimize the processing of personal data (article 6 of the LGPD) and the legal bases for processing personal data (article 7 of the LGPD).

Profile of the Author: Everson Elias Gonçalves de Oliveira

I joined DP6 in 2019 and fell in love with Data Engineering at this incredible company. I had been working at a law firm in Minas Gerais since 2012 and, in 2022, I started working in innovation at a law firm.

E-mail: everson.elias@dp6.com.br

References

BIONI, Bruno Ricardo. Understanding the concept of anonymization and anonymized data. Available here (Portuguese only).

FAHEL, Ariel. The LGPD has come into force! What now? — A brief introduction to the topic. Available here (Portuguese only).

HAUGE et al. Tagging Banksy: using geographic profiling to investigate a modern art mystery. Taylor & Francis, 2016. Available here.

Banksy unmasked? Scientists use maths and criminology to map artist’s identity. The Guardian, 05 March 2016. Available here.

KELLEHER, Denis. In Breyer decision today, Europe’s highest court rules on definition of personal data. Available here.

NARAYANAN, Arvind. One more re-identification demonstration, and then I’m out. Available here.

NETO, Joaquim. Information security and data protection — Part 1. Available here (Portuguese only).

RIBEIRO, Lucas. Good practices for the use of consent in LGPD. Available here (Portuguese only).

SOMBRIO, Jessica. Anonymization or Pseudonymization? What’s the difference? Available here (Portuguese only).
