We Are Big Data: New Technologies and Personal Data Management

Jun 20, 2018

Published on CYBERLAW by CIJIC, edition no. 5

Introduction

Technology has advanced rapidly and has contributed to improving the way we live. In addition to influencing how individuals act, it changes the way people relate to each other, to business, and to government. These many changes emphasize the need to give importance to the individual and to adopt a multisectoral dynamic in order to build sustainable Internet governance. It is undeniable that new technologies bring benefits. However, there are regulatory and ethical questions related to their use. With more and more connected devices, in the scenario that has been called the Internet of Things (IoT)[1], there are several risks and challenges, such as those related to the right to privacy.

Data generated through the use of these numerous smart devices are collected and stored by companies, which do not always act transparently. Terms of use and service are often extremely technical and unintelligible to the general population. It is not uncommon that the intended purpose of the data be hidden from the users themselves, who have no control over the information that refers to them. Given the voluminous amount of data produced daily, this becomes even more worrisome, especially since the “Big Data” phenomenon goes far beyond a tangle of data, being essentially relational. We must bear in mind that Big Data is us, and therefore we must have a critical conscience about it and think about possibilities to regain control over our personal data.

With ownership and availability of our data, companies use techniques such as targeting, tracking, and profiling to tailor their marketing policies to the way we live and to our needs, or to what they make us believe to be a necessity. In this way, discussions about the right to privacy are inextricably linked to discussions about the use and management of data. Technological advance requires the legal order to adapt to new scenarios, which can happen, for example, through legislative action or interpretive activity. These solutions are not always effective: on the one hand, the sociopolitical conjuncture and the technological pattern change much more rapidly than legislation can follow, and, on the other hand, such solutions can be paternalistic and corporatist, distancing themselves from the will of individuals. Thus, new ways to protect the right to privacy and to increase the control that Internet users have over their own data have emerged as an alternative.

In this sense, the MyData project was created. It is basically a system whose objective is to place individuals at the center of their personal data so that they themselves have control of the information produced about them, free from the abusive control of data currently exercised by companies. It adopts a perspective centered on the human being, and no longer on things or on the information itself. In the current management model, the data belongs to those who collect it. The individuals to whom the information refers generally do not even know the purpose for which it is used, which creates serious privacy problems and fails to meet the principle of transparency. The new system seeks to create a scenario in which users have their human rights respected in the digital environment and can control their data without creating barriers to business innovation, which can develop based on mutual trust.

The present study aims to analyze this project in more detail and seeks to highlight the benefits it can bring to the protection of privacy and to individuals taking control of their own personal data. To do this, we will first present a brief overview of the right to privacy, its contours and the impact of new technologies. Second, aspects related to Big Data will be analyzed, so as to provide a clearer notion of how data is produced and stored. Third, we will present in more detail the personal data management project mentioned above. We conclude with an analysis of how this project tends to contribute to the protection of privacy in the context of new technologies.

The Challenge of Privacy in the Hyperconnected World

The protection of privacy is a fundamental point of societies that are intended to be democratic and is envisaged as a fundamental right in the American Convention on Human Rights (article 11) and in the Universal Declaration of Human Rights (article 12). International treaties on the subject generally deal with privacy in the face of non-interference in family private life, correspondence and communications, as does the Brazilian Federal Constitution of 1988[2]. The interpretation of privacy, however, has been changing substantially in recent years and this right has gained new contours.

The right to privacy consists of a complex value [44] having different meanings and different aspects that characterize it. Among these aspects, we have the traditional view of the right to be left alone [57], which implies control by the individual over information that relates to his or her personal life. [53] The right to privacy involves the right to prevent strangers from accessing information about one’s private life and from disclosing it. [53] There is also the view that treats the right to privacy from three perspectives: protection against outside interference (which implies the individual’s right to be left alone in order to live his or her life with a minimum degree of interference), the secrecy of certain information and, finally, control over information and personal data [26].

With social and technological development, different facets of privacy emerge, and new conflicts and problems erupt [55] [28], such as the debate about the right not to become aware of personal data [36], the discussion on non-authorized biographies [35] and the “right to non-tracking” [30]. In the information society, privacy must be understood in a functional way, in order to assure a subject the possibility of “knowing, controlling, addressing, or interrupting the flow of information related to it” [48]. Accordingly, Stefano Rodotà [48] defines privacy “as the right to maintain control over the information itself”.

There is no final concept for the right to privacy and the notion of private life has been expanded due, among other factors, to the development of technology. The technological factor has a prominent role, since with the improvement of the layer of information storage and communication, new ways of organizing, using and appropriating information arise. Technological development allows for the creation of behavior profiles that can even be confused with the person [15]. Such profiles, combined with the manipulation of data collected, can generate serious impacts on freedom: “Another technique still concerns a data collecting modality, known as data mining. It consists in the search for correlations, recurrences, forms, trends and significant patterns from very large amounts of data, with the aid of statistical and mathematical instruments. Thus, from a large amount of raw and unclassified information, information of potential interest can be identified” [15].

Thus, while, on the one hand, technology brings undeniable benefits to society, it creates, on the other hand, problems for privacy protection. Although technology helps to shape a richer private sphere, it also makes that sphere increasingly fragile and threatened, which gives rise to the need to continually strengthen its protection [48]. The need for greater protection of personal data becomes even deeper in the Internet of Things scenario. In this context, increasing connectivity with the most diverse technology devices generates a virtually inexhaustible source of information about the day-to-day lives of the users of such devices. Considering that when we speak of privacy we have personal information in mind [50], it is essential to devote special protection to the data and information generated through Internet connections and devices connected to the IoT.

Brazil, unlike most countries in Latin America [3] and Europe [3] [45], does not yet have a sufficient legislative framework to guarantee the protection of privacy at all times[3]. There are bills currently in progress in the National Congress seeking to pass a general law on privacy and personal data protection[4]. However, protection should not be governed by legislation alone, since the effectiveness of laws is limited in time by rapid social change. Thus, considering that privacy should also be understood as positive freedom, it is fundamental to create mechanisms that give individuals the power to control their own data, the processes to which it will be subjected and the purposes underlying its use. One of the possible alternatives for protecting privacy and empowering individuals to control their data is personal data management, which will be presented in more detail below.

We Are Big Data: Between Economic Exploitation and Personal Data Control

Every day, we connect to the Internet through devices that have the ability to share, process, store, and analyze a huge amount of data. This situation generates what we know as Big Data, which is an evolving term that describes any voluminous amount of structured, semi-structured or unstructured data that has the potential to be exploited for information[5] [25].

The first property of Big Data is the increasing volume of data [47]. A recent survey by Cisco [9] estimates that over the next few years measurement in gigabytes (10^9 bytes, or 1 billion bytes) will be exceeded and the amount of data will be calculated on the order of zettabytes (10^21 bytes) and even yottabytes (10^24 bytes).
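To give a sense of the scales involved, the short sketch below compares the decimal (SI) byte units mentioned above. The helper names are illustrative only; the exponents are the standard SI powers of ten.

```python
# Standard SI (decimal) prefixes for bytes, expressed as powers of ten.
SI_BYTE_EXPONENTS = {
    "kilobyte": 3,
    "megabyte": 6,
    "gigabyte": 9,
    "terabyte": 12,
    "petabyte": 15,
    "exabyte": 18,
    "zettabyte": 21,
    "yottabyte": 24,
}


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the given SI unit."""
    return 10 ** SI_BYTE_EXPONENTS[unit]


def ratio(larger: str, smaller: str) -> int:
    """Return how many of the smaller unit fit into one of the larger unit."""
    return bytes_in(larger) // bytes_in(smaller)


# One zettabyte holds a trillion gigabytes.
print(ratio("zettabyte", "gigabyte"))  # 1000000000000
```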

Another property involves the high speed [9] with which data is produced, analyzed and visualized. In addition, the variety of data formats represents an additional challenge. This feature is enhanced by the different devices responsible for collecting and producing data in different environments. The information provided by a mechanism that monitors the temperature is quite different from that obtained in social networks, for example. In addition, most of the data found is not structured [9] [34].

The concept of Big Data may also imply, together with the concept of Data Science, the ability to transform raw data into graphs and tables that allow an understanding of the phenomenon to be demonstrated. It is important to mention that, in a context where decisions are increasingly made on the basis of data, it is extremely important to ensure the accuracy of this information [32]. Although this is not a new phenomenon, “what the Internet did was to take a new dimension, transforming it. To understand these transformations, we need to understand that Big Data is us” [52].

The combination of intelligent objects and Big Data can significantly change the way we live [19]. Research [4] estimates that by 2020 the number of interconnected objects will increase from 25 billion to 50 billion intelligent devices. Projections for the impact of this hyperconnection scenario on the economy are impressive, corresponding globally to more than $11 trillion in 2025 [51].

Intelligent and interconnected objects can effectively help us solve real problems. From the point of view of consumers, the products that today are integrated with Internet of Things technology come from the most varied areas and have diverse functions, from electrical appliances and means of transport to toys. There are also pieces of clothing that have IoT connectivity, being part of a category called wearables. These wearable technologies consist of devices that are connected to each other, producing information about the users and the people around them. Among the main products are bracelets and sneakers that monitor the physical activity of the user, as well as smart watches and smart glasses that intend to provide the user with an immersive experience of reality itself [24] [12] [38].

However, transforming an analog object into an intelligent one, in addition to making the product more expensive and subject to flaws that it would not otherwise have, can also create risks in relation to security and privacy [50]. We are talking about a context that involves a massive volume of data being processed, on the scale of billions of data points daily, which makes it possible to know more and more about individuals’ habits, preferences and desires and thus to try to direct their choices. This possibility has been well exploited by the market, which has explored the personalization and automatic customization of content on digital platforms, including capitalizing on filtering with targeted advertising through cookie tracking and retargeting processes or programmatic (behavioral) media retargeting [40]. There is currently no clarity about how the data is treated [2]. Aspects of its collection, sharing and potential use by third parties are still unknown to consumers. This has the power to shake (and, in a sense, already shakes [8] [11] [2] [42]) users’ confidence in connected products [33].

It should also be noted that security holes open space for attacks aimed at accessing the information generated by the devices themselves. In addition, intelligent devices, when invaded, can generate problems not only for the device itself, but also interfere with the network infrastructure itself [10]. Issues related to security and protection of personal data are equally important for IoT to consolidate as the next step on the Internet.

Given this scenario, one of the most important issues related to the protection of personal data is who controls it and who has access to it. In the current model, technology companies are endowed with this control and have such access. The individual about whom the information is collected is often not even aware that his or her data is being stored and, when aware, is not uncommonly ignorant of the purpose of such collection and storage. A society that aims at being transparent and democratic cannot dispense with clear and fair forms of data management. It is necessary to equip individuals with control of their own data and to empower them to decide what, with whom, when and why to share.

Personal Data Management Project

Online interaction is constant and is present in the lives of almost all individuals. In the hyperconnected contemporary world, information and news gathering increasingly occurs through the Internet, as do the contracting of products and services and the establishment of social and professional contacts through social networks. This, however, often goes unnoticed by users, who do not realize the digital traces they produce about themselves. The data produced is, not infrequently, stored for a long period of time. The control of this trail has become a technological and social problem, since from its analysis it is possible to obtain information about the behavior, preferences and personal needs of a certain person and even to predict their future actions.

An example of predicting people’s future actions based on their buying habits, which demonstrates the danger of free use of personal information, is the cross-referencing of data by retail companies. Target creates an identity for each consumer through information obtained when the customer uses a credit card or a promotional coupon, contacts customer service or visits the online store. The company realized that if a woman buys certain items together or in larger quantities, such as unscented lotions, coconut butter lotions, zinc and magnesium supplements, and a large purse, there is an 87 percent chance she is three months pregnant [49] [46]. An interesting case occurred in 2012, when the company sent discount coupons addressed to a woman, but her father received them instead and so learned the surprising news that his daughter was pregnant [16].

Despite this collection of Big Data about individuals and the formation of individual profiles, individuals do not usually have access to the personal data generated about them. Large Internet companies, such as Google and Facebook, centralize the collected information and encourage people to use only their tools, since there is no sharing of information between them, which runs counter to market competition and innovation. The user does not control his or her personal data [54]. One of the recently proposed technical solutions to this problem points to human-centered personal data, that is, individuals themselves should control their data.

Personal data has an increasingly significant social, economic and practical value, but its application and wider use is often conflated with negative predictions of a future devoid of individual privacy. MyData consists of a human-centered (as opposed to the current organization-centered system) and rights-based framework for data management. Individuals must be at the heart of the control of their own data and their digital human rights must be strengthened, while companies are able to develop innovative services based on mutual trust. [43] MyData allows the collection and use of personal data in a way that maximizes the benefits obtained while minimizing the loss of privacy. Thus, these valuable data will enable individuals to interact with vendors, who can offer better data and consumer services [43].

This MyData-based, interoperable infrastructure approach provides individuals with data-based services with greater privacy and transparency, which enhances freedom of choice, both empowering and benefitting the individual. Consent management is the main mechanism for enabling and enforcing the legal use of data. In this model, consents are dynamic, easy to understand, machine-readable, standardized and coordinated. A common format will allow each individual to delegate the processing of data to third parties or to have the data reused in new ways [43].

MyData equips individuals to control who uses their personal data, specifying the purposes for which it may be used and giving informed consent in accordance with personal data protection regulations. Data flows become more transparent, comprehensible and manageable. Users can also turn off information flows and withdraw consent. Finally, machine-readable consents can be viewed, compared and processed automatically [43].
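The consent properties described above (bound to a purpose, withdrawable, machine-readable and comparable) can be illustrated with a minimal sketch. The `ConsentRecord` class, its field names and the sample values are hypothetical and chosen only for illustration; they are not the MyData consent format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ConsentRecord:
    """Hypothetical consent record; field names are illustrative assumptions."""
    individual: str                     # person the data refers to
    data_source: str                    # service that holds the data
    data_sink: str                      # service allowed to use the data
    purpose: str                        # what the data may be used for
    granted_at: str                     # ISO 8601 timestamp
    withdrawn_at: Optional[str] = None  # set once consent is withdrawn

    def is_active(self) -> bool:
        """A consent is active until the individual withdraws it."""
        return self.withdrawn_at is None

    def withdraw(self, when: str) -> None:
        """Record the withdrawal, which turns off the information flow."""
        self.withdrawn_at = when

    def to_json(self) -> str:
        """Serialize to a machine-readable form with a stable key order."""
        return json.dumps(asdict(self), sort_keys=True)


def differing_fields(a: ConsentRecord, b: ConsentRecord) -> List[str]:
    """Compare two consents automatically and list the fields that differ."""
    da, db = asdict(a), asdict(b)
    return sorted(key for key in da if da[key] != db[key])


insurer = ConsentRecord("alice", "wearable-vendor", "health-insurer",
                        "premium discount", "2018-06-20T10:00:00Z")
research = ConsentRecord("alice", "wearable-vendor", "university-lab",
                         "medical research", "2018-06-20T10:05:00Z")

print(differing_fields(insurer, research))  # ['data_sink', 'granted_at', 'purpose']
insurer.withdraw("2018-07-01T00:00:00Z")
print(insurer.is_active())  # False
```

Keeping the record as plain structured data is what makes it possible to view, compare and process consents automatically rather than reading them as free text.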

In addition, MyData can be considered useful to companies because it will help integrate complementary third-party services into their core services; will simplify operations within current and future regulatory frameworks and allow the use of data for exploratory purposes; and will enable the creation of new business based on data processing and management [43].

It’s interesting to note that MyData is complementary to Big Data, and vice versa, because without addressing the human perspective, many of Big Data’s innovative potential uses are incompatible with the regulations currently in place.

This approach has three principles that require maturation: (i) human-centered control over data: the human being is an active actor in managing his/her life online and offline and “has the right to access his/her personal data and control his/her privacy settings” [5] as much as is necessary to make them effective; (ii) usable data: personal data must be technically easy to access and readable through Application Programming Interfaces (APIs). MyData converts data into a reusable resource to create services that help individuals manage their lives; (iii) open business environment: the infrastructure enables the decentralized management of personal data, enhances interoperability, facilitates companies’ compliance with data protection regulations, and enables individuals to switch service providers without data lock-in. Thus, “by meeting a common set of personal data standards, businesses and services allow people to exercise freedom of choice between interoperable services,” preventing people from having their data locked into “only one company because they cannot export them” and take them to another provider [5].

MyData is a more robust infrastructure than simple APIs. The data aggregator model in use today is naturally evolving out of the API economy, but it has significant disadvantages: the lack of interoperability between data aggregators and the fact that the current aggregators do not necessarily recognize privacy or engage in a transparent relationship with individuals. Adopting the MyData approach can lead to a systemic simplification of the personal data ecosystem, and this simplification can be done gradually, as the platform can be developed and deployed in stages, alongside the evolution of the API economy and the data aggregator model [43].

Finally, it is interesting to see how the MyData architecture works, which is based on interoperable, standardized accounts: “The model provides individuals with an easy way to control their personal data from a single place, even if data is created, stored, and processed by hundreds of different services. For developers, the model facilitates data access and removes dependency on specific data aggregators. Accounts will usually be provided by organizations that act as MyData operators. For organizations or individuals willing to be operator-independent, it will also be technically possible to host individual accounts, just as some people currently choose to host their own e-mail servers” [43].

Interoperability is the main advantage provided by MyData, but it is also the main challenge because it requires more standardization, more reliable networks and data formats. In the MyData architecture, data flows from a data source to a service or application. The main function of a MyData account is to enable consent management. APIs allow interaction between data sources and users [43]. As already mentioned, the standardized architecture makes the accounts interoperable and allows individuals to switch operators easily.
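The flow described above can be sketched in a few lines: the account holds consents rather than the data itself, a data source releases data to a service only when a consent exists, and a standardized account can be carried to another operator. All class and function names here (`MyDataAccount`, `DataSource`, `move_account`) are hypothetical and are not part of any MyData specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Consent = Tuple[str, str]  # (data source name, service name)


@dataclass
class MyDataAccount:
    """Hypothetical account at an operator: it stores consents, not the data."""
    owner: str
    consents: Set[Consent] = field(default_factory=set)

    def grant(self, source: str, service: str) -> None:
        self.consents.add((source, service))

    def withdraw(self, source: str, service: str) -> None:
        self.consents.discard((source, service))

    def allows(self, source: str, service: str) -> bool:
        return (source, service) in self.consents


@dataclass
class DataSource:
    """Hypothetical data source that holds personal data keyed by owner."""
    name: str
    records: Dict[str, dict]

    def fetch(self, account: MyDataAccount, service: str) -> dict:
        """Release the owner's data to a service only if a consent exists."""
        if not account.allows(self.name, service):
            raise PermissionError(f"no consent for {service} to read from {self.name}")
        return self.records[account.owner]


def move_account(account: MyDataAccount) -> MyDataAccount:
    """Switching operators: a standardized account is copied to the new one."""
    return MyDataAccount(account.owner, set(account.consents))


bank = DataSource("bank", {"alice": {"balance": 1200}})
account = MyDataAccount("alice")
account.grant("bank", "budget-app")
print(bank.fetch(account, "budget-app"))  # {'balance': 1200}
```

After `account.withdraw("bank", "budget-app")`, the same `fetch` call raises `PermissionError`, which mirrors the individual turning off an information flow.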

Final Considerations: Personal Data Management as an Alternative to Protect Privacy

The current model by which personal data are managed goes against the right to privacy and transparency, reducing the power of individual choice. The terms of use of online services offered by companies are long enough to discourage users from reading and have technical terms that are not intelligible to the population without specific technological knowledge [5]. The same goes for privacy policies.

Research conducted in 2017 [39], involving 543 participants, showed that 74% of users do not read privacy policies and that those who do spend an average of only 74 seconds on this task. The average time taken to read the terms of service is 51 seconds. For McDonald and Cranor [31], privacy policy reading time is a form of payment. Reading all policies would take 201 hours a year, time worth $3,534 per year for each American user. From a national perspective, the time spent reading these policies would be worth about $781 billion per year.
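As a quick consistency check on the figures attributed to McDonald and Cranor, the sketch below derives the hourly value of time and the number of users that the three quoted numbers imply. The derived values are back-of-the-envelope results from this text alone, not figures taken from the original study.

```python
# Figures quoted in the text for McDonald and Cranor [31].
HOURS_PER_USER_PER_YEAR = 201
COST_PER_USER_USD = 3_534
NATIONAL_COST_USD = 781e9

# Implied value of one hour spent reading privacy policies.
implied_hourly_value = COST_PER_USER_USD / HOURS_PER_USER_PER_YEAR

# Implied number of users behind the national estimate.
implied_users = NATIONAL_COST_USD / COST_PER_USER_USD

print(f"Implied value of time: ${implied_hourly_value:.2f} per hour")  # $17.58 per hour
print(f"Implied number of users: {implied_users / 1e6:.0f} million")   # 221 million
```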

People are unaware of the value of their data and, most of the time, do not want to deal with the complication of managing it [13]. As a result, companies use the data in the way they find most interesting, which may involve the sale and transfer of information to third parties, increasing the risk of leakage and thus of privacy breaches. The fact that data is non-rival, that is, it can be used at the same time by more than one person or algorithm, creates complications, such as data being given a destination different from the one to which the user has consented. In this scenario, the data belongs to those who collect it, not to the person it refers to.

Researchers at the Getulio Vargas Foundation’s Technology and Society Center conducted a study comparing 50 terms of use and service from online platforms, analyzing how they deal with the rights to freedom of expression, privacy and due process. The authors concluded that, under this view, the terms are deficient. The main objective of companies who adopt them is to “minimize exposure to liability, rather than detail their obligation to ensure respect for certain rights,” [56] which explains both the vague and ambiguous terminology applied and the tendency for users to have access to as little information as possible, particularly on issues crucial to the protection of human rights [56]. The study showed, for example, that 62% of companies have clauses requiring users’ consent for the sharing of data for commercial purposes [56], which leads us to question whether the consent given by the user is effectively informed.

Issues of privacy and data management on the part of companies lead us to understand that the currently existing consent model has failed. By this model, personal data has become a currency that can be used by individuals to access content online. In other words, to enjoy a service and not be excluded from its use, the individual consents to the access, processing and disclosure of personal data [5].

The ineffectiveness of the terms of service and the lack of informed consent are even clearer in the Internet of Things. The Unisys Security Index 2017 survey involved citizens from 13 countries and showed that Brazilians are the most willing to provide their personal data in return for the convenience of connectivity between their devices. As an example, 88% of Brazilians are in favor of placing sensors in their luggage to communicate with the airport system and have their items located more easily; 83% accept that health information obtained through pacemakers, among other devices, is shared with physicians; and 50% agree to provide health insurance companies with physical activity information collected by smart watches.

The great interest of companies in personal data is mainly due to its economic utility, so that in the present century data is equivalent to what oil meant in the last century [41] [23] [22] [13]. In addition, the data is transported to thousands of computers that extract certain values, such as patterns, predictions and other insights into individuals’ digital information, which can be used in marketing policies and artificial intelligence mechanisms [13]. Digital information comes from different sources and is extracted, refined, valued, bought and sold in different ways. This changes the rules of the market and demands a new regulatory approach [13]. Individuals must have control over their data and be aware of the fate that will be given to it after authorization for use, which, among other benefits, will increase users’ freedom of choice and empower them. Moreover, it is necessary to face the challenge of getting people to understand the value of their data and that they are entitled to compensation for granting information [13].

User confidence in the regulation of privacy and freedom of information is intimately connected to democracy [14], and the digital economy is dependent on that trust. Privacy and innovation do not have to be at odds. The task of developing an infrastructure in which these two elements converge is difficult and requires high levels of dedication. However, the task, which is not impossible, is essential: privacy demands the highest level of innovation [8]. It is necessary that privacy and innovation move together, so that they do not clash and that one does not disturb the evolution of the other. They can and should go in parallel, and this is what the public expects and what the Law demands [14].

In view of these changing needs, the above project has been developed to give individuals power over their information and to make them, rather than the companies that collect it, the owners of their data. Projects of this kind may be the solution to overcome an Internet dominated by oligopolies, profiling techniques and generalized surveillance [1].

The MyData project starts from the current context of data management, which is harmful to privacy and transparency, and seeks to empower individuals by giving them control over their own data. We are in constant digital interaction and leave traces with every click that we make. Most of these interactions are stored for a long time, which creates a digital history of people and allows their behaviors, preferences and needs to be analyzed and even their future actions to be predicted. In general, this data is not available to the users themselves and they do not even know what information is being collected and stored. Individuals do not control their own data; companies do. Therefore, the project aims to get people to control their data and decide, based on clear information and the useful organization of their data, whether they want to hire a particular product or service.

The system being developed has its central vision focused on the human being, but it is also useful to companies, which can create products and services more beneficial to individuals. One point that also deserves mention is the fact that the project is not limited to proposing the gathering of data in a single place but presents a model through which individuals can understand and organize their data, in order to obtain the information contained in the systems. However, adherence to this approach is still embryonic. Big companies connected to technology and data management, such as Facebook and Google, are not interested in advancing projects like this, as they are extremely disruptive to their business models. Faced with this, along with the greater dissemination of this type of project, it is necessary to think of ways to make users aware of the value and importance of their data and of the fact that they can have control over it, defining who will use it, when and for what.

The Internet has given a new dimension to personal information and privacy and has generated what we know as Big Data, which goes far beyond innocuous data: Big Data is us. It is from the recognition of the importance of our data and the development of safe projects that give the individual control over their information that we can ensure effective protection of privacy concerning new technologies.