Responsible management of Non-PII: A case for data stewardship
Aditi Ramesh and Astha Kapoor
The rise of social networks, data sharing agreements, and the use of data for business and innovation means we are generating more data than ever before. As a result of these complex interactions, the ownership and control of this data have become less clear. Digital exhaust, described below, is an example of data that falls into this category:
“As companies and organizations go about their business and interact with individuals, they are generating a tremendous amount of “digital exhaust” data, i.e., data that is created as a by-product of other activities.”
A large portion of digital exhaust is non-personally identifiable information (non-PII), that is, data that does not reveal personal information. Non-PII may be best managed and used to generate societal good through models of data stewardship. At Aapti, our research explores the complexities of regulating non-PII and makes a case for stewardship, a mechanism for using data to generate value while safeguarding individual rights, as a possible solution. Data stewardship structures data flows and data sharing according to defined incentive structures, thereby creating models that promote societal good. The steward has a responsibility towards the users to whom the data belongs, and works to protect their rights. This can be done by implementing technological and legal standards based on the purpose of data use. For example, companies such as Uber and Ola generate vast amounts of mobility data; this data has the potential to inform solutions to public issues such as pollution and traffic management.
Non-PII, however, is subject to controversy for a few reasons. First, its definition has various interpretations; non-PII holds information that may pertain to an individual but cannot be traced to their identity or contact details. The definition of non-PII can be highly context dependent. For example, many states in the US do not consider IP addresses PII, but some, like California, do. Clear, global definitions of these categories have yet to be established. Second, the ownership of this data can be unclear. Non-PII is often generated from an individual’s interaction with digital platforms. These platforms can determine the purpose and processing of personal and non-personal data and tend to stake their claim on it. Users, often by design, lack control over the use of this data: once they consent to the terms of a service, they often have no viable recourse against any violation of the agreement or of applicable laws. Third, some cases have shown that non-PII can turn into PII if not managed appropriately, so the two are not fixed categories. For example, in 2015, MIT scientist Yves-Alexandre de Montjoye showed that anonymized credit card transactions, scrubbed of all PII, could be traced back to identify individuals and their transactions. If non-PII can indeed be linked back to an individual, it is imperative that the data is handled with greater care.
Given that most technology companies now have data-intensive business models, access to large sets of unclaimed data is profitable and can lead to a competitive advantage in servicing and acquiring customers. Non-PII, which is unregulated in many countries, is useful fodder for this. For example, purchasing data is useful to e-commerce companies, which use it to decide what products to prioritize on their platforms.
India, like the rest of the world, has seen growing awareness among policy-makers, as well as some response from them, on questions of personal data and privacy. PII is soon to be regulated in India under the personal data protection act, which deals with the processing of personal data of individuals by governments and private entities. Non-PII, however, is invariably aggregated and, as a result, harder to regulate. A draft of the Indian e-commerce policy from the Department for Promotion of Industry and Internal Trade has also recently made a case for the sensitive management of community data:
“Who should own farming data like about land/soil, climate, farming practices, etc — farmers collectively, or whoever collects it? All such data has no protection or claims under privacy frameworks.”
The 2019 Economic Survey compared data to a natural resource that belongs to the country and may be utilized for economic benefit. Beyond societal good and economic value, governing non-PII also serves other ends, such as preserving competitiveness in the digital economy by ensuring that big tech does not monopolise and hoard data to extract the competitive advantage described above.
Given this need for clarity, as well as the potential value at stake, models of data stewardship should be explored as a way to ensure safe sharing of data without undermining privacy. Stewardship brings in an intermediary, called the data steward, that does not have the interests of a regular data fiduciary, to manage the sharing of data. Several models of stewardship are being considered: collaborations, trusts, account aggregators and personal data stores (later pieces will explore these models in detail).
There are examples of stewardship models in India that channel non-PII towards this public good. IUDX, the India Urban Data Exchange, for example, attempts to harness the power of non-PII by aggregating urban mobility data to develop smart city applications. The intent behind the project is to make urban datasets accessible to a large number of entities, so that they can derive value from data that currently sits in silos and is not realized to its full potential. Other examples of using non-PII include public environmental datasets, such as the Environmental Information Exchange Network. Recent writing has also shown the potential for social applications of data in the Indian context, provided we equip data entrepreneurs with the right tools.
The rise of non-PII breeds greater uncertainty around data handling practices, leaving many unanswered questions around access, control, and consent. Stewardship has the potential to play a crucial role not only in creating solutions, but also in leveraging data for societal benefit through collaboration and active identification of use cases. Concerted efforts should be made to identify use cases for stewardship, and to work with stakeholders to build models that can test this suggestion.