India’s Non Personal Data Report Suffers from a Noble Vision

Shyam Krishnakumar
The InTech Dispatch
Aug 17, 2020

The Robin Hood inspired report aims to rob the data-rich companies and share the wealth with, er, other companies. And the state. No, seriously.

Illustration by Titiksha Vashist

When the Sheriff of Nottingham sought to steal the lands and wealth of poor peasants, Robin Hood and his band of merry men came to the rescue. These stories came back powerfully as I read through the Non Personal Data Report. Sounds strange? Read on for a bit and you’ll see why.

The Non Personal Data Report by the Kris Gopalakrishnan Committee can shape the future of India’s digital economy, as non-personal data is central to powering algorithmic learning and decision-making. However, the report suffers from a noble vision. Pervading the report is the notion of “unlocking” Non Personal Data (NPD) held by a few dominant enterprises to unleash a wave of startups that innovate by combining datasets.

Simply put, it seeks to commodify NPD and make it a free, shareable resource on top of which a competitive ecosystem flourishes. It also seeks to make data available to the government for social and national purposes. What are those exactly? Well, we are not sure either. Our Robin Hood aims to rob from companies to share the data wealth freely with, well, er, smaller companies. Oh, and the government.

So what exactly is non-personal data?

Your guess is probably right. Data unrelated to a person, including data on natural phenomena, industrial data, and data from public infrastructure, is Non Personal Data. Anonymised personal data also qualifies as Non Personal Data. The report divides Non Personal Data into three categories:

1) Public Non-Personal Data: Data generated by government or publicly funded works including public research reports, public health information, vehicular registration data, etc.

2) Community Non-Personal Data: Community NPD is any raw unprocessed non-personal data related to a community. The ambit is wide enough to include Malayalis, gig workers, and gamers as communities. Examples include data from public electric utilities and usage information collected by ride-sharing companies.

3) Private Non-Personal Data: Data produced by private players derived from applying proprietary knowledge.

Of Data and Trees

Who owns the timber of Sherwood Forest? The community of course! The report literally uses this analogy of economic rights of natural resources belonging to the community to argue that Community Non-Personal Data belongs to the community. The community trustee should be able to control how this data should be used.

So what’s the problem with this? Well, data is not trees.

The report assumes data just ‘exists’ in these communities and all companies extract it like firewood from a forest. Data, however, comes into being in the interaction between the collector, the data subject, and the process of collection. Beyond basic demographics, every data collector builds a different data picture of the same subject, based on how they define categories, what they measure and so on. Simply put, if you requested your data profile from several providers, you would get very different pictures, depending on what each one captures, how it defines categories and how it measures.

While communities should undoubtedly get significant benefits for being contributors, the analogy of harvesting firewood does little to recognise the role of organizations in shaping and contributing to data creation. So what is the problem with this idea? If we apply the analogy fully, organizations will have little incentive to collect firewood, er, data.

Thou Shalt Share. Or Else.

The report says that Data Businesses, organizations that collect data beyond a “threshold level”, must disclose what data they collect and process, and for what purposes. Information about the datasets held by all such organizations will be stored in digital meta-data directories that are open to Indian citizens and Indian organizations.

These meta-data directories are aimed at creating a nationwide marketplace of data, reducing information asymmetry and encouraging the creation of businesses that combine existing datasets to create higher value. So a startup could combine public traffic data and sensor data requested from automobile companies to create an app that gives the safest routes. In short, the aim is to make NPD a commodity, freely and cheaply available.

Sounds good, right? Except that it won’t actually be a marketplace driven by demand and supply. The Robin Hood Syndrome strikes again. Organizations must share raw factual data for free. In the case of processed data, organizations could still be mandated to share it on a Fair, Reasonable and Non-Discriminatory basis. Only with still higher levels of processing does a data market finally begin to function and market rates kick in. The report assumes that value in data is created only through processing. But if Ola were to share its raw anonymised ride data, would it not lose value?

Rob from the rich and give it to…the rich

If a data-holding entity shares data for no remuneration on receiving a request, perfect. Everyone is happy. If the data-holding entity refuses, the request is sent to the Non-Personal Data Authority. The Authority evaluates it to see if it furthers:

  1. Sovereign Purpose: Furthers national security, legal purposes, etc.
  2. Core Public Interest Purpose: For public goods, research and innovation, policymaking, better delivery of public services, etc.
  3. Economic Purpose: To encourage competition and provide a level playing field or “encourage innovation through startup activities”.

If the request enables any of these three, then the Non-Personal Data Authority mandates data sharing. For free.
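For readers who think in code, the flow described above can be sketched in a few lines of Python. This is only an illustration of our reading of the report, not anything the report itself specifies: the names `Purpose`, `DataRequest` and `resolve_request` are hypothetical, and the final "declined" branch is our assumption, since the outcome for a request that furthers none of the three purposes is not spelled out here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    # The three purposes the Authority checks a refused request against.
    SOVEREIGN = "sovereign"                        # national security, legal purposes
    CORE_PUBLIC_INTEREST = "core public interest"  # public goods, research, policymaking
    ECONOMIC = "economic"                          # competition, startup innovation


@dataclass
class DataRequest:
    requester: str
    holder: str
    # Purposes the request is judged to further (hypothetical field; may be empty).
    purposes: set = field(default_factory=set)


def resolve_request(request: DataRequest, holder_agrees: bool) -> str:
    """Return the outcome of a data request under the flow described above."""
    if holder_agrees:
        # The data holder shares on request, for no remuneration.
        return "shared voluntarily, for free"
    # Refusal sends the request to the Non-Personal Data Authority.
    if request.purposes:
        # Furthering any one of the three purposes is enough.
        return "sharing mandated by the Authority, for free"
    # Assumption: a request that furthers no purpose is turned down.
    return "request declined"
```

For example, `resolve_request(DataRequest("startup", "Ola", {Purpose.ECONOMIC}), holder_agrees=False)` ends in mandated sharing. The last branch is reached only when a request furthers none of the three purposes.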

Now, dear reader, pause for a minute and think. Can you think of any data request that does not fall under Sovereign, Public Interest or Economic Purposes? These categories are so broad and so ambiguous (what cannot be said to “encourage innovation” or serve “national security”?) that they can encompass every conceivable use-case under the sun. Which means any entity can demand, for free, the data collected by any company, no matter the cost and effort involved in collecting it and no matter the consequences for the firm. Robin Hood now robs from the rich and gives it to, err, the rich.

Can the incorruptible Robin Hood become biased and arbitrary?

The Personal Data Protection Bill proposes setting up a Data Protection Authority as a regulator. This report argues that NPD requires a separate regulator to supervise and mandate data sharing. Now companies will have to deal with two data regulators, who might give contradictory instructions.

But wait, this new regulator isn’t just a regulator. It has two roles:

  • Enforcing Role: To ensure stakeholders follow the regulations laid down and provide data when data requests are made.
  • Enabling Role: To enable data sharing for sovereign, social welfare and economic welfare purposes, to spur innovation, economic growth and social well-being.

What happens when an appeal to share data goes to a regulator which is mandated to enable data sharing? One can easily guess. There is a clear conflict of interest.

In a robust NPD market (well, not a market exactly), tens of thousands of data requests will be generated per day. How many companies will share their data willingly? This will create a tremendous backlog with the regulator. If there are thousands of requests pending, which ones get cleared first? Simply put, the report vests too much power in an entity with very little capacity. Further, the broad discretionary scope opens up the possibility of regulatory capture.

The history of Indian regulators clearly shows us the perils of vesting a vast mandate and arbitrary discretionary power in a regulator with limited capacity. In the 1970s it led to the Inspector Raj. Even today, businesses are burdened with hundreds of regulatory compliances that are arbitrarily enforced for rent-seeking purposes. While a regulatory architecture for Non Personal Data that protects citizen data rights and enables innovation is indeed necessary, what was needed was an effective regulator with limited discretionary power. Instead, the report has given us a Robin Hood regulator with an outsized mandate and arbitrary power to bully companies into sharing their data for free.

Welcome to digital data socialism, desi style.

Like what you are reading?

We write on emerging tech, politics, culture and us with an Indian focus every fortnight. Subscribe for free: www.bit.ly/IntechDispatch.

Shyam Krishnakumar
The InTech Dispatch

I work at the intersection of Emerging Tech, Public Policy, Culture and Us.