Data Ownership, Are We Ready?

Mike Wasyl
Jan 9, 2020

A primer on inverse privacy and some examples of how personal data ownership can spark new business.

Owning your share in the digital economy means owning your data, full stop. The ethos driving Bitcoin and similar distributed technologies is just the start of a greater movement toward ownership optionality for digital assets of all kinds, including private data. Below, I briefly cover the state of user data ownership and how we might expect tech companies to explore offering expanded access. I also provide some examples of business models that could arise as users secure their private data.

“Data is now a form of capital, on the same level as financial capital in terms of generating new digital products and services.” MIT Technology Review

Progress toward a more data-rich world has created tension, not simply because data is personally revealing but because personal data is a form of extracted capital that, once created, is not easily controlled, valued, or attributed.

Personal data is not neutral. Personal data is used to target and control decision making and is not limited in natural form. It is tangible and can be measured where recorded. Unlike conventional money or another commodity, data is non-rivalrous and can be used simultaneously by software, algorithms, or applications. Data is, as MIT Technology Review explains, a non-fungible experience good. As tech giants push for more and more relevant sets of personal data, collection becomes more invasive. From IoT to social, information about our location, health, wealth, activity, and security is extracted, sequestered, processed, monetized, and sold. With evidence pointing to data as a capital asset, how do data originators benefit? Do we just trust companies to use our personal data judiciously? To establish why there should be a concern here, beyond exposure, hacking, or theft, it is useful to look at the concept of “inverse privacy” as discussed by Microsoft researchers Gurevich, Hudis, and Wing in their 2014 paper on the subject.

The Inverse Privacy Problem

Inverse privacy can be defined as personal data accessible to some third party but not the user. The researchers note that over the past decades, those collecting data have had a stark technological advantage over the “regular person.” As such, these personal data outputs are kept out of reach of the user, yet fully accessible in the living memory of the company. Inversely private data has far outpaced partially private or fully private data due to many factors, including the advancement of database technology over the last several decades (5). While a natural monopoly on recorded data would seem to benefit a technology company as a means of monetization, Gurevich, Hudis, and Wing argue that user access to inversely private data is beneficial, carries negligible risk in numerous scenarios, and should be made convenient. For instance, access to inversely private data could provide insight into areas such as credit and personal health, and generally open new business opportunities (11). At the root, the argument is that the negative consequences of providing access to inversely private data are far outweighed by the positive effects of opening the vault. Furthermore, the researchers argue that the inverse privacy problem is “primarily a product of technological progress” that can and will be solved with better technology (9). To this end, it is clear that we will require a new interface, one which can provide certainty around user identity credentials, private sharing, differential privacy, multi-party computation, and custody provisioning.

So what is “personal information” in this context and how should we look at it? Personal information may be seen as a collection of personal data items (infons) that together make up an organized personal infoset. The infoset can be framed by the invasiveness it has. That is to say, how an infon is used, from whom it comes, and to what extent exposure to or knowledge of the infon leads to potential harm or breach of privacy of some other party.

This may be best explained through the definition provided by the Microsoft researchers:

We say that an infon x is personal to an individual P if (a) x is related to an interaction between P and an institution and (b) x identifies P

A piece of information is considered personal if an interaction in which x is involved somehow identifies or makes vulnerable the individual P in question. Expanding on the dimensions provided, we must also consider how such infons are related and what derived information is actually risky to release back to users. While not exhaustive, some of the below questions can be asked to gut-check the “reasonableness” of exposure.

a) Does the infon contain any relation to another individual such that it precludes sharing with the original individual?

b) Does the custodian of the infon serve to benefit materially or economically by having custody of it?

c) Does the infon directly or indirectly reveal sensitive data related to proprietary competitive advantages of the firm?

d) Does the infon reveal any data that would be threatening to national security?
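The four questions above can be sketched as a simple screening step that a custodian might run before releasing an infon back to its originator. This is only an illustration: the `InfonReview` record, the `release_concerns` function, and the choice to treat every "yes" as a flag for human review (rather than an automatic block) are my assumptions, not something the Microsoft paper prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InfonReview:
    """Hypothetical record of the answers to questions (a) through (d) for one infon."""
    involves_other_individual: bool        # question (a)
    custodian_benefits_from_custody: bool  # question (b)
    reveals_proprietary_advantage: bool    # question (c)
    threatens_national_security: bool      # question (d)


def release_concerns(review: InfonReview) -> List[str]:
    """Return the questions answered "yes"; an empty list means no flag was raised."""
    concerns = []
    if review.involves_other_individual:
        concerns.append("a: relates to another individual")
    if review.custodian_benefits_from_custody:
        concerns.append("b: custodian benefits from custody")
    if review.reveals_proprietary_advantage:
        concerns.append("c: reveals proprietary competitive advantage")
    if review.threatens_national_security:
        concerns.append("d: threatens national security")
    return concerns


# Example: a shopping-history infon that the custodian monetizes,
# but that involves no one else and reveals nothing proprietary.
shopping_history = InfonReview(False, True, False, False)
print(release_concerns(shopping_history))
```

An infon that raises no concerns is a candidate for the kind of convenient access the researchers argue for; anything flagged would need a closer look before release.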

Ideally, users could control which third parties have access, for how long, for what purpose, and who may control their infosets on their behalf.

With new technological advancements in information usage, a new paradigm of data ownership can emerge centered on scrutinizing the conditions under which organized personal infosets are revealed, released, monetized, and autonomously controlled. While some may promote blockchain for this use, it is only part of the equation. The technology does a great job at trustless validation of data and smart contracts can be used for conditional permissioning. However, there must be use of complementary privacy preserving technology to ensure a) user identity is linked with private data, and b) firms do not have the right to see or use all private data unless specifically permissioned to do so. This level of control could include access permissioning or use, revoking access or use, and delegation control for third parties.
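To make that level of control concrete, here is a minimal in-memory sketch of purpose-bound, time-limited grants with revocation and delegation. Every name in it (`InfosetPermissions`, `Grant`, the method names) is hypothetical, and it deliberately ignores the hard parts named above: verifying identity, enforcing the rules cryptographically, and keeping the data itself private. A real system might place similar rules in a smart contract or behind a privacy-preserving protocol.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Set, Tuple


@dataclass
class Grant:
    """One permission: valid until expires_at unless revoked first."""
    expires_at: datetime
    revoked: bool = False


class InfosetPermissions:
    """Hypothetical ledger of who may use one user's infoset, and for what."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.delegates: Set[str] = set()
        # Grants are keyed by (third party, purpose), so access is purpose-bound.
        self._grants: Dict[Tuple[str, str], Grant] = {}

    def _require_controller(self, actor: str) -> None:
        # Only the owner or a named delegate may grant or revoke access.
        if actor != self.owner and actor not in self.delegates:
            raise PermissionError(f"{actor} may not manage this infoset")

    def add_delegate(self, actor: str, delegate: str) -> None:
        if actor != self.owner:
            raise PermissionError("only the owner can name a delegate")
        self.delegates.add(delegate)

    def grant(self, actor: str, third_party: str, purpose: str,
              duration: timedelta, now: datetime) -> None:
        self._require_controller(actor)
        self._grants[(third_party, purpose)] = Grant(expires_at=now + duration)

    def revoke(self, actor: str, third_party: str, purpose: str) -> None:
        self._require_controller(actor)
        existing = self._grants.get((third_party, purpose))
        if existing is not None:
            existing.revoked = True

    def can_access(self, third_party: str, purpose: str, now: datetime) -> bool:
        existing = self._grants.get((third_party, purpose))
        return (existing is not None
                and not existing.revoked
                and now < existing.expires_at)


start = datetime(2020, 1, 9)
perms = InfosetPermissions(owner="alice")
perms.grant("alice", "travel_app", "trip_recommendations", timedelta(days=30), now=start)
print(perms.can_access("travel_app", "trip_recommendations", now=start + timedelta(days=1)))
```

Keying each grant on both the third party and the purpose means a firm permitted to use data for one purpose gains nothing for any other, and passing the current time in explicitly keeps the expiry check easy to reason about.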

Unfortunately, the level of control described above is not the status quo. The Microsoft researchers acknowledge some reasonable justifications for withholding data, which they attribute to competition and national security, citing reports including the 2012 FTC report on protecting consumer privacy. What is clear, however, is that the inverse privacy problem can and will be solved. In fact, the researchers claim that access to inversely private data should improve security and hacking resistance and reduce other vulnerabilities, outcomes that regulators and users may find appealing (8).

So why isn’t access to our inversely private data the norm? First, there isn’t a universal governance protocol for how user data is controlled. Even in the shadow of Facebook’s massive fine, tech giants continue to generate billions in revenue using non-transparent user ad profiles and borderline deceptive tactics. Despite the penalties, there is little evidence indicating any change to ad-based revenue models and most big tech firms operate by mining a large pile of data, which is no secret. Despite Google’s exemplary academic research on privacy and differential privacy, balancing privacy with business needs is a challenge they haven’t quite mastered.

Second, the technologies that exist to support unique portable identity (blockchain) and privacy for secure computation (zero-knowledge proofs and multi-party computation) have not reached mass acceptance. At the time of this writing, there are several competing layer 1 blockchain protocols, a few tangible projects with ZKPs in use, and a small handful of reliable MPC solutions including Ligero, Curv, and Unbound. We are still in the first inning of privacy preserving technology at scale and the battles are really just beginning.

Third, it is unclear whether users will push for privacy outright despite having concerns. Recent studies suggest highly nuanced preferences for data sharing, especially for ad-based services. Pew data suggests an even harsher reality, where the majority of Americans feel that their data is out of their control (84%) and very few understand what is being done with it.

Lastly, very few Americans understand data privacy law and most are concerned about how much data is being collected by companies (79%) and government (64%). Lack of understanding of privacy rights leads to apathy regarding data for service business models.

So why open inversely private data at all? Microsoft researchers point to a few “safe scenarios” in which access to the data offers users a high reward relative to the risk (7). Some safe data might include shopping or travel habits, and potentially even data derived from these activities. However, it is important to note that any technology that shares information back may make it easier for hackers to identify an individual, and so would call for privacy preserving encryption to lower the risk. Despite the risk, by allowing users convenient access to such safe data, firms could actually increase their level of trust with consumers and encourage loyalty.

Which digital platform companies are currently providing access to inversely private data? As far as we know today, Microsoft is taking the initiative with Project Bali, a test of the ideas presented in the 2014 paper. The concept is as described: provide users with access to their data bank, with some exceptions, and give them the ability to monetize that data. At the time of this writing, Project Bali is in private beta and there is no word on whether it will be advanced. Regardless, Project Bali seems to be a step in the right direction.

Business Model Potential

If a large company like Microsoft opens their inversely private data for at least partial user control, I’d expect some of their competitors to follow suit. While it’s actually quite hard to imagine tech firms jumping over each other to willingly upend their revenue models, the tides of change in consumer sentiment or consumer protection might force the hand of big tech to preemptively prepare. In this new world where data is controlled by users, real-time permissioning and privacy preservation will be the norm.

We can imagine an entire economy where personal data is mined on behalf of consumers. These third parties compete and are rewarded for how trustworthy and helpful they are to consumers (11). Imagine a new business model formed around data delegates. These delegates could act like digital agents, finding opportunities to monetize your information on your behalf. As they may operate on commission, these delegates would hunt for places where consumer data is demanded. As a result, data selling and buying markets would emerge to provide liquid data exchange. Companies like Ocean Protocol may even eventually provide an infrastructure for exchange and even facilitate decentralized data exchange based on rules or thresholds set by users.

Another potential business model could involve improvements to paid browsing. Upon logging into a secure browser linked up to your digital wallet, advertisers could approach you and place bids on “ride alongs” to track your internet usage and pay you directly for your inputs in real-time. While cookies are common to track users on the internet today, users do not benefit financially and certainly not in perpetuity. In this new model, advertisers or companies would pay to plug into you, learn from you, and benefit from your inputs or past activities made visible to them.

Right now, users help train machine learning models to better understand what a road looks like or where a street light is by solving CAPTCHA tests. With personal data ownership as the norm, we might see CAPTCHA-like tests morph into very complex games that provide users with a small cut for their play. In fact, I would expect an exponential increase in the speed of improvement to machine learning as a result of more difficult data circumstances. Data will cost more and access won’t be guaranteed, so I would anticipate the birth of frontier data mining that will make learning from encrypted data far easier.

Beyond the maturation of privacy preserving technology, it will take a strong regulatory push, a will to be first to market by a leading platform, and solid demand from users who want to take back their privacy from businesses that consume their data in exchange for free products. So are we ready to own our own data? This is a multi-decade journey. Much more research is required on the consumer trends, Web3 developments, regulation, and new business opportunities that will lead to more data autonomy. Nonetheless, I think users are waking up to the idea that their data is capital and they aren’t going to let big tech bank it forever.

Mike Wasyl

I write about topics related to fintech, commerce, and anything with a captivating story. Managing Partner at DeerCreek | Former ConsenSys Capital & Bloomberg