The Evolution of “Privacy-Tech” and Data Collaboration

Sidharth Ganesh
4 min read · May 17, 2023



About this post

There has always been a trade-off between data privacy and utility. The easiest way to keep data private is to keep it encrypted and in silos, but that greatly reduces its utility. Many problems across healthcare, finance, public safety and other fields could potentially be solved with wider availability of data. Thanks to the influx of investment and interest in blockchain and cryptography, there are now multiple methods to perform computations on data without revealing its contents. Currently these privacy-preserving computation technologies are used to comply with regulations; in the future they will become key to collaboration between non-trusting parties. These advances might happen quicker than earlier anticipated, partly because of the investments in blockchain and DeFi and the value of the use-cases they unlock.

This post is broken into four parts:

  • The first section provides the background and introduces a framework for data collaboration
  • The second section goes into more detail on the different types of data collaboration frameworks: private data collection, data vaults, data clean rooms and trustless data marketplaces
  • The third section talks about the emerging use cases in marketing, Decentralized Finance (DeFi), healthcare and finance
  • The fourth section talks about the current state of technologies and the investments in the space

Background

Surveys show that an increasing percentage of the US population is concerned and vigilant about user privacy¹. This is the result of multiple factors, including the increasing number of data breaches, regulations such as the GDPR and CCPA, general awareness of intrusive advertising practices, and the sheer volume of data being collected. An increasing number of brands recognize that being ethical about how their data is used can be a source of differentiation².

An increasing percentage of the US population is concerned about their online privacy (Source: Pew Research Center)
Billions of records are lost every year through security breaches (Source: Momentum Cyber Cybersecurity Almanac 2022)

Given the increasing need to incorporate privacy into the day-to-day operation of digital businesses, this article tries to answer a few questions:

  • How do we allow different stakeholders who hold private data to collaborate in a privacy-preserving manner?
  • What is the current state of these technologies and who’s building them?
  • Which industries will adopt these technologies?

Enabling collaboration on private data

Data collaboration in this context generally refers to the sharing of data between parties in a way that creates value. This could take the form of a mobile app collecting user data, a retailer exchanging data with a brand, or a healthcare provider sharing data with a research lab. Depending on the use-case, data collaboration takes place through one of these frameworks, summarized in the table below.

Use-cases in bold indicate the actively pursued ones.
  1. Secure data collection: This is the current state of most consumer applications. First-party user data is collected, encrypted and sent to a central server, where it is decrypted and processed for analysis. Data is encrypted in transit, but a breach of the server may expose private records.
  2. Private data collection: Organizations with large device footprints, such as Apple, Google and Amazon, collect first-party data not just securely but also privately. By systematically adding noise, they obscure individual records while still allowing aggregate analyses. Data is private in transit and in storage, so breaches do not expose private information.
  3. Data vaults: When multiple stakeholders need to perform computations on private data, an organization can set up a secure environment that allows computation but doesn’t reveal PII. For example, a data vault may mask user names and zip codes but still reveal user events. You could aggregate user behavior by zip code but not discern an individual user’s zip code.
  4. Data clean rooms: Data clean rooms allow different teams or entities to share and collaborate on data without revealing their data to each other. These methods may or may not be combined with private data collection. This is again achieved through cryptographic techniques, with added governance over which computations are allowed on the shared data.
  5. Trustless data platforms: When entities need to collaborate without necessarily trusting each other, they do so through decentralized networks. Participants in the network are incentivized to keep the platform trustworthy, and do so by ensuring the verifiability of processed data alongside cryptographic techniques for private data computation.
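
To make the "adding noise systematically" idea from private data collection concrete, here is a minimal Python sketch of randomized response, one classic way to obscure individual answers while keeping aggregates recoverable. It is an illustration only and is not claimed to be what any of the companies named above actually deploy. The function names, the `p_truth` parameter and the simulated population are all assumptions made for this example.

```python
import random


def randomize(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true value with probability p_truth, else a fair coin flip.

    Hypothetical helper: any single report is deniable, because it may
    have come from the coin flip rather than from the user's real value.
    """
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_rate(reports: list[bool], p_truth: float) -> float:
    """Recover the population rate from the noisy reports.

    Since E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5,
    we can solve that equation for true_rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


# Simulated population (assumption): 30% of 10,000 users have the attribute.
rng = random.Random(0)
true_values = [i < 3000 for i in range(10_000)]

# Each device randomizes locally, so the server only ever sees noisy reports.
reports = [randomize(value, 0.5, rng) for value in true_values]

estimate = estimate_rate(reports, 0.5)
print(f"estimated rate: {estimate:.3f}")
```

The server stores only the noisy reports, so a breach exposes no reliable individual values, yet the aggregate estimate stays close to the true rate when the population is large. Lowering `p_truth` gives each user more deniability at the cost of a noisier estimate.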

In the next section, we’ll talk about each of these frameworks in more detail.


Sidharth Ganesh

Sidharth writes about technology. He's worked in product roles at multiple high-growth consumer startups. He's currently pursuing his MBA at Kellogg.