Understanding Digital Privacy Tomorrow.
This is the second half of part 4 of 4 of a series of musings on the topic of online privacy. I don’t pretend to resolve the problem; I’m simply exploring facets of the space, pulling at strings that may make the web a more wholesome place to explore, and helping builders think about the moral valence of their technical decisions. View the first half.
TL;DR — It’s a uniquely exciting time to work on Internet Privacy. Will privacy be defined by compliance to regulation, or new software abstractions? Should the default be transparent data-usage, or privacy-by-default? Some patterns are already emerging from the space and they are shaping our lives online.
Just as code deployment and cloud infrastructure matured over the last 20 years, we are on the brink of huge advances for data privacy. Our toolkit is evolving and changing the dev stack in fundamental ways. Web stacks today commonly involve datastores (the model), backend software (the controller) and frontend code (the view), all powered by devops tools.
The Internet will change as software engineers gain more granular control over user data. But what might that change mean? We may see data logic be split from app logic in the backend, for instance.
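One way to picture that split is a policy-aware data layer sitting between app logic and the datastore, so that application code must declare a purpose and the data layer decides what it may read. This is a minimal, hypothetical sketch; all names (`UserStore`, `PolicyError`, `ALLOWED_PURPOSES`) are invented for illustration and not taken from any real product.

```python
from dataclasses import dataclass

class PolicyError(Exception):
    """Raised when app logic requests fields its purpose may not read."""
    pass

# Data logic lives here, not in the app: which fields each purpose may read.
ALLOWED_PURPOSES = {
    "billing": {"email", "country"},
    "analytics": {"country"},  # no direct identifiers for analytics
}

@dataclass
class UserRecord:
    email: str
    country: str
    ssn: str  # present in the store, exposed to no purpose above

class UserStore:
    def __init__(self):
        self._rows = {}

    def put(self, user_id, record):
        self._rows[user_id] = record

    def get(self, user_id, fields, purpose):
        """App logic declares a purpose; the data layer enforces policy."""
        allowed = ALLOWED_PURPOSES.get(purpose, set())
        denied = set(fields) - allowed
        if denied:
            raise PolicyError(f"{purpose!r} may not read {sorted(denied)}")
        row = self._rows[user_id]
        return {f: getattr(row, f) for f in fields}

store = UserStore()
store.put(1, UserRecord(email="a@example.com", country="FR", ssn="..."))
print(store.get(1, ["email"], purpose="billing"))  # allowed by policy
try:
    store.get(1, ["email"], purpose="analytics")   # denied by policy
except PolicyError as e:
    print("blocked:", e)
```

The point of the design is that privacy rules are versioned and audited in one place (the data layer) instead of being scattered across every controller that touches user data.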
What else? Looking through what work is happening in data management and online privacy yields interesting insights into what might be next for privacy tech.
Where competitive markets emerge, we can see what incentives new regulation is creating (e.g., the growth of consent-management startups in the wake of GDPR). Where many startups chasing the same goal fail, we can distinguish consumers’ stated preferences from their actual desires (e.g., the string of pivots in the self-sovereign identity space).
The way these companies present themselves reflects the public narratives in the space. Startups’ positioning in the space is fed by and feeds the stories we tell ourselves about online privacy.
Having looked at the space, I work through some of the narratives.
- Open source is better for privacy.
Proving to users what you’re doing with their data is hard. The idea is that opening up part of your code base helps (you can charge for services or issue a token to monetize). It’s an interesting idea (you might think opening up a code base makes things less private) that reveals a couple of things about the space. First, a lot of people working on privacy have a cryptography background (where security through obscurity is the kiss of death). Second, data privacy and data ownership are deeply tied: sovereign ownership demands that you can see what is done with your data.
- Pitching security, compliance or productivity rather than privacy.
Many startups in the space pitch security, compliance, or productivity gains derived from privacy-preserving architecture rather than privacy itself. Privacy is an overloaded term that means too many different things to different people. The exceptions to this rule are mostly in the consumer space and have pivoted toward DSR (Data Subject Request, i.e., getting access to your own data) management.
- Volunteer networks are being replaced by incentivized networks.
Incentivized networks (i.e. blockchains, or decentralized networks with tokens) stand to replace a lot of the volunteer networks that were built in the 90s. The tech developed there is finding its way to a lot of (centralized) privacy products too.
- An emergence of specialization as the space modernizes.
With a bit of practice, it becomes easy to date privacy companies based on their pitch. The older the company, the harder it is to understand what they do from their website (also true of a lot of Web 3). We have privacy 1.0 consultancies (from the early 2000s), 2.0 cloud products (from the 2010s) and 3.0 APIs (from 2016 onward). As the space has matured and privacy software categories come into focus, companies have gotten more specialized.
- Blockchain cos. often fail to serve a specific customer.
Many projects in the distributed systems space have yet to fully specialize. The space’s relative youth and the intellectual siren call of “this should be a protocol, it must remain use-case agnostic and interoperable” means far more companies in web 3.0 are pitching a panacea of tools without targeting any specific use-case. Certainly protocols are not end-application dependent, but they should have a clear purpose. The blurring of the line between network protocol and end-user product in Web 3.0 is very real and has yet to be fully resolved.
- The Web 3.0 privacy pitch remains implicit.
Outside of a few products focused on privacy-preserving tech, the web 3.0 pitch for privacy remains vague. It’s mostly implied by architectural constraints: transparency forces the privacy question to the fore. But the pitch is about data ownership and control rather than privacy itself. It’ll be fascinating to watch this play out.
- Some established markets in the privacy space are emerging.
While some subcategories have yet to find their focus, others have already birthed very competitive markets. Identity Verification, Audit Automation and Consent Management each have more than a handful of well-funded ($20M+) startups competing. Data Anonymization is on its way.
- Conversely, some key markets have been a big letdown.
A first vintage of Self-Sovereign Identity (or Decentralized Identity) startups have pivoted, leading to greater focus on Go-To-Market for remaining players. Likewise, user-controlled data marketplaces (“monetize your own data”) are mostly gone, since the economics make no sense (not to mention the shaky moral implications).
- Data sharing in regulated spaces could be a first mover for data marketplaces.
There are many interesting products around healthcare data sharing. I think it likely that better collaborative data tooling across companies (rather than across teams within a company) will emerge first in industries where data is scarce, the market of data providers is structurally fragmented, and data sharing was heretofore impossible without modern privacy-preserving tooling (e.g., due to legislation), not just “difficult.”
- Privacy regulation has created interfaces between enterprise and consumer.
The regulatory landscape has shaped where the consumer and the enterprise meet. Today, this means Data Subject Requests (on the consumer side) and Consent Management (on the enterprise side). It stands to reason that a lot of regtech will be user-facing.
- GDPR & CCPA lead the regulatory pack in terms of consumer protection.
While orgs are watching upcoming regulation like LGPD in Brazil, India’s PDP bill, and the NY SHIELD Act, GDPR (and to a lesser extent CCPA) is becoming the yardstick by which companies set privacy standards, and is likely to have a big impact on future UX.
- The Privacy vs Security debate has moved to the regulatory landscape, ouch.
The question of privacy vs. security has been a mainstay of cryptography debates for a very long time. It has now expanded beyond questions of encryption, as privacy laws bump up against KYC, AML and CFT laws: one regime says delete the data, the other says retain it. What’s a software project to do?
I saved my best for last:
- Privacy as software abstraction vs compliance.
The above question has emerged as a key organizing principle for the space. Is the end-goal here compliance or privacy-preserving data architecture? Is the real user the lawyer or the software engineer? The answers to these questions will fundamentally shape the Internet and its relationship to data in the coming years.
I believe most long-term solutions must go beyond regulation and involve splitting out data-logic from app-logic to some extent. Regulation is key, but data engineering must go beyond simply complying with local regulation ad-hoc.
But what does that mean? I’ve found there are two schools of privacy tech practitioners: “transparent-data-use” vs. “privacy-by-default.” The first believes transparency fulfills a product’s privacy needs, i.e., you should be told what is being done with your data and choose whether to proceed. The second believes transparency is a useless promise (“Is it better to be screwed if you know about it?”) and that privacy-by-default (i.e., not collecting the data in the first place) must be the path forward for the Internet.
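The contrast between the two schools can be made concrete with a toy sketch. Everything here is invented for illustration (the `consent_ledger`, the field names, the `collect_*` functions); the point is only where each school draws the line.

```python
# School 1 ("transparent-data-use"): collect the data, but disclose
# what was taken and why, and require consent first.
consent_ledger = []  # public record of what was collected and for what

def collect_transparent(event, user_consented):
    """Collect everything, but only with disclosure and consent."""
    if not user_consented:
        return None
    consent_ledger.append({"fields": sorted(event), "purpose": "analytics"})
    return event  # full event, identifiers included, is stored

# School 2 ("privacy-by-default"): identifying fields never enter
# the pipeline at all, so there is nothing to disclose or leak.
def collect_private_by_default(event):
    """Strip identifiers before anything is stored."""
    PII = {"email", "ip", "name"}
    return {k: v for k, v in event.items() if k not in PII}

event = {"email": "a@example.com", "ip": "203.0.113.7", "page": "/home"}
print(collect_transparent(event, user_consented=True))
print(collect_private_by_default(event))  # {'page': '/home'}
```

Under the first school the privacy guarantee lives in the ledger and the consent check; under the second it lives in the shape of the data itself, which is why its proponents consider it the stronger default.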
Clearly, transparency is too low a bar to clear for online privacy. It is a slippery slope that carries deep societal dangers.
- Should a user be expected to understand the data requirements of a given algorithm to know if the data requested is warranted? How informed can informed consent be?
- Further, transparency without choice is not much better than what we have today. Try not using any data-hungry service (e.g. Facebook or Google infrastructure) as you surf the web and see how far you get.
- Finally, can we be expected to understand the long-term implications of giving up data today? Like frogs in boiling water, we may wake up to find we’ve agreed to things we once thought outrageous (if that hasn’t already happened).
On the other hand, while “privacy-by-default” sounds nice, it can be hard to define precisely for any given product. Can it really be an effective heuristic across all Internet applications? What does it even mean in a given context? No PII collected? All collected data anonymized? No data collected at all?
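Those three readings of “privacy-by-default” produce very different data when applied to the same record. This is a hypothetical sketch; the field names and the coarsening rules (truncating a zip code, bucketing an age) are invented, and real anonymization requires far more care than this.

```python
record = {"name": "Ada", "zip": "75011", "age": 37, "clicks": 12}
PII = {"name"}  # direct identifiers, for this toy example

def no_pii(rec):
    """Reading 1: drop direct identifiers, keep everything else."""
    return {k: v for k, v in rec.items() if k not in PII}

def anonymized(rec):
    """Reading 2: keep the data, but coarsen quasi-identifiers too."""
    out = no_pii(rec)
    out["zip"] = out["zip"][:2] + "***"    # generalize location
    out["age"] = (out["age"] // 10) * 10   # bucket age to a decade
    return out

def nothing(rec):
    """Reading 3: collect no data at all."""
    return {}

print(no_pii(record))      # identifiers gone, quasi-identifiers intact
print(anonymized(record))  # {'zip': '75***', 'age': 30, 'clicks': 12}
print(nothing(record))     # {}
```

Each reading trades away a different amount of product utility, which is exactly why the heuristic is hard to apply uniformly across applications.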
We’ve got some work to do. There is power in being deliberate about shaping defaults and this is the mission at hand: defining and building the new data and privacy standards for our digital lives.
Change is afoot. The Internet’s potential as a great means of expanding human knowledge and driving progress is playing out. So too is its potential as a means of mass surveillance and intellectual coercion. The extent to which these outcomes play out is up to us. The future of the Internet is ours to shape. Online privacy will be a key part of this evolution. I hope you’ll consider working in this space.