Understanding digital privacy today.
This is the first half of part 4 of 4 of a series of musings on the topic of online privacy. I don’t pretend to solve the problem; I’m simply exploring facets of the space and pulling at strings that may make the web a more wholesome place to explore and help builders think about the moral valence of their technical decisions. View part 3, or the second half of part 4.
TL;DR: Changes in regulation, user sentiment and the tools available to developers have led to the creation of new data privacy tooling. From decentralized systems and data loss prevention to identity management systems and compliance tooling, there are many new options for creators online. This post seeks to make sense of the space.
Creators on the web are being forced to rethink their data practices. The time is ripe for new privacy standards to emerge, pushed by the creation of better data tooling for creators.
Software is eating the world, and data is eating software. More “real-world” data is coming online than ever, even as ever more “digitally native” data is being generated. There is no escaping digital risk in our analog lives.
Yet the complexity of data management and the opacity of its outcomes mean data privacy rarely makes it into the top three things developers and creators get to care about. Privacy choices feel all-or-nothing (collect data or don’t), with no in-between.
Deep market changes are underway across multiple realms:
- Technical: there are new tools that make privacy decisions easier, including some emerging from decentralized tech. Technical constraints are no longer an excuse for harming your consumers.
- Regulatory: much-needed (if imperfect) regulatory action (e.g. GDPR, CCPA) is appearing across jurisdictions and changing the data risk calculus for businesses.
- Demand: consumers are becoming savvier and demanding more protection from the products they use.
As a result, many new products are being built to help creators navigate data decisions and to give consumers power over their own data.
I wanted to get a sense of what this means in practice and what we might expect to see in privacy tech in the coming decades. To that end, I compiled a watch-list of 250+ projects in the privacy space and used it to sketch a topography of modern privacy tech.
The line between privacy, security and Web 3.0 products can obviously be blurry, but this search focused on privacy: how data is collected, secured and communicated. I looked for companies tackling user data management directly, or whose work entails a fundamental change in how data is handled online.
I’ve compiled insights from this research in a separate post, Understanding digital privacy tomorrow. Check it out!
The Map of the Cat
I’ll go through the categories and subcategories I’ve found below. They span the worlds of
- Consumer products,
- Enterprise products,
- Developer tooling.
These categories are imperfect: some projects could fit into several, and others don’t fit neatly into any. Nevertheless, I assigned each project a single category/subcategory in order to articulate its core offering.
The categories are:
- Decentralized Systems: peer-to-peer systems that are driving innovation in how private data is handled.
- Personal Privacy Tooling: tools built to help consumers reclaim control over their data online.
- Privacy-Preserving Apps: apps built with privacy-by-default in mind, collecting as little of your data as possible.
- Identity Management: solutions that help you manage your identity online (for consumers) and understand and verify who your users are (for enterprise).
- GRC (Governance, Risk, Compliance): RegulationTech (or RegTech) at large, helping companies stay compliant.
- DLP (Data Loss Prevention): data management and control systems that help projects anonymize or secure captive user data.
- Collaborative Data Tooling: data sharing and anonymization tools seeking to break down data silos.
I split these categories into subcategories below and include some companies whose work I find particularly interesting or illustrative of the space.
🟣 Decentralized Systems
This category encompasses peer-to-peer technologies in general: projects that power decentralized/distributed networks. In these networks, all participating computers run a protocol as equals (peers) rather than depending on a single source of truth to serve up data (which is what we’re used to on the Web).
This decentralized Web is not about privacy but about verifiability. For instance, blockchains let any peer verify the validity of data (since it is publicly recorded on-chain) rather than taking it on faith (as you do when you assume that your Facebook wall accurately reflects your network’s posts).
Given this emphasis on verifiability, decentralized systems tend to be very transparent (or leaky) with data: information is broadcast across the network rather than given to a trusted gatekeeper. Data privacy is therefore a central issue for decentralized networks, and a lot of work in the field is changing how we think about data handling, user consent and data control.
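To make the verifiability point concrete, here is a toy hash chain in Python. It is a minimal sketch, not how Bitcoin or any real blockchain works (there are no signatures, consensus or proof of work), and the genesis value and record fields are made up. It shows both sides of the trade-off: any peer holding a copy can detect tampering, but every record is readable by everyone.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for this sketch


def block_hash(prev_hash: str, payload: dict) -> str:
    # Hash the previous block's hash together with this block's payload,
    # so changing any earlier block changes every later hash.
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()


def build_chain(payloads: list[dict]) -> list[dict]:
    chain = []
    prev = GENESIS
    for payload in payloads:
        h = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": h})
        prev = h
    return chain


def verify_chain(chain: list[dict]) -> bool:
    # Any peer with a copy can recompute every hash; no gatekeeper is trusted.
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["payload"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([
    {"from": "alice", "to": "bob", "amount": 5},
    {"from": "bob", "to": "carol", "amount": 2},
])
print(verify_chain(chain))  # True
chain[0]["payload"]["amount"] = 500  # rewrite history
print(verify_chain(chain))  # False
```

Note that the payloads (who paid whom, and how much) sit in the clear: verifiability comes precisely from everyone being able to read them.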
I split the space into:
- Protocol Development covers the base protocols running incentivized distributed networks (i.e. layer 1 blockchains). Their use-cases include verifiable computing, storage and currency. Bitcoin is a good example.
- Layer 2 is the set of protocols that make blockchains work better. These protocols run on top of layer 1 protocols and help networks scale, become more private and support more complex transactions.
- p2p Infrastructure companies are building infrastructure to help run peer-to-peer networks. They include enterprise node companies (like Bison Trails or Alchemy) and open-source networking libraries (like libp2p), among others.
- Volunteer Networks are the OG p2p networks, like Tor or BitTorrent, which are powered by volunteer nodes rather than incentivized by tokens.
- Primitive Development designates the bulk of projects working on key cryptographic primitives for the space: Zero-Knowledge Proofs, Multiparty Computation techniques, differential privacy, etc.
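As a taste of what these primitives look like, here is additive secret sharing, one of the basic building blocks of Multiparty Computation, sketched in Python. The salary figures and the choice of prime modulus are illustrative assumptions; real MPC protocols add networking, protection against malicious parties and much more.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an arbitrary choice for this sketch


def share(secret: int, n: int) -> list[int]:
    # Split `secret` into n shares that sum to it modulo PRIME.
    # Any n - 1 of them are uniformly random and reveal nothing about it.
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME


# Three parties each hold a private salary and want to learn only the total.
salaries = [52_000, 61_000, 75_000]
all_shares = [share(s, 3) for s in salaries]

# Party i receives the i-th share of every input and adds them locally.
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
print(reconstruct(partial_sums))  # 188000
```

Only the three partial sums are ever combined, so the total is revealed while no individual salary is.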
🟢 Personal Privacy Tooling
This category encompasses tooling built to help you (the consumer) manage how your data is used online and make your online experiences safer.
The first two subcategories are about leak prevention (making sure your data is not shared against your will), the last two about remediation (what to do when it is).
- Access Control tools are all about personal data loss prevention by securing access to your digital world. Think password managers (e.g. 1Password, Dashlane), 2FA providers (e.g. Authy, Duo), etc.
- Masking services help you stay anonymous as you surf the web. They anonymize or mask your PII. They include VPNs (Nord, ExpressVPN, Algo) and adblockers (Ghostery, PrivacyBadger), as well as temporary email (Firefox Relay) or temporary credit card issuers (Privacy.com).
- Privacy Retrofit products help secure legacy systems on the Web. They include SEO management tooling, encryption tooling for third-party protocols and services (encrypt your data on Dropbox, encrypt your Facebook posts, encrypt your emails, etc.), or tools to help you manage your privacy settings across services (e.g. Jumbo Privacy).
- Data Watchdogs keep watch on your behalf. They track your online services, interact with them for you and scan the Internet for threats to your data. Examples include haveibeenpwned and Google Alerts, as well as DSR (Data Subject Request) tools like Mine.
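Watchdogs can sometimes check your data without ever receiving it. One approach, the hash-prefix range query that haveibeenpwned’s Pwned Passwords service is known for, can be sketched as below. This is a local simulation under stated assumptions: the “server” is an in-memory set of hypothetical breached passwords rather than the real service, so no network calls are made.

```python
import hashlib


def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


# Hypothetical stand-in for the watchdog's breach database.
BREACHED = {sha1_hex(p) for p in ["password", "123456", "qwerty"]}


def server_range_query(prefix: str) -> list[str]:
    # The server only ever sees a 5-character hash prefix and returns
    # every breached-hash suffix that shares it.
    return [h[5:] for h in BREACHED if h.startswith(prefix)]


def is_breached(password: str) -> bool:
    digest = sha1_hex(password)
    prefix, suffix = digest[:5], digest[5:]
    # The password and its full hash never leave the client;
    # the final comparison happens locally.
    return suffix in server_range_query(prefix)


print(is_breached("password"))                      # True
print(is_breached("correct horse battery staple"))  # False
```

With a real breach corpus, many different hashes share each prefix, so the prefix alone tells the server little about which password you checked.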
🔵 Privacy-Preserving Apps
These products have made privacy a core part of their value prop. Most of these products end up competing in well-established software segments, offering you feature parity (hopefully) with more data dignity. This can mean different things in practice. Some products avoid selling your data, others avoid collecting your data or generating data about you in the first place (“privacy-by-default”). Others simply offer an on-prem option or hardware platform you control.
There are two main subcategories here: apps built in Web 3.0 and those built on the traditional Internet (Web 2.0). Today, they differ in the stacks they sit on and how they monetize. It seems clear that this distinction will blur as the space matures. Products will use a mix of centralized and decentralized infrastructure and data sovereignty (see below) will become more common.
- Privacy-by-default Alternatives are the Web 2.0 privacy-first products. They are browsers (Brave), email clients (Hey, Helm, ProtonMail, etc.), search engines (DuckDuckGo), messengers (Signal, Mattermost), OSes (Tails), etc.
- Dapps are the Web 3.0 tools. Privacy is not a first-order concern for many of them, but they have to address it given Web 3.0’s transparent-by-default nature.
🟡 Identity Management
One of the areas in which a lot of privacy tech is emerging is Identity Management. At a high-level, these solutions are unbundling a lot of what credit bureaus have been doing for decades: helping companies make sense of who their customers are.
Solutions in this space span enterprise and consumer tooling, with some very traditional SaaS tooling out there, as well as fully decentralized protocols. Shout out to TrueWork (one of my favorite products in the space) and others taking direct aim at credit bureaus by bringing people back into the fold of their own identity checks.
I split the space as follows:
- ID Verification solutions help companies verify their users’ (or other orgs’) identities and personal data. This is for regulatory purposes (e.g. KYC/AML compliance) and in order to provide better services (e.g. by verifying employment or salary information for financial tools).
- Insights and Management solutions sit downstream from the verification tooling. They help orgs manage internal risk, combat fraud, weed out bot-generated content or fake reviews, etc. These orgs often do ID Verification as well. Note that this subcategory overlaps heavily with Compliance and Audit Automation below.
- Sovereign ID (also called self-sovereign identity or decentralized identity) solutions seek to give consumers control over how their personal data gets shared (i.e. give them back their sovereignty). The idea is that consumers should have precise control over who gets access to their personal data and for how long, giving the web a standard by which to become private-by-default.
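Selective disclosure is one idea behind sovereign ID: an issuer vouches for a set of claims once, and the holder later reveals only the claims a given verifier needs. The Python sketch below does this with salted hash commitments. It assumes an HMAC with a shared key as a stand-in for the issuer’s signature, which is a big simplification: real systems use public-key signatures so that verifiers never hold the issuer’s secret. All names and claims are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical shared key standing in for a real issuer signing key.
ISSUER_KEY = secrets.token_bytes(32)


def digest(salt: str, name: str, value) -> str:
    return hashlib.sha256(json.dumps([salt, name, value]).encode()).hexdigest()


def sign(digests: list[str]) -> str:
    return hmac.new(ISSUER_KEY, json.dumps(digests).encode(), hashlib.sha256).hexdigest()


def issue(claims: dict) -> tuple[dict, dict]:
    # The issuer salts and hashes each claim, then vouches only for the digests.
    salts = {name: secrets.token_hex(16) for name in claims}
    digests = sorted(digest(salts[n], n, v) for n, v in claims.items())
    credential = {"digests": digests, "tag": sign(digests)}
    disclosures = {n: (salts[n], v) for n, v in claims.items()}  # kept by the holder
    return credential, disclosures


def verify(credential: dict, revealed: dict) -> bool:
    # Check the issuer's tag, then check each revealed claim against a digest.
    if not hmac.compare_digest(sign(credential["digests"]), credential["tag"]):
        return False
    return all(
        digest(salt, name, value) in credential["digests"]
        for name, (salt, value) in revealed.items()
    )


credential, disclosures = issue({"name": "Ada", "birth_year": 1990, "employer": "Acme"})
# The holder proves their employer without revealing their name or birth year.
print(verify(credential, {"employer": disclosures["employer"]}))  # True
# A claim the issuer never made does not verify.
print(verify(credential, {"employer": (disclosures["employer"][0], "Initech")}))  # False
```

The random salts matter: without them, a verifier could simply hash guesses (e.g. every plausible birth year) and match them against the digests.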
🟠 GRC (Governance, Risk, Compliance)
Governance, Risk Management and Compliance (GRC) software has historically (since the ’90s) been boring all-in-one SaaS that bundled some risk/compliance consulting with training and a few paltry software tools to track risk vectors.
GRC includes all tooling that directly helps companies deal with regulation, certifications and data risk at large. Think of it as “RegTech.”
The subcategories below step in at various stages of a potential breach. They cover prevention (through audit automation), data management (consent management) and what happens when shit hits the fan (watchdogs and insurance).
- Compliance and Audit Automation firms are helping orgs get through certification processes, like SOC2 or HIPAA (e.g. Vanta, Drata, etc.), as well as manage their compliance (e.g. Clausematch, or a favorite of mine, Aptible).
- Risk Consultancy firms are mostly legacy GRC companies (like BlueUmbrella) doing consulting and risk assessment. They offer some tooling to help automate common compliance workflows.
- Consent Management solutions fill the space created by GDPR, CCPA et al. by helping companies get user consent (e.g. for cookies: OneTrust, TrustArc, etc.) and helping them serve DSRs (e.g. Transcend, Ethyca). Many of these tools interface with the end-user directly.
- Data Watchdogs, like their consumer counterparts, help orgs track their digital footprint online. They monitor data or content generated within a given product or about a given organization (e.g. Cyabra, Spectrum Labs).
- Cyber Insurance companies (e.g. At-Bay) offer orgs financial protection in the case of data breaches and help them adopt cybersecurity best practices.
🔴 DLP (Data Loss Prevention)
Data Loss Prevention aims to prevent captive data from being misused or stolen. While DLP is typically a part of cybersecurity, it often deals with data privacy directly.
These projects’ approaches to DLP vary. Some help companies not collect data in the first place (privacy-by-default tooling) or control access to data for those within and outside of their network (Access Control, Captive Data Management), for instance.
- Privacy-by-default Alternatives in the enterprise market include productivity suites that offer some form of on-premise deployment or (more interestingly) are built private-by-default (e.g. Keybase, Nightwatch).
- Access Control is a large segment. It covers user access-control (Auth0, Keyless), resource-based access control (Symops), VPN alternatives (Twingate, Strongdm), fleet management (MobileIron, Jamf), and more.
- Captive Data Handling tools help creators protect the data they store. They can be split between data protection tools, which protect sensitive data internally (Evervault is a notable example), and data discovery tools, which find PII in your stack (like Gamma or Nightfall).
- Endpoint Management tooling does DLP by looking at data at the edges of your system: the third-party tools you use (SaaSOps tooling like Cyral), your APIs, or the emails you send (e.g. Material Security).
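For a feel of what data discovery and protection involve, here is a deliberately naive Python sketch. It scans text for two kinds of PII with regular expressions, then replaces each match with a keyed pseudonymous token (an HMAC-based, deterministic form of tokenization). The patterns, key and token format are assumptions for illustration, not how Nightfall, Gamma or Evervault actually work; production scanners use much richer (often ML-based) detection.

```python
import hashlib
import hmac
import re

# Naive, illustrative patterns only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
TOKEN_KEY = b"hypothetical-secret-key"  # in practice, kept in a key management service


def find_pii(text: str) -> list[tuple[str, str]]:
    # Data discovery: report every (label, match) found in the text.
    return [
        (label, m.group())
        for label, pattern in PII_PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def tokenize(label: str, value: str) -> str:
    # Same input and key always yield the same token, so joins and counts still work.
    mac = hmac.new(TOKEN_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]
    return f"<{label}:{mac}>"


def redact(text: str) -> str:
    # Data protection: swap each PII match for its token.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(lambda m, label=label: tokenize(label, m.group()), text)
    return text


record = "Ticket from jane.doe@example.com, SSN 123-45-6789, re: billing"
print(find_pii(record))
# [('EMAIL', 'jane.doe@example.com'), ('US_SSN', '123-45-6789')]
print(redact(record))
```

Deterministic tokens keep the data useful for analytics, but anyone holding the key can confirm a guessed value, so the key needs to be guarded as carefully as the raw data.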
⚪ Collaborative Data Tooling
A big segment of privacy tech is collaborative data tooling. The premise here is that breaking data silos thanks to better privacy primitives will make data more valuable. These tools help teams within an org (or across orgs) collaborate on shared datasets.
This category has meaningful intersections with Sovereign ID solutions (in Identity Management) above, as well as with parts of Captive Data Handling (in DLP). Tools here use many of the same primitives that are in use across distributed networks.
I break out the following subcategories:
- Data Marketplaces create opt-in, open data exchanges. Products in this space include regulated data exchanges (and tooling), personal data exchanges which enable users to monetize their own data, and even products like Delphia, a robo-advisor that drives investment decisions with its funders’ data.
- Data Transformation tools ensure data can be shared safely across teams without putting user privacy at risk. This is done with various data anonymization and tokenization techniques. Gretel is a great one to check out here!
- Privacy-Preserving ML tools enable machine learning models to be trained without leaking the underlying data. This is done, for instance, using differential privacy (which injects noise into data sets to anonymize data) or federated learning (which trains models at the edge, on your device). Orgs working in this space include Cape Privacy, OpenMined, or DataFleets.
- Access Control tools here enable more robust multi-tenant databases. There are some really fascinating solutions in this space, including access control policy/governance tooling (most notably Oso) as well as open-by-default databases.
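To show what “injecting noise” means in practice, here is the Laplace mechanism, a standard differential privacy technique, applied to a simple counting query in Python. The dataset and epsilon value are made up, and this is a sketch rather than production code: real DP libraries also track privacy budgets across queries and handle floating-point subtleties.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    # A counting query has sensitivity 1: adding or removing one person changes
    # the true count by at most 1, so Laplace noise with scale 1/epsilon makes
    # the released count epsilon-differentially private.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [23, 35, 41, 29, 62, 57, 38, 45]  # hypothetical dataset; 4 people are 40+
rng = random.Random(0)
print(round(private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng), 2))
```

A smaller epsilon means more noise and stronger privacy. Each released answer spends part of the privacy budget: asking the same query many times and averaging would wash the noise out, which is why real systems cap how much can be asked.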
There you have it.
As you can see, reams of creators are now trying to tackle this gaping hole in the product-design space: privacy. It’s an exciting time to be online!
Over the few months I’ve spent on this research, I’ve watched some organizations on my watch-list switch focus and startups change their pitch. The privacy tech space is moving extremely fast, and while many of the market dynamics are still up in the air, some patterns are starting to emerge.
As a complement to this topography, I’ve written up the most potent patterns I’ve noticed going through the space. Check them out. I believe they hold some keys to understanding online privacy tomorrow.