Privacy and Web3: Volume II, Part 1
Nearly two years ago, in February of 2018, I wrote my first post on the importance of privacy in the Web3 space, specifically touching on how Web3 browsers could leak information, potentially very sensitive information, to every website that you visited. About nine months later, EIP-1102 came into effect, making fundamental changes to Web3 clients so that they no longer expose confidential information without the user’s express consent. That was a step in the right direction, but there are still plenty of reasons to be concerned about privacy in the Web3 space.
This is the first part in a series of indefinite length about why you should care about privacy in Web3, what you should worry about, and how you can protect yourself.
Data Privacy — The Legacy of Web 2.0
Without question, the web has been revolutionary. It has changed how we shop, how we invest, how we access news and entertainment, how we develop code, how we interact with friends, family, and colleagues, how we find people with similar interests and even people to share our lives with. In many of these cases, we, as users of these online communities, pay nothing to the companies that run the services we rely on. Yet many of these services that we use every day for free make billions in revenue. How? As the old adage goes, if you’re not paying for a service, you’re not the customer, you’re the product.
As most Web3 advocates are well aware, the Web 2.0 revolution was fueled by the discovery that users’ data is valuable. In the more innocuous cases, this data is used to market to us, selling us things we didn’t know we wanted. On the more pernicious end of the spectrum, this data is used to compromise the foundations of democracy.
The Promise of Web3
Web3 was supposed to change this, and it was going to do so in two ways.
First, it enables services to become decentralized. Rather than depending on one company to run a service, the data and rules for how that service operates live on the blockchain, with smart contracts governing their rules, algorithms making sure nobody can cheat, and no central party to collect and sell your information.
Second, largely out of necessity, Web3 builds in its own economic system. The need for ETH and other tokens to interact with Web3 systems means that users already have a convenient way to pay for the services they use. It’s not like Web 2.0 systems, where you can either ask your users for a credit card or depend on advertising. In Web3, users can pay for the service seamlessly as they interact with the blockchain.
The Unspoken Realities of Web3
For Web3 to be truly decentralized, you really need every user to run their own node, or at least a light client. These nodes sync with the blockchain, then use fancy math to verify all the data they get from peers. Unfortunately, running your own node is hard, especially at the scale of a big application, so lots of people skip this step and outsource node infrastructure.
If you’re reading this, you’re probably already aware that Ethereum transactions are public information. You can use a block explorer like Etherscan to see all the transactions on the blockchain, what addresses have sent and received different tokens, what addresses are playing CryptoKitties, etc. But that information is all pseudonymous, and it’s generally quite difficult to tie a given address to a particular identity. And only the transactions you submit are public information: you can browse the blockchain to your heart’s content without sharing any personal information. Queries are private. Right?
If you’re using third-party node infrastructure, it’s not that simple. Every time you look up a token balance, the node operator can see which IP address was looking up which account’s balance of which token. If you’re browsing a DEX, the node operators serving you information about the blockchain can potentially see what token pairs you’re interested in. They can figure out what Ethereum addresses are associated with your IP address. And since there are far fewer public Ethereum gateways than there are dApps, these node operators can track your identity across several dApps.
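To make this concrete, here is a minimal sketch of the request a wallet or dApp typically sends to a gateway to look up an ERC-20 token balance. It uses the standard `eth_call` JSON-RPC method and the ERC-20 `balanceOf(address)` selector; the two addresses and the `balance_of_request` helper are hypothetical placeholders chosen for illustration. Note that both the token contract and the holder's address appear in plain sight in the request body, and the gateway also sees the IP address the request came from.

```python
import json

# ERC-20 balanceOf(address) function selector: the first 4 bytes of the
# keccak256 hash of the function signature.
BALANCE_OF_SELECTOR = "70a08231"


def balance_of_request(token: str, holder: str, request_id: int = 1) -> dict:
    """Build the JSON-RPC body for an ERC-20 balance lookup (hypothetical helper)."""
    hex_holder = holder.lower()
    if hex_holder.startswith("0x"):
        hex_holder = hex_holder[2:]
    # ABI encoding: the 20-byte address is left-padded with zeros to 32 bytes.
    padded_holder = hex_holder.rjust(64, "0")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [
            {"to": token, "data": "0x" + BALANCE_OF_SELECTOR + padded_holder},
            "latest",
        ],
    }


# Placeholder addresses, for illustration only.
TOKEN = "0x" + "11" * 20
HOLDER = "0x" + "22" * 20

request = balance_of_request(TOKEN, HOLDER)
# Everything printed here is visible to whoever operates the gateway.
print(json.dumps(request, indent=2))
```

Nothing in this payload is encrypted from the gateway's point of view. TLS protects it in transit, but the operator terminating the connection reads it in full.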
What is Your Data Worth?
So Ethereum gateways can see not only your transactions but also your private queries. What could they do with that?
First, there’s the host of privacy violations we’re used to from Web 2.0: tracking your activity across the web, building a profile about you, and selling that profile to advertisers. That’s enough to be worried about, but it’s only the tip of the iceberg in Web3. They’ll also have access to a lot of financial information: how much ETH you hold, what other tokens you transact in, what DEX you use, etc. And they’ll be able to link queries by IP address, so even if you use different addresses for different activities, they’ll be able to link all of those together. By tracking the source of your ETH and your tokens, they can probably figure out what centralized exchanges you use, maybe even who you work for if you’re lucky enough to get paid in ETH or tokens.
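Linking addresses by IP takes very little effort. The sketch below assumes a gateway keeps a log of (client IP, queried address) pairs; the log format, the `link_addresses_by_ip` function, and all of the values are hypothetical, and the IPs come from ranges reserved for documentation. It is not a description of any particular provider's logging.

```python
from collections import defaultdict

# Placeholder addresses a single user might keep separate on purpose.
TRADING = "0x" + "aa" * 20
SAVINGS = "0x" + "bb" * 20
OTHER_USER = "0x" + "cc" * 20

# Hypothetical gateway log: (client IP, Ethereum address named in the query).
query_log = [
    ("203.0.113.7", TRADING),
    ("198.51.100.23", OTHER_USER),
    ("203.0.113.7", SAVINGS),
    ("203.0.113.7", TRADING),
]


def link_addresses_by_ip(log):
    """Group every address queried from the same IP into one set."""
    linked = defaultdict(set)
    for ip, address in log:
        linked[ip].add(address)
    return dict(linked)


linked = link_addresses_by_ip(query_log)
for ip, addresses in linked.items():
    print(ip, sorted(addresses))
```

In this toy log, the trading and savings addresses end up in the same group simply because they were queried from the same connection, even though nothing on chain ties them together.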
But there’s another concern that could have a more direct financial impact. Decentralized traders already have cause to be concerned about front running, where traders watch the mempool to find orders people intend to execute and pay a higher gas fee to have their own order filled first. This can allow front runners to profit at the expense of regular traders. Anyone can watch the mempool to front run, but node infrastructure providers could also base their decisions on the information people look for before they submit a transaction. For example, if a trader runs a query to validate that a specific order is still available, the node provider could front run the order before the trader has even submitted a transaction.
Taking this risk a step further, a nefarious node infrastructure provider could sell a real-time stream of users’ private requests. A well-financed trader could stay one step ahead of the competition by paying for a subscription to the private requests of established traders.
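The mempool version of front running can be sketched with a deliberately simplified model. It assumes block producers include pending transactions strictly in order of gas price, highest first; real inclusion logic is more involved, and the `PendingTx` type, the `order_for_inclusion` function, and the gas prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTx:
    sender: str
    action: str
    gas_price_gwei: int


def order_for_inclusion(mempool):
    """Simplified assumption: highest gas price goes first.

    sorted() is stable, so transactions with equal gas prices
    keep their arrival order.
    """
    return sorted(mempool, key=lambda tx: tx.gas_price_gwei, reverse=True)


# The trader's transaction arrives in the mempool first.
victim = PendingTx("trader", "fill order #42", gas_price_gwei=20)
# A front runner sees it, copies the action, and bids slightly more gas.
copycat = PendingTx("front_runner", victim.action, victim.gas_price_gwei + 1)

ordered = order_for_inclusion([victim, copycat])
for tx in ordered:
    print(tx.sender, tx.action, tx.gas_price_gwei)
```

Under this model the copycat is placed ahead of the original despite arriving later. A node provider that sees the trader's queries would not even need to wait for the transaction to reach the mempool.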
Where Rivet Comes In
The safest way to protect yourself from these risks is to run your own full node. Unfortunately, running your own node has several pitfalls. Ethereum nodes have considerable system requirements: big, fast disks; considerable amounts of RAM; and beefy CPUs. When Ethereum nodes fail, they can take hours to recover even from daily backups, and days to recover from scratch. Load balancing across multiple Ethereum nodes can help prevent downtime, but it introduces inconsistencies as different requests go to different nodes that may be at different stages of validating the latest block.
At OpenRelay, we started the EtherCattle Initiative to tackle the operational challenges of running an Ethereum node. It’s an open source project that runs a replicated cluster of Ethereum nodes to provide high availability without compromising on consistency. We solved a lot of the hard problems in running Ethereum nodes, but there’s a cost. A minimal high-availability EtherCattle cluster costs about $500 per month, which is a big commitment for a lot of small projects, to say nothing of the maintenance effort involved. Scaling up is quick and easy, but the base level of commitment is substantial.
We were proud of the solution that we built, but disappointed we couldn’t bring it in at a lower price point. We decided that the best way to leverage what we built was to launch Rivet, a reliable node operator with a commitment to the privacy of our customers and their users. We codified our commitment to privacy in our strongly worded privacy policy, where we assume legal liability if we ever sell our users’ data to a third party (and we don’t even record any information not critical to the service we provide). We will continue to support and maintain the open source EtherCattle Initiative so that sizable projects that need reliable infrastructure can host it themselves and avoid sending private information to third parties, including us. On our roadmap we’re exploring several ways we could expand Rivet’s offering to provide our users with technically grounded privacy guarantees, rather than just contractual guarantees.
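The load balancing inconsistency is easy to reproduce in a toy model. The sketch below assumes a plain round-robin balancer in front of two nodes, one of which has not yet finished validating the latest block; the `Node` class and the block numbers are hypothetical.

```python
from itertools import cycle


class Node:
    """Stand-in for an Ethereum node that reports its current head block."""

    def __init__(self, name, head):
        self.name = name
        self.head = head

    def block_number(self):
        return self.head


# node-b is one block behind node-a.
nodes = [Node("node-a", 100), Node("node-b", 99)]
balancer = cycle(nodes)  # naive round-robin load balancer

# Four consecutive "latest block" requests from the same client.
observed = [next(balancer).block_number() for _ in range(4)]
print(observed)

# From the client's point of view, the chain appears to move backwards.
went_backwards = any(
    later < earlier for earlier, later in zip(observed, observed[1:])
)
print(went_backwards)
```

A client that saw block 100 and then asks a follow-up question may be routed to a node that only knows about block 99, so data it just read can seem to vanish.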
What’s Next
In the coming weeks you can expect to hear a lot more from us about privacy. We’ll talk about how our Privacy Policy and Terms of Service are structured to protect you. We’ll go into detail about steps you can take to protect your data. We’ll also present some protocol-level changes that could give you a greater ability to protect yourself.
Your privacy is important. Stay tuned.