Privacy Preserving Identity System for Ethereum dApps

Michael Sena
Published in uPort
10 min read · Apr 26, 2018


uPort makes it easy for Ethereum developers to offer improved privacy to their users.

[image]

The attributes that make blockchain best for identity also make it worst for identity. The market is becoming flooded with “simple” identity options that crush the promise of user privacy on Ethereum.

Now more than ever, with privacy-violating events dominating headlines around the world and GDPR taking effect on May 25, developers should be conscious of how they handle users’ data.

Developers should choose an identity provider that is simple, but also one that values user privacy and actively works to preserve it. Privacy-preserving systems like this require years of design and testing. uPort is proud to share its approach to user privacy on Ethereum here.

Warning! The Blockchain is Forever

[image]

The blockchain is public and permanent. While these are two generally desirable properties of blockchains, they provide a significant challenge to building a privacy-preserving user identity management system. We must ensure our system is not only secure and private today, but also decades into the future. Users cannot reasonably be expected to know best practices for how to manage their data, or what it means to store their information on the blockchain, so it’s up to us to protect them.

Modern decentralized identity systems utilize a credentials-based model of identity. This model expresses identity attributes as a collection of individual data credentials. Credentials can be used to cryptographically express things such as name, birthdate, membership, reputation, and even proof of being human. Thus the fundamental purpose of any decentralized identity provider is to handle the world’s most personal data.

When choosing between the various open source identity standards on Ethereum, developers should treat user privacy as a non-negotiable requirement. We consider it the most important facet of a self-sovereign identity system, and it is a topic we have been thinking about for some time.

Deciding between identity solutions? Ask two privacy questions.

  1. Does it offer off-chain, in addition to on-chain, user data storage options?
  2. Does it minimize correlation risk for my users?
[image]

Ethereum Risk #1: On-Chain Data is a Permanent Target

You have to assume that blockchain identity systems that store users’ personal identity credentials on-chain will be targeted by malicious actors, today and into the future. This remains a concern even if you encrypt the data before storing it on-chain. We think it is reasonable to expect that computing power will increase to the point of cracking today’s popular cryptography while providers are still liable for the data they stored. If that happens, every piece of user data ever stored on-chain will be publicly exposed for the world to see and act on, which could be disastrous for the businesses and applications that promoted this negligent pattern.

Malicious actors with supercomputers will stripmine the blockchain for users’ PII. [image]

Ethereum Risk #2: On-Chain Actions are Correlatable

Public blockchain ledgers are available for all to read and analyze. This makes blockchains a very easy database to analyze. The rise of machine learning and supercomputers has made it trivial to draw robust conclusions about the identity of an individual by correlating a few simple pieces of data that can be attributed to a common identity. Malicious actors can easily track your public data and public actions back to a common identity.

Correlation in identity systems occurs by tracking the actions or events taken by a single identity across the network, by tracking the publicly available data about that identity found on the network, and by looking to draw strong links between this identity and other identities on the network. This analysis technique can be used to estimate a user’s identity with a very high probability.

[image]

It is important to understand that correlation extends beyond simple user data. We should also consider ways to minimize the correlation of a user’s on-chain smart contract interactions across different dapps, since activity is a very easy way to correlate the identity of an individual.

To combat this very difficult problem inherent in public ledger-based decentralized systems, we need to design identity systems that reduce the number of data points that can be connected to each other through deep analysis of the public ledger.

To highlight this problem, let’s look at a simple example:

Alice creates a MetaMask account and funds it with ETH. She logs into a prediction market and places a bet on a market. Then, she needs to use a government dapp to vote in a local political election with her blockchain identity. She logs in with her MetaMask and casts her vote.

This creates an immediate problem for Alice. Unbeknownst to her, prediction markets are illegal in her country. Ignorant of the risks associated with using prediction markets, and lacking a proper understanding of blockchain technology, Alice has unknowingly exposed herself to the authorities.

Because Alice bet on an illegal prediction market and then used that same identity in her local election, her actions are extremely simple to correlate across these two dapps.
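
To make Alice’s exposure concrete, here is a minimal sketch of the kind of analysis anyone could run against a public ledger. The addresses, dapp names, and ledger format are made up for illustration; a real analysis would read actual transaction data from a node or block explorer.

```python
from collections import defaultdict

# Hypothetical public transaction log: (sender address, dapp contract called).
# The addresses and dapp names below are invented for illustration.
PUBLIC_LEDGER = [
    ("0xA11CE", "prediction_market"),
    ("0xB0B", "prediction_market"),
    ("0xA11CE", "election_dapp"),
    ("0xCA401", "election_dapp"),
]


def dapps_by_address(ledger):
    """Group every dapp interaction under the address that sent it."""
    seen = defaultdict(set)
    for sender, dapp in ledger:
        seen[sender].add(dapp)
    return seen


def correlated_addresses(ledger, dapp_a, dapp_b):
    """Return addresses that interacted with both dapps, in sorted order."""
    return sorted(
        address
        for address, dapps in dapps_by_address(ledger).items()
        if dapp_a in dapps and dapp_b in dapps
    )


exposed = correlated_addresses(PUBLIC_LEDGER, "prediction_market", "election_dapp")
print(exposed)  # ['0xA11CE']
```

A few lines of grouping are enough to link the bettor to the voter, because both actions were signed by the same account.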

This point about simple correlation makes it extremely difficult to design blockchain-based systems that give users simple control over their identity, but that also protect their privacy and respect their right to be forgotten. Identity systems should strive to preserve user privacy, and by extension, combat correlation. Oh, and they need to be simple.

[image]

uPort is a Privacy-Preserving Identity System for Ethereum dApps

Principle #1: Store User Data Off-Chain to Combat Permanence

But storing data “off-chain” isn’t immediately clear or simple for developers. The question becomes: where do you store user data, if not on a server or a blockchain?

uPort minimizes a user’s digital footprint.

How is off-chain user data stored?

Off-chain data is stored in a user-managed vault, which can be hosted locally on a smartphone, on a private identity hub, or both. A private identity hub is a secure, self-sovereign, hosted cloud agent that can store users’ personal data, kind of like Docker for identity. In the current version of uPort, user data is stored locally within the uPort mobile app, which the user can use to authenticate to dApps.

How is off-chain user data shared between applications?

Because private data is stored locally on the user’s uPort app, applications cannot simply read the public blockchain to discover information about an identity. Instead, they must ask for a user’s private information directly.

The uPort Wallet application provides a simple consent interface for dapps to request private data from users, and users to approve or reject this request. We call this interface a Selective Disclosure Request, and it gives users complete control over their identity data.
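
The consent flow described above can be sketched roughly as follows. This is an illustrative model only: `DisclosureRequest`, `LocalVault`, and `respond` are hypothetical names, not the uPort API, and a real request would be signed and sent to the user’s wallet rather than passed around in memory.

```python
from dataclasses import dataclass


@dataclass
class DisclosureRequest:
    """A dapp's request for specific attributes (hypothetical structure)."""
    requester: str
    requested: list


class LocalVault:
    """Stand-in for private data held on the user's device."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def respond(self, request, approved):
        """Release only attributes that were both requested and approved.

        Returns None when nothing is released, i.e. the request is rejected.
        """
        released = {
            name: self._attributes[name]
            for name in request.requested
            if name in approved and name in self._attributes
        }
        return released or None


vault = LocalVault({"name": "Alice", "birthdate": "1990-01-01", "country": "NZ"})
request = DisclosureRequest(requester="example-dapp", requested=["name", "birthdate"])

# The user consents to sharing their name but not their birthdate.
print(vault.respond(request, approved={"name"}))  # {'name': 'Alice'}
# The user rejects the request outright.
print(vault.respond(request, approved=set()))  # None
```

The point of the sketch is that the dapp never reads the vault directly; it only ever sees what the user chose to release.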

How is off-chain user data backed up?

uPort offers a private user data backup hub, called Caleuche. Caleuche offers uPort Wallet users the ability to store symmetrically encrypted copies of their private data on a server. We can never read the data stored on this server, and don’t store extra copies of it.

As always, users can opt out of this backup service. Opting out puts the user’s identity data at risk: without a backup, losing their smartphone also means losing their identity data. But that’s ultimately the user’s decision to make. We try not to be prescriptive, as that would go against our core value of empowering users with choice and control.

We are also developing a solution that will allow users to run their own Caleuche for maximum privacy.

Principle #2: Create New User Accounts for Each dApp

It’s easy to correlate the actions of a single identity across multiple dapps, but it’s much harder to correlate the actions of multiple identities across multiple dapps. To further reduce users’ footprints beyond just moving identity data off-chain, identity systems should promote better key management standards for users. One simple improvement is for identity systems and wallets to promote the use of application-specific accounts.

[image]

Application-specific accounts are great in concept. They combat the problem of identity correlation head on by reducing a user’s identity exposure to their pairwise history with each individual application. This multi-identity architecture makes it much more difficult to track a single user across the applications they use just by analyzing the blockchain.
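
One way to picture application-specific accounts is as identifiers derived deterministically from a single user secret, one per application. The sketch below is an assumption-laden illustration, not how uPort derives accounts: it uses HMAC-SHA256 from the Python standard library as a stand-in for a proper key-derivation scheme, and the `pseudonym` helper only imitates the shape of an Ethereum address.

```python
import hashlib
import hmac


def app_specific_secret(master_seed: bytes, app_id: str) -> bytes:
    """Derive a distinct 32-byte secret per application from one master seed.

    HMAC-SHA256 is used here purely for illustration; a real wallet would
    typically use a standard key-derivation scheme and turn the result
    into an actual Ethereum keypair.
    """
    return hmac.new(master_seed, app_id.encode("utf-8"), hashlib.sha256).digest()


def pseudonym(secret: bytes) -> str:
    """Short public identifier standing in for an account address."""
    return "0x" + hashlib.sha256(secret).hexdigest()[:40]


seed = b"example master seed - never hard-code a real one"
market_id = pseudonym(app_specific_secret(seed, "prediction-market"))
election_id = pseudonym(app_specific_secret(seed, "election-dapp"))

# Same user, but the two dapps see unrelated identifiers.
print(market_id != election_id)  # True
# Derivation is deterministic, so the wallet can recreate each account on demand.
print(market_id == pseudonym(app_specific_secret(seed, "prediction-market")))  # True
```

Someone reading the ledger sees two unrelated accounts, while the user only has to protect one secret.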

Application-specific accounts can be implemented by individual users today by creating and using a new account for each app, via MetaMask or whatever other wallet they use for interacting with dapps. However, users cannot be expected to manage their accounts in this way at scale. In fact, we are already witnessing this pattern fail. Even today’s technical Ethereum users, facing cognitive overload and busy lives, exhibit bad behavior. Most people only use a few accounts for all dapps, and we can expect this behavior to only worsen as Ethereum and other blockchain technologies scale to reach a broader audience.

This highlights the main challenge with application-specific accounts: usability. uPort Wallet’s Smart Authentication feature addresses this challenge by taking pairwise account management out of the user’s hands. Smart Authentication allows users to “just log in” to their favorite apps without needing to worry about which account they’re using. Instead of users needing to preselect the account they wish to use in their wallet before logging in, uPort embeds the account request right into the authentication process. This UX improvement removes account management as a concern for users.

But the uPort platform is designed to be flexible and doesn’t force application-specific behavior on developers or users. Developers can require that users log in with a keypair account specific to their application, or they can give users the choice. This feature is called Segregated Accounts, and can be configured in the uPort Connect library.

Principle #3: If you MUST Store Data On-Chain, Use Extreme Judgement

We’re not saying there is no use case for on-chain data. We believe that fun stickers, badges, flair, collectables, and other use cases that do not expose PII are all great uses for on-chain data. You may also consider issuing an on-chain credential if you need it to be verified by another smart contract. But again, consider the implications of storing data on-chain and ask yourself if it is absolutely necessary.

User badges may be a great use case for on-chain data. [image]

Where should I store my users’ on-chain data?

You should store any on-chain identity information (credentials) for your users in an Ethereum Claims Registry (ERC780). The Ethereum Claims Registry is an open source smart contract that allows identities to make claims about other people or things. There is an official uPort Registry ERC780 deployed to every major Ethereum network that is available for public use. Alternatively, developers can pull our contract code from the uPort Identity Github repo. We submitted this contract for acceptance as an Ethereum standard. You can find it here.

What’s the difference between ERC780 and ERC725?

We receive many requests to contrast the Ethereum Claims Registry (ERC780) put forth by uPort, with a “competing” standard (ERC725) identity contract put forth by Fabian Vogelsteller.

The Ethereum Claims Registry ERC780 is an important on-chain component of the overall uPort Identity System — it’s a single contract where all on-chain user data is stored. More specifically, it’s a singleton contract that allows any identity to make or read claims about any identity on the network. This public identity data centralization makes it extremely efficient to lookup things about users.
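
As a rough mental model of the singleton registry idea, the sketch below keeps every claim in one structure keyed by issuer, subject, and claim key. It is an in-memory Python illustration loosely modeled on the set-claim and get-claim operations of the ERC780 proposal, not the contract itself; the class and method names are stand-ins, and the addresses and badge value are invented.

```python
class ClaimsRegistry:
    """In-memory stand-in for a singleton claims registry contract."""

    def __init__(self):
        # Claims are keyed by (issuer, subject, key), so any identity can
        # make claims about any other without overwriting someone else's.
        self._claims = {}

    def set_claim(self, issuer, subject, key, value):
        """Record `value` as the issuer's claim about the subject.

        On-chain, the issuer would be the transaction sender; here it is
        passed explicitly because there is no sender in plain Python.
        """
        self._claims[(issuer, subject, key)] = value

    def set_self_claim(self, issuer, key, value):
        """Convenience wrapper for a claim an identity makes about itself."""
        self.set_claim(issuer, issuer, key, value)

    def get_claim(self, issuer, subject, key):
        """Look up a claim; returns None if the issuer never made it."""
        return self._claims.get((issuer, subject, key))


registry = ClaimsRegistry()
registry.set_claim("0xDapp", "0xAlice", "badge", "early-adopter")

print(registry.get_claim("0xDapp", "0xAlice", "badge"))   # early-adopter
print(registry.get_claim("0xOther", "0xAlice", "badge"))  # None
```

Because everything lives in one place, a single lookup answers “what has this issuer said about this identity?”, which is the efficiency argument made above. It is also why only non-sensitive data, such as badges, belongs there.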

ERC725, on the other hand, combines identity and registry components into one contract that represents a user identity. ERC725 only supports on-chain data, whereas uPort and ERC780 also support off-chain data. We believe ERC725 takes an inappropriate approach to standards development because it combines multiple concepts into one layer. Web standards typically develop in a modular, layered way, so we believe our approach is a safer bet. TL;DR: ERC725 is an inefficient, rigid, and privacy-exposing way to implement an identity system on Ethereum.

You can find more information comparing ERC780 and ERC725 in this previous blog post by our Technical Lead, Pelle Braendgaard.

Does uPort support user data NFTs? (ERC721 non-fungible tokens)

ERC721 has recently emerged as a way to implement on-chain user badges, and other on-chain data. We are working to add support for NFTs in our platform. If you have a particular use case in mind, we’d love to hear about it. Get in touch with our team by sending an email to community@uport.me.

Support Our Open Source Identity Efforts!

Comment and upvote our Ethereum standards proposals!

Consider user privacy a requirement when deciding which Ethereum identity system to implement.

Read and comment on our proposal for an Ethereum Claims Registry (ERC780).

Wait, there’s more identity goodness…

Join our Riot community chat!

All kinds welcome :) [image]

uPort is a self-sovereign identity, data, and authentication system built on Ethereum and IPFS. To learn more, visit our site at www.uport.me. Follow us on Twitter, Medium, and GitHub.

Visit the uPort Developer portal

Download uPort Wallet on iOS and Android.
