Private data, a possible built-in “GDPR compliant” solution for Hyperledger Fabric

Hyperledger Fabric v1.2, which has just been released, introduces private data. This feature can fix a number of confidentiality issues that are present in the technology today. One of those is GDPR compliance. The following article requires some general Fabric knowledge. If you are not yet familiar with Hyperledger Fabric, I suggest you watch this video series or read the introduction in the official documentation.

Quick note: I’m not a GDPR expert. Using private data does not by itself guarantee GDPR compliance; that depends on how you implement it. But private data makes GDPR compliant blockchain applications possible.

What’s that? SideDBs? Private Data on a blockchain?

The current way of implementing confidentiality is by using channels. Creating a lot of channels in a large network just to achieve confidentiality is discouraged. A channel for every pair of transacting parties brings a lot of overhead, like managing policies, chaincode versions, and Membership Service Providers. Within a channel, data is all or nothing: every member sees all of it and everyone outside sees none of it. If you wanted to transfer an asset to a party outside a channel, it would be a burden. This is where private transactions come in. Private data allows you to create collections of data and attach a policy to each collection that defines which parties in the channel can access it. This allows some data to be public to the whole channel and some to be private to a subset of parties.

Current issue

Imagine the marbles example. You would like to record which marbles belong to whom. All marble data can be public except for the owner and the price. These must not be visible to everyone, for privacy reasons, and prices should not be made public because that would influence future transactions. You still need to track this data, because you need to validate whether the person selling the marble is the actual owner. A (fictional) marble auditing firm will be a partner in this to detect fraud. If you’re not using channels, in 1.1, everything you do will be recorded in the state of the ledger. This is not GDPR compliant.

How does private data solve this?

Image 1: From slidedeck “Privacy Enabled Ledger” https://jira.hyperledger.org/browse/FAB-1151

The first set, “Channel Read-Write Sets” is what the current architecture looks like. Every transaction is recorded in the state and history.

The second set shows a shared private state between two peers, each in a separate organization. This state is replicated across these peers according to policies.

The third set shows the true power of private transactions. Collections can exclude some members. This means you can set up a separate private collection for each marble seller and marble auditor relationship. These collections hold the private part of the data, while the main data is still stored in the main state and ledger.

Image 2: Private state https://hyperledger-fabric.readthedocs.io/en/release-1.2/private-data/private-data.html

Authorized peers will see the hash of the data on the main ledger, and the actual data in the private database. Unauthorized peers will not have the private database synced and will only be able to see the hash on the ledger. Since hashes are irreversible, unauthorized peers cannot recover the actual data from them.
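To make the split between ledger and private database a bit more concrete, here is a minimal sketch in plain Python. It is not Fabric code: the `Peer` class, the `put_private` and `verify` helpers, and the choice to hash only the value with SHA-256 are assumptions I made for illustration, and the real storage layout in Fabric is more involved.

```python
import hashlib


class Peer:
    """Toy peer (hypothetical): all peers hold the hashes, only authorized ones hold the data."""

    def __init__(self, name, authorized):
        self.name = name
        self.authorized = authorized
        self.ledger_hashes = {}  # key -> SHA-256 hex digest, replicated to every peer
        self.private_db = {}     # key -> raw bytes, only filled on authorized peers


def put_private(peers, key, value):
    """Store the value on authorized peers and only its hash on every peer."""
    digest = hashlib.sha256(value).hexdigest()
    for peer in peers:
        peer.ledger_hashes[key] = digest
        if peer.authorized:
            peer.private_db[key] = value


def verify(peer, key, claimed_value):
    """Check a value that was shown to this peer against the hash on its ledger."""
    return hashlib.sha256(claimed_value).hexdigest() == peer.ledger_hashes[key]


seller = Peer("seller", authorized=True)
auditor = Peer("auditor", authorized=True)
outsider = Peer("outsider", authorized=False)
peers = [seller, auditor, outsider]

put_private(peers, "marble1", b'{"owner": "tom", "price": 99}')
```

The outsider ends up with the same hash as everyone else but an empty private database. If an authorized party later chooses to reveal the data, the outsider can still check it against the hash with `verify`.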

At a high level, the marbles issue resolved using private data looks like this.

Image 3: Marbles issue made GDPR compliant

How does this apply to GDPR?

My colleague Andries wrote a clear article about the problems with GDPR and blockchain. I’ll describe the problem here in short, but if you want to read the full article, please go here.

The problem

Data that has been added to the ledger cannot be deleted, which is a problem under GDPR as soon as personal data is involved. You cannot simply delete blocks. A frequently used solution is to store the data off-chain, as shown in the image below. But this solution is rather complex, because you have to manually check the validity of the data as well as maintain the links to it on the blockchain.

Private data as a solution

Private data is basically the off-chain solution above, built into Fabric itself, without the extra work. It solves multiple issues with GDPR.

Limitation of data

You shouldn’t have access to data you’re not using

Private data solves this issue by controlling access with policies similar to endorsement policies. By using this policy logic already present in Fabric, we can use OR, AND,… operators to define which parties have access.

// collections_config.json

[
  {
    "name": "collectionFarmer-Store",
    "policy": "OR('FarmerMSP.member', 'StoreMSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000
  }
]
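To illustrate what such a policy expresses, here is a small sketch in plain Python that checks whether an organization is covered by a flat OR policy like the one above. The `parse_or_policy` and `may_access` helpers are hypothetical names of my own. Fabric has its own policy parser, which also handles AND, nesting and roles other than member, so treat this purely as an illustration of the idea.

```python
import re


def parse_or_policy(policy):
    """Return the MSP IDs named in a flat OR('X.member', ...) policy string."""
    match = re.fullmatch(r"\s*OR\((.*)\)\s*", policy)
    if match is None:
        raise ValueError("this sketch only handles flat OR(...) policies")
    principals = re.findall(r"'([^'.]+)\.member'", match.group(1))
    if not principals:
        raise ValueError("no '<MSP>.member' principals found")
    return set(principals)


def may_access(policy, msp_id):
    """An organization may access the collection if the OR policy lists it."""
    return msp_id in parse_or_policy(policy)


policy = "OR('FarmerMSP.member', 'StoreMSP.member')"
print(may_access(policy, "FarmerMSP"))
print(may_access(policy, "AuditorMSP"))
```

With this policy, members of FarmerMSP and StoreMSP are covered, while an organization such as the (made up) AuditorMSP is not.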

Limitation of usage

You should only keep your data as long as you need

For collections, you can specify a blockToLive in the collection definition. This does exactly what it sounds like: it defines how long data in the collection should be kept, in terms of blocks. This means old data in the private database will automatically be purged after that number of blocks, and you do not have to worry about keeping unused data. The hashes in the actual blocks will not be removed.

// collections_config.json

[
  {
    "name": "collectionFarmer-Store",
    "policy": "OR('FarmerMSP.member', 'StoreMSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000
  }
]
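The purge behaviour can be pictured with a small simulation in plain Python. The `PrivateStore` class below is a hypothetical stand-in for a peer’s private database, and the exact block at which an entry expires is an assumption on my part, so check the Fabric documentation for the precise boundary. The one thing I take directly from Fabric is that a blockToLive of 0 means the data is never purged.

```python
class PrivateStore:
    """Toy private database (hypothetical) that purges entries by block age."""

    def __init__(self, block_to_live):
        self.block_to_live = block_to_live
        self.entries = {}  # key -> (value, block height at which it was written)

    def put(self, key, value, height):
        self.entries[key] = (value, height)

    def purge(self, current_height):
        """Drop entries older than block_to_live blocks; 0 means keep forever."""
        if self.block_to_live == 0:
            return
        self.entries = {
            key: (value, written)
            for key, (value, written) in self.entries.items()
            # Assumed boundary: an entry survives for block_to_live blocks after its write.
            if current_height - written <= self.block_to_live
        }


store = PrivateStore(block_to_live=3)
store.put("price:marble1", 99, height=10)
store.put("price:marble2", 45, height=12)
store.purge(current_height=14)
print(sorted(store.entries))
```

At block 14 the first entry is four blocks old and gets purged, while the second is only two blocks old and stays.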

Right to be forgotten

This works the same way as the previous item, except that items are removed manually instead of automatically. Since nothing except the hash is written to the ledger, the item itself will no longer exist anywhere after this procedure; only its hash remains in the blocks.


Caveats

This solution is only GDPR compliant when:

1. Parties are not malicious

If they have bad intentions, they can just copy this data and share it with external parties. This is a general issue and not specific to blockchain technology. This is where the rules in your consortium come in. You need clear rules with clearly defined consequences to discourage members from acting maliciously.

2. It’s implemented correctly

As mentioned at the top of this article, it’s only GDPR compliant if you implement it correctly. You have to be careful about what you place on the public ledger, what you place in private collections, and how long you keep this data.

It’s not bulletproof just yet

Your chaincode will be replicated across all peers, and so will the other configuration files. This means the collections_config.json will also be replicated to all peers, so that the system can properly set up and know about these private collections. As a result, every member can see who is doing business or sharing secret data with whom. They can’t see the actual data, but disclosing who the participants are is still a confidentiality issue. This issue should be addressed in 1.3.
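To show how little effort this takes, here is a short sketch in plain Python that reads a collection config like the one used earlier in this article and lists which organizations share each collection. The `collection_members` helper is a hypothetical name of mine, and the regular expression only assumes that principals are written as '<MSP>.<role>' inside the policy string.

```python
import json
import re

config_text = """
[
  {
    "name": "collectionFarmer-Store",
    "policy": "OR('FarmerMSP.member', 'StoreMSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000
  }
]
"""


def collection_members(config_json):
    """Map each collection name to the sorted MSP IDs mentioned in its policy."""
    return {
        entry["name"]: sorted(set(re.findall(r"'([^'.]+)\.", entry["policy"])))
        for entry in json.loads(config_json)
    }


relationships = collection_members(config_text)
print(relationships)
```

Any channel member who holds the config can run something like this and learn that FarmerMSP and StoreMSP share a private collection, without ever seeing the private data.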

Collections have to be defined up front

Currently, private collections have to be defined up front. This is hard to maintain when there is a large number of different party-to-party relationships, but it’s usable. Version 1.3 is planned to introduce implicit collections, which are basically collections that can be created on the fly and even passed on to other members.