Deep Dive into Private Data in Hyperledger Fabric


Private data is one of the most talked-about features in Fabric, introduced in v1.2. It brings data privacy to a subset of participants on a channel without creating a separate channel for each group. As the Fabric docs describe it:

In cases where a group of organizations on a channel need to keep data private from other organizations on that channel, they have the option to create a new channel comprising just the organizations who need access to the data. However, creating separate channels in each of these cases creates additional administrative overhead (maintaining chaincode versions, policies, MSPs, etc), and doesn’t allow for use cases in which you want all channel participants to see a transaction while keeping a portion of the data private.

Why Use the Private Data Feature?

In real-world scenarios, almost all participating entities share sensitive data (prices, personal information, etc.) with each other, but that sensitive data need not be shared with everybody. In versions of Fabric before v1.2, you either had to write additional business logic on the SDK side to restrict access so that sensitive data was not revealed to all participants, or create a separate channel for each group of participants, which is not a feasible solution. Private data was introduced to avoid both.

What Does Private Data Consist Of?

Private Data structure

Private data is organized into collections. Each collection represents a private data store whose access is limited to the members defined in its policy. The collection definition is deployed together with the chaincode, and the chaincode running on authorized peers calls the get, put, and delete private data functions to work with the collection.
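To make the idea of a collection more concrete, here is a minimal sketch in Go that parses a collection definition. The field names follow the collection definition format described in the Fabric docs; the collection names, org names, and values are made up for this example, and the struct is a local illustration rather than a Fabric type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CollectionConfig is a local illustration of one entry in a collection
// definition file. It is not a type taken from the Fabric code base.
type CollectionConfig struct {
	Name              string `json:"name"`
	Policy            string `json:"policy"`
	RequiredPeerCount int    `json:"requiredPeerCount"`
	MaxPeerCount      int    `json:"maxPeerCount"`
	BlockToLive       uint64 `json:"blockToLive"`
}

// sampleCollections is a hypothetical definition: one collection shared by
// Org1 and Org2, and one visible to Org1 only that is purged after 3 blocks.
const sampleCollections = `[
  {
    "name": "collectionMarbles",
    "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 0
  },
  {
    "name": "collectionMarblePrivateDetails",
    "policy": "OR('Org1MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 3
  }
]`

// ParseCollections decodes a collection definition document.
func ParseCollections(raw string) ([]CollectionConfig, error) {
	var cfgs []CollectionConfig
	if err := json.Unmarshal([]byte(raw), &cfgs); err != nil {
		return nil, fmt.Errorf("decoding collection config: %w", err)
	}
	return cfgs, nil
}

func main() {
	cfgs, err := ParseCollections(sampleCollections)
	if err != nil {
		panic(err)
	}
	for _, c := range cfgs {
		fmt.Printf("%s: policy=%s blockToLive=%d\n", c.Name, c.Policy, c.BlockToLive)
	}
}
```

The policy string decides which orgs may hold the data, while blockToLive controls how long it stays in the private database (0 meaning it is never purged).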

Fabric Architecture with Private Data:

Reference Architecture from Hands-On Blockchain with Hyperledger by Nitin Gaur

Each peer has a transient store (temporary storage) through which private data is propagated to other authorized peers using the gossip protocol. Private data is stored permanently in the private state DB (CouchDB) inside the peers.

The flow of Private Data:

1: The client sends a proposal with the private data in the transient field
2: The endorsing peer simulates the transaction and returns a response (public data + hash of the private data)
2.1: The private data is held in the peer's transient store and sent to other authorized peers over the gossip protocol
3: The client assembles the endorsements into a transaction and sends it to the orderer
4: The orderer creates a block and sends it to the committing peers
5: Each committing peer validates the block by comparing the hash of the private data it received from another peer in the collection with the hash of the private data in the block sent by the orderer
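Steps 1 and 2 can be pictured from the chaincode's side with the rough sketch below. The method names on PrivateStub follow the Go chaincode shim (GetTransient, PutPrivateData, GetPrivateData, DelPrivateData), but the interface itself, the in-memory memStub, the StorePrice function, and the "price" transient key are all hypothetical stand-ins so that the example runs without a Fabric network. In a real network the peer, not the chaincode, derives the hash that ends up in the transaction; the SHA-256 digest returned here only illustrates what that public footprint looks like.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// PrivateStub lists a few methods whose names follow the Go chaincode shim.
// It is a local stand-in, not the real shim interface.
type PrivateStub interface {
	GetTransient() (map[string][]byte, error)
	PutPrivateData(collection, key string, value []byte) error
	GetPrivateData(collection, key string) ([]byte, error)
	DelPrivateData(collection, key string) error
}

// memStub is a hypothetical in-memory implementation used only for this demo.
type memStub struct {
	transient map[string][]byte
	store     map[string][]byte // keyed by collection + "/" + key
}

func (s *memStub) GetTransient() (map[string][]byte, error) { return s.transient, nil }

func (s *memStub) PutPrivateData(c, k string, v []byte) error {
	s.store[c+"/"+k] = v
	return nil
}

func (s *memStub) GetPrivateData(c, k string) ([]byte, error) { return s.store[c+"/"+k], nil }

func (s *memStub) DelPrivateData(c, k string) error {
	delete(s.store, c+"/"+k)
	return nil
}

// StorePrice reads the sensitive value from the transient map, so it never
// appears in the proposal arguments, and writes it to the given collection.
// It returns the hex-encoded SHA-256 digest of the value for illustration.
func StorePrice(stub PrivateStub, collection, key string) (string, error) {
	transient, err := stub.GetTransient()
	if err != nil {
		return "", err
	}
	price, ok := transient["price"]
	if !ok {
		return "", errors.New(`transient field "price" is missing`)
	}
	if err := stub.PutPrivateData(collection, key, price); err != nil {
		return "", err
	}
	sum := sha256.Sum256(price)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	stub := &memStub{
		transient: map[string][]byte{"price": []byte("99")},
		store:     map[string][]byte{},
	}
	hash, err := StorePrice(stub, "collectionMarblePrivateDetails", "marble1")
	if err != nil {
		panic(err)
	}
	fmt.Println("hash of private value:", hash)

	value, _ := stub.GetPrivateData("collectionMarblePrivateDetails", "marble1")
	fmt.Printf("stored value: %s\n", value)

	// Deleting removes the value from the private store.
	_ = stub.DelPrivateData("collectionMarblePrivateDetails", "marble1")
}
```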

Private data storage in Peer:

Private data stored in CouchDB in a peer

Only authorized peers hold the actual private data, in the CouchDB instance configured for the peer container; unauthorized peers hold only a hash of the data. The hash is stored on the other peers so that, if private data later has to move from org1 to org2, org2 can verify its authenticity by comparing the hash of the data sent by org1 with the hash it already stores.

FAQs related to Private Data

Q: Is private data GDPR compliant?

Ans: As per my understanding, private data covers only certain aspects of GDPR. For example, the right to be forgotten and the rule that you should keep data only as long as you need it can be met by configuring a block-to-live policy or by explicitly calling the delPrivateData function. But if an org that is part of a collection turns malicious, it can easily distribute customer PII to other parties. Currently, Fabric has no mechanism for tracking how private data is shared.

Q: Can we update a collection policy?

Ans: If a collection is referenced by a chaincode, the chaincode will use the prior collection definition unless a new collection definition is specified at upgrade time. If a collection configuration is specified during the upgrade, a definition for each of the existing collections must be included, and you can add new collection definitions.

Collection updates become effective when a peer commits the block that contains the chaincode upgrade transaction. Note that collections cannot be deleted, as there may be prior private data hashes on the channel’s blockchain that cannot be removed.

Note: You can’t change the block-to-live property of an existing collection, because peers may be at different heights and they need a deterministic block-to-live value when processing a block.
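The upgrade rules in this answer can be summed up as a small pre-flight check. The sketch below is a local helper written for this article, not Fabric's own validation code; the Collection type and ValidateUpgrade function are hypothetical. It rejects an upgrade that drops an existing collection or changes its block-to-live value, and it allows new collections to be added.

```go
package main

import "fmt"

// Collection holds the two properties this check cares about.
type Collection struct {
	Name        string
	BlockToLive uint64
}

// ValidateUpgrade checks a proposed collection list against the current one:
// every existing collection must still be present, and its BlockToLive must
// be unchanged. Additional new collections are accepted.
func ValidateUpgrade(current, proposed []Collection) error {
	next := make(map[string]Collection, len(proposed))
	for _, c := range proposed {
		next[c.Name] = c
	}
	for _, old := range current {
		updated, ok := next[old.Name]
		if !ok {
			return fmt.Errorf("collection %q is missing: existing collections cannot be deleted", old.Name)
		}
		if updated.BlockToLive != old.BlockToLive {
			return fmt.Errorf("collection %q: blockToLive cannot change from %d to %d",
				old.Name, old.BlockToLive, updated.BlockToLive)
		}
	}
	return nil
}

func main() {
	current := []Collection{{Name: "collectionMarbles", BlockToLive: 0}}

	// Adding a collection while keeping the existing one is fine.
	ok := []Collection{
		{Name: "collectionMarbles", BlockToLive: 0},
		{Name: "collectionAudit", BlockToLive: 10},
	}
	fmt.Println(ValidateUpgrade(current, ok))

	// Changing blockToLive on an existing collection is rejected.
	bad := []Collection{{Name: "collectionMarbles", BlockToLive: 5}}
	fmt.Println(ValidateUpgrade(current, bad))
}
```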

Q: Where are the private data storage and the transient storage located inside the peer container?

Ans: Transient Storage: /var/hyperledger/production/transientStore

Private Data Storage: /var/hyperledger/production/ledgersData/pvtdataStore

Ledger: /var/hyperledger/production/ledgersData/chains/chains/mychannel. The blockfile_000000 file inside this directory is the transaction log holding all ledger data.

Q: Is it possible to define collections at run-time?

Ans: No, collections must be defined statically. The ability to send private data to parties dynamically is on the roadmap for 2019.

Q: Where can I see the collection hash of private data in a block?

Ans: Fetch a block from the channel, then use the configtxlator tool to convert the block file into JSON. A detailed implementation can be found here.

Hash of Private data stored in the block file.

Conclusion: Private data is an emerging concept and a lot of research is ongoing, especially on sharing private data using zero-knowledge proofs and on making private data more GDPR compliant. I hope this article has given you a deeper insight into private data.

Cheers!!

Reference :