What is Data Privacy on a Private Blockchain, and Why Do We Need It?
Waves Enterprise builds on and adapts existing solutions for private data storage and transmission on a private blockchain, optimizing the benefits for our target market.
When we talk about blockchain the very first association that comes to mind is Bitcoin. Bitcoin is a public blockchain, which means that anybody can access the full transaction history of the system, and anybody can set up their own full node and mine, which provides greater decentralization for the system and better security guarantees.
However, there are many companies that might benefit from using blockchain in some of their business processes, but they are not happy with the number of “unwanted” parties involved. In addition to this, we can say that the participants of a private blockchain might have different roles in the system: miners, maintainers (or just regular participants), administrators who allocate permissions to the other parties, etc. With this in mind we can better understand what a private blockchain actually is.
This article is about private data on a private blockchain. Let’s say we have a private blockchain maintained by 10 companies, and three of them want to make a deal in such a way that the blockchain contains certain information about the deal, but this information is either concealed or meaningless to the other parties. Because the information they exchange is so sensitive we cannot allow it to be read by the other private network participants even if it is encrypted.
We can formulate the problem we seek to solve as follows: business applications on a blockchain, and the information security requirements that come with them, need a mechanism for handling confidential data among a strictly specified set of parties. These parties are only a subset of all participants in the private network, which means that we need additional methods to protect sensitive information.
The rest of our article is organized as follows:
- Classification of existing solutions — we will discuss major existing solutions for our defined problem. This will be necessarily brief, since providing a full overview would require a separate article for each of them. Also, for each type of solution, we will list the pros and cons.
- In ‘Waves Enterprise solution architecture’, we will outline the lessons we learned from other solutions and from deploying enterprise private blockchain solutions, provide details about how Waves Enterprise was built, and discuss its pros and cons.
- Finally, we will give an overview of Waves Enterprise’s future development plans.
Classification of existing solutions
In order to build a successful and complete solution, we investigated the previous work in this area and classified the existing approaches to handling private data into three types.
- Private database outside the node (but somewhat integrated). Data is distributed off the blockchain, while hashes of the data are stored on-chain as proof. This method is used by Hyperledger Fabric, Quorum, Masterchain, and also by Waves Enterprise.
- Data stored directly in the state of participants. Data is distributed across transactions in the blockchain, but requires a separate sub-network (or channel). This method is used by Exonum and also Hyperledger Fabric (why Hyperledger is in both categories will be explained later).
- Partial state storage among participants. Some additional information about transactions is transferred off-chain. This approach is implemented by Corda.
We will now explore different approaches to organizing private data flows in a blockchain in a little more detail. If you are interested in getting to know any of the solutions listed below, feel free to check out the documentation. All of these projects are interesting and have extensive documentation, but we will mention only key aspects related to the Private Data concept.
Type 1 solutions
The main idea of the solutions presented in this category is as follows:
- To send private data, a node stores it in a private database, which is somewhat integrated with the node.
- The node publishes a blockchain transaction containing the hash of this private data.
- Recipients are informed that there is some private data to retrieve. The recipient node makes a request to the Data Owner to obtain the Private Dataset.
- The owner node checks the request (which could be digitally signed) and, if the requester is one of the recipients, creates a connection (TLS or other encryption) and transfers the data.
- When the data is received, the recipient checks its hash against the hash recorded on the blockchain.
- If a new participant is authorized for some existing private data, it can make a request in the same way as any other participant.
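The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of any of the platforms discussed here: the `Ledger`, `OwnerNode`, and `fetch_and_verify` names are hypothetical, participants are plain strings, SHA-256 is an assumed hash function, and request signing and the encrypted transport are left out.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Ledger:
    """Stand-in for the shared blockchain: holds hashes and recipient lists only."""
    entries: dict = field(default_factory=dict)  # data hash -> set of recipients

    def publish(self, data_hash: str, recipients: set) -> None:
        self.entries[data_hash] = set(recipients)


@dataclass
class OwnerNode:
    """Data owner: keeps the payload in its own off-chain store."""
    ledger: Ledger
    private_db: dict = field(default_factory=dict)  # data hash -> payload

    def share(self, payload: bytes, recipients: set) -> str:
        data_hash = sha256_hex(payload)
        self.private_db[data_hash] = payload        # store the data off-chain
        self.ledger.publish(data_hash, recipients)  # publish only the hash
        return data_hash

    def serve(self, requester: str, data_hash: str) -> bytes:
        # Hand the data only to parties listed as recipients on the ledger.
        if requester not in self.ledger.entries.get(data_hash, set()):
            raise PermissionError(f"{requester} is not a recipient")
        return self.private_db[data_hash]


def fetch_and_verify(owner: OwnerNode, requester: str, data_hash: str) -> bytes:
    payload = owner.serve(requester, data_hash)     # request the private data
    if sha256_hex(payload) != data_hash:            # compare with the on-chain hash
        raise ValueError("payload does not match the on-chain hash")
    return payload


ledger = Ledger()
owner = OwnerNode(ledger)
deal_hash = owner.share(b"contract terms", {"bob"})
print(fetch_and_verify(owner, "bob", deal_hash))
```

Because the ledger holds only the hash, a recipient can detect a payload that was altered in the owner's store, but cannot recover the original from the chain.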
The systems listed below have implemented this general concept with small variations due to their architectural specifics.
When we talk about private blockchain solutions, Hyperledger Fabric is the first one to come to mind. The main feature of Fabric is flexibility. You can build almost any private blockchain solution on top of it, but this also leads to its main disadvantage: it is complex to work with, even for experienced developers, when building a sophisticated solution. When it comes to private data, Hyperledger Fabric lets you choose between two options. In this section, we will look at the first one, which is to create a Private Data Collection. This involves exchanging data among a subset of the channel participants and recording the data hashes on the blockchain as proof. This is a reasonable approach when you know there will be many different groups to communicate with confidentially. The main privacy components of Fabric are:
- Each participant has an X.509 certificate for identity confirmation (PKI). The whole system has CAs, and MSPs are used to check the permissions of the certificates in the working domain.
- Consensus can be optimized in the channel configuration, which is good for overall performance.
- Private Data transfer is implemented using the Gossip Protocol.
- Private Data transfer is done using TLS with point-to-point transmission.
- Anchor peers are responsible for maintenance of the Map of Internet Addresses of peers in the channel.
Quorum is a modification of the Ethereum protocol produced by JP Morgan as their private blockchain solution. There are several advantages to using the Ethereum codebase, such as a large community who help detect bugs and improve the product in general, and already implemented smart-contract technologies — which lowers the cost of development for Quorum. Since PoW consensus is redundant for the needs of a private blockchain, the Quorum team proposed three types of consensus: RAFT, CliquePoA — which are fast — and IstanbulBFT, which provides higher security guarantees.
In terms of private data, Quorum has the following features:
- Blockchain state is partitioned between private and public. The public state is consistent among all nodes in the network.
- Storage and transmission of private data are organized by the Transaction Manager, which is an entity in the Quorum Node.
- Enclave is another entity, which is responsible for encryption and decryption of the private data.
- Data is transmitted point-to-point only in PGP-encrypted form.
- Has two implementations — Constellation and Tessera — for greater flexibility.
There are other modifications of the Ethereum protocol for private blockchain use, for example, Masterchain, which is a Russian project with a cryptography solution certified by Russian Government Standards. Other than having its own cryptography, it does not have major technical differences.
Type 1 solutions: pros and cons
The relative strengths and weaknesses of the solution will determine whether the solution is suitable for your needs. As for the first type, the strengths are:
- The solution completely covers the problem domain.
- You can transfer private data of arbitrary size and format, because the blockchain stores only data hashes.
- The solution is clean and simple, which is good for development.
However, we live in the real world and there are also some disadvantages:
- The solution is based on off-chain technology, which means that it provides weaker immutability guarantees for private data than for ordinary transactions, because a conflict cannot be resolved without a third party. For example, if one party modifies the document in its private storage, we can detect the modification by checking the hash stored on the blockchain, but we cannot force the dishonest party to restore the right version. The document itself is not stored on the blockchain directly, and can only be obtained by requesting it from an honest party.
Type 2 Solutions
This group of solutions aims to address the downside of the previous group, which is off-chain private data storage and transfer. As you might guess, in these solutions data is transferred on-chain using data transactions. This is a more natural fit for the blockchain philosophy, but comes with certain limitations:
- To exchange data among participants, you need to create separate channels or sub-chains. The more communication with different groups you have, the more channels you need to support.
- As a result, you have higher infrastructure demands, because there is obviously more information stored directly on the blockchain.
You can of course create a separate channel in Hyperledger Fabric. This is reasonable when you have a consistently high volume of private data transfer among companies (for example, on a daily basis), in which case a dedicated channel for this communication is practical. It is an expensive operation in terms of the number of VMs you use, because you need additional nodes, an ordering service, MSP, etc. Once the channel exists, you can simply write your data to its state, organized as key-value pairs.
Exonum does not operate with definitions like channels, but you can do pretty much the same if you deploy another Exonum sub-chain. The essential features to support data transfer in these sub-chains are:
- Variable block and transaction size, which can be defined in a subchain config. The upper bound according to Exonum documentation is 2³² bytes (approximately 4.3GB).
- pBFT consensus, which is secure and, with a small number of participants, fast.
- Anchoring to Bitcoin to provide more security guarantees.
Type 2 solutions: pros and cons
In conclusion, for this type of solution the main benefits are:
- Can be very fast, because there is almost no overhead from off-chain systems (for example, databases for storing private data). There is no need to check data after transmission.
- The solution is fully based on the blockchain without the help of any off-chain components.
The drawbacks include:
- For each instance of a business process, you have to deploy a separate channel, with no possibility of tying the channels together into one system. This also leads to significant additional costs for the creation and maintenance of infrastructure.
- If an attacker can set up a node and connect to the network, he can obtain all of the private data during synchronization.
Type 3 Solutions
Corda is a private blockchain solution developed by R3, and has gained popularity among financial institutions. Corda’s approach is reminiscent of the first category of solution, but does not really fit because of several unique features:
- There is no Transaction Broadcast. All transactions are transmitted point-to-point.
- There is no full copy of the blockchain on each node, which means that no node in the network knows the full current state. Each node stores only that part of the state where transactions explicitly address it.
- The state is partitioned between private and public.
- Private data is stored not in a specific database outside of the node, but in a special area of Corda Vault, so the Vault accumulates both public and private information.
- The private data itself, in the private area of the vault, is marked as ‘Notes’ to the transactions. Transactions contain the ids of the private data if it has to be transferred to other peers.
Corda: pros and cons
The main advantages of Corda’s solution are:
- Corda has fast transaction finalization on the network, since consensus doesn’t have to be reached among a relatively large number of participants. Each node keeps only the ‘relevant’ part of the state.
- Following from this point, it is very efficient in terms of memory consumption.
On the other hand, there are weaknesses to the concept:
- Unlike other blockchain solutions, Corda has a low level of replication, which is risky — especially when public state is partitioned among parties. In the case of an emergency, the probability of losing some parts of the state is higher than in more classic approaches.
- There is no data encryption for private storage.
Waves Enterprise solution architecture
So far we have provided a brief analysis of the strengths and weaknesses of the existing solutions, which can help us to formulate the main features we targeted while developing a similar concept in Waves Enterprise:
- Users should be able to exchange private data safely.
- The system should be reliable, with standard public state replication.
- Private data exchanges should be regulated by the blockchain, at least in the form of data hashes.
- The concept should not decrease the performance of the system as a whole.
To address the requirements listed above we divided our privacy concept into three main parts:
- Node Registration: helps us to ensure that only known (both permissioned and registered) parties can gain access to the network. All unauthorized parties should be ignored in order to prevent leaking of the public state.
- Node handshakes are cryptographically signed to ensure communication only among known parties.
- Private data is stored and transmitted only in encrypted form.
Now we will briefly address all of Waves Enterprise’s privacy components. If you are interested in learning more, feel free to check out our documentation.
If you are interested in running your own node on the Waves Enterprise mainnet or want to add nodes for your private blockchain, you have to register it on the network, so other participants will know that you are also a trusted party. The registration process is as follows:
- The new party generates a node owner key pair and sends the public key to the network participant who has a Connection Manager role, providing additional information about their organization.
- The Connection Manager broadcasts a transaction to the network, which contains the public key of the new node and its set of permissions in the network, so other parties know that they are allowed to process requests from a new node after the transaction is applied to the blockchain.
- As a result, at any given moment the blockchain state contains information about all network participants. A node removal is also implemented as a transaction.
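The registration flow above can be modeled as a small state machine driven by transactions. The sketch below is illustrative only: the `RegisterNode`, `RemoveNode`, and `NetworkState` names, the string keys, and the rule that removals must also come from a Connection Manager are assumptions, not the actual Waves Enterprise transaction formats.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegisterNode:
    sender: str              # public key of the party submitting the transaction
    node_public_key: str     # public key of the node being registered
    permissions: frozenset   # permissions granted to the new node


@dataclass(frozen=True)
class RemoveNode:
    sender: str
    node_public_key: str


@dataclass
class NetworkState:
    """The part of the blockchain state that lists network participants."""
    connection_managers: set
    nodes: dict = field(default_factory=dict)  # node public key -> permissions

    def apply(self, tx) -> None:
        # Assumption: only a Connection Manager may change membership.
        if tx.sender not in self.connection_managers:
            raise PermissionError("only a Connection Manager may change membership")
        if isinstance(tx, RegisterNode):
            self.nodes[tx.node_public_key] = tx.permissions
        elif isinstance(tx, RemoveNode):
            self.nodes.pop(tx.node_public_key, None)
        else:
            raise TypeError(f"unsupported transaction: {tx!r}")

    def is_known(self, node_public_key: str) -> bool:
        # Other nodes consult this before processing a peer's requests.
        return node_public_key in self.nodes


state = NetworkState(connection_managers={"cm-key"})
state.apply(RegisterNode("cm-key", "node-a", frozenset({"miner"})))
print(state.is_known("node-a"))
```

Since membership changes are ordinary transactions, every node that replays the chain arrives at the same participant list.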
To ensure that during the network lifecycle we stay in touch only with authorized participants, we propose the following signed handshake mechanism, illustrated here with Alice and Bob:
- Alice and Bob each generate a temporary key pair.
- Alice signs her temporary public key with her regular private key. Bob does the same with his own keys.
- Alice and Bob exchange the produced signatures. Because their regular public keys are stored on the blockchain, each of them can verify that they are talking to the other, which makes man-in-the-middle attacks less likely.
Temporary keys are also used in private data encryption, before transfer to another party.
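The signed handshake described above can be sketched as follows. To keep the example free of third-party dependencies, it uses Lamport one-time signatures built on SHA-256 as a stand-in for whatever signature scheme the platform actually uses; a Lamport key must not sign more than one message, so this only illustrates the check itself. The function names and the `on_chain_keys` dictionary are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport key pair: 256 pairs of secrets and the hashes of those secrets."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret per bit of the message digest.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    bits = _bits(message)
    return len(signature) == 256 and all(
        _h(sig) == public[i][bit] for i, (sig, bit) in enumerate(zip(signature, bits))
    )


def encode(public) -> bytes:
    return b"".join(a + b for a, b in public)


def make_hello(regular_private):
    """Create a temporary key pair and sign its public half with the regular key."""
    temp_private, temp_public = keygen()
    temp_public_bytes = encode(temp_public)
    return temp_private, temp_public_bytes, sign(regular_private, temp_public_bytes)


def accept_hello(on_chain_keys, peer_name, temp_public_bytes, signature) -> bool:
    """Check the signature against the peer's regular key stored on the blockchain."""
    regular_public = on_chain_keys.get(peer_name)
    return regular_public is not None and verify(regular_public, temp_public_bytes, signature)


alice_private, alice_public = keygen()
bob_private, bob_public = keygen()
on_chain_keys = {"alice": alice_public, "bob": bob_public}

_, alice_temp, alice_signature = make_hello(alice_private)
print(accept_hello(on_chain_keys, "alice", alice_temp, alice_signature))
```

A party whose regular key is not registered on the blockchain cannot produce a signature that `accept_hello` accepts, which is the property the handshake relies on.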
Waves Enterprise uses a special Policy mechanism to give users a more convenient way to share data with a group of participants. Strictly speaking, a Policy in Waves Enterprise is an entity which regulates access to a particular set of private data. In order to create a policy, you should submit a transaction to the blockchain containing:
- A list of participants and their roles — Owner or Recipient. Recipients can only read and send data corresponding to this policy, while Owners can add and remove parties from it.
- Policy name, with optional description.
- Expiration Date.
Owners can add and remove policy participants using special transactions, to keep everything better synchronized among parties.
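As a rough illustration of the policy described above, the following Python sketch models a policy and the owner-only membership changes. The `Policy` and `Role` names, the field names, and the rule that access ends at the expiration date are assumptions made for this example; they are not the actual transaction schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Role(Enum):
    OWNER = "owner"
    RECIPIENT = "recipient"


@dataclass
class Policy:
    name: str
    participants: dict                 # participant id -> Role
    expires_at: datetime
    description: Optional[str] = None

    def _require_owner(self, actor: str) -> None:
        if self.participants.get(actor) is not Role.OWNER:
            raise PermissionError(f"{actor} is not an owner of policy {self.name!r}")

    def add_participant(self, actor: str, participant: str, role: Role) -> None:
        self._require_owner(actor)
        self.participants[participant] = role

    def remove_participant(self, actor: str, participant: str) -> None:
        self._require_owner(actor)
        self.participants.pop(participant, None)

    def can_access(self, participant: str, now: datetime) -> bool:
        # Assumption: nobody can use the policy after its expiration date.
        return participant in self.participants and now < self.expires_at


policy = Policy(
    name="deal-42",
    participants={"acme": Role.OWNER, "globex": Role.RECIPIENT},
    expires_at=datetime(2030, 1, 1),
)
policy.add_participant("acme", "initech", Role.RECIPIENT)
print(policy.can_access("initech", datetime(2029, 1, 1)))
```

In the real system each of these changes would be carried by a blockchain transaction, so that all parties see the same participant list.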
Private Data Flow
Waves Enterprise’s private data transfer flow is organized much like in other type 1 solutions, but with its own features, such as:
- An OAuth service out of the box, which is used to check the access level of users inside the company and process only their requests. Organizations can manage whether a particular employee can have access to private data, smart contracts or just to read public transactions from the network.
- For private storage (which we recommend deploying within a secured perimeter), participants can use any SQL database that has a JDBC driver, to allow better integration with companies’ existing architecture.
- Private data, along with the encryption keys for it, is stored and transferred only in encrypted form. If the data is larger than 20 MB, it is split and transferred in parts, again only in encrypted form. The keys are unique for each encryption session to ensure greater security against keys being compromised.
Waves Enterprise: pros and cons
Ultimately, Waves Enterprise’s solution has the following key benefits:
- Security — minimized risk of an adversary gaining access even to public blockchain state. Private data is encrypted.
- As with any type 1 solution, one can transfer private data of arbitrary size and format because the blockchain stores only data hashes.
- Satisfies all criteria we defined for our solution.
Naturally, there are some points to consider in order to use the solution properly:
- There is a centralized step of adding participants to the network, which is necessary for the private network to ensure that only authorized parties are involved in the protocol. However, this means it is not well-suited for a public network.
In conclusion we would like to say that we are designing our platform to fit actual business needs. Our customers and partners are our best development drivers. And now we can see how our customers’ needs help us to expand the horizons and philosophy of blockchain technology — enabling anonymity and transparency with data control and privacy in a trustless environment.
At the time of writing, Waves Enterprise is at the forefront of harnessing the synergies of combining technologies from two different worlds — those of centralized and decentralized enterprise. Stay tuned!