The Blockchain-GDPR Paradox
The General Data Protection Regulation, GDPR for short, becomes enforceable on 25 May 2018. It will have (and already has) a major impact on organisations both large and small. In this post I will highlight how GDPR relates to blockchain technology, and in particular how, in some respects, making a blockchain architecture GDPR-compliant has the opposite effect of what the regulation intends.
The Blockchain part
To explain why GDPR has the opposite effect in certain areas when applied to blockchain technology, we need to go over some basic concepts first.
Encryption and hashing
Both encryption and hashing are fundamental to blockchain technologies. In short, hashing is a one-way transformation of data into a fixed-length, unreadable value (the hash value): you cannot recover the original data from it. Encryption is a two-way transformation: you encrypt data with a key so that it becomes unreadable, and with the matching key you can always decrypt it back to the original value.
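To make the difference concrete, here is a minimal Python sketch. The hashing half uses SHA-256 from the standard library. The encryption half is a toy one-time-pad XOR, chosen only because it needs no extra packages; it is an assumption for illustration, not what any particular blockchain uses. The function names and the sample message are made up for this example.

```python
import hashlib
import secrets


def hash_value(data: bytes) -> str:
    """One-way: return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy two-way transformation: XOR every byte with a key of equal length.

    Applying the function twice with the same key returns the original data.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"MyAddress: 1 Example Street"

# Hashing: the same input always gives the same digest, but the digest
# cannot be turned back into the message.
digest = hash_value(message)

# Encryption: unreadable without the key, fully recoverable with it.
key = secrets.token_bytes(len(message))
ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)
```

The point to take away is the asymmetry: `recovered` equals `message` as long as you still hold `key`, while there is no function at all that takes `digest` back to `message`.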
Immutability of transactions
By now, you will have heard that transactions on a blockchain are immutable. You cannot change these transactions once they are written on a blockchain. Nor can you delete them: each block carries the hash of the block before it, so removing or altering data would ‘break the chain’ and render the complete blockchain useless.
As an individual, you can browse through the complete history of all Bitcoin transactions, which makes the transactions on this public blockchain completely transparent. Transparency in private blockchains is another matter, but it is still guaranteed in other ways.
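The ‘break the chain’ effect can be sketched in a few lines of Python. This is a simplified hash chain under my own assumptions (one transaction per block, no consensus, no signatures); the field names and sample transactions are invented for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(index: int, data: str, prev_hash: str) -> str:
    """Hash a block's contents together with the hash of its predecessor."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries: list[str]) -> list[dict]:
    """Link every entry to the one before it through its hash."""
    chain = []
    prev_hash = GENESIS_HASH
    for index, data in enumerate(entries):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Recompute every hash and check that every link still matches."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])

# Changing a transaction invalidates the block that contains it.
tampered = [dict(block) for block in chain]
tampered[1]["data"] = "Bob pays Carol 200"

# Deleting a transaction breaks the link of the block that follows it.
pruned = [dict(block) for block in chain]
del pruned[1]
```

Here `is_valid(chain)` holds, while both `is_valid(tampered)` and `is_valid(pruned)` fail, which is exactly why Update and Delete are off the table.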
Public vs Permissioned
This post focuses on permissioned blockchains, where only authorised parties may host nodes, although many of the arguments below also apply to public blockchains.
CRUD vs CRAB
CRUD stands for Create-Read-Update-Delete. These are the basic operations of persistent storage. Now remember from the basic blockchain topics mentioned above that you cannot delete transactions once they are written to a blockchain. You cannot even update existing transactions, since they are immutable. Therefore the ‘CRUD’ operations are not the droids you are looking for!
Instead, operations on a blockchain can be described as CRAB: Create-Retrieve-Append-Burn. The concept of ‘CRAB’ was coined by the folks from BigChainDB. Create and Retrieve need no explanation. Append, which replaces Update, means that you can only append new transactions to a blockchain, thereby changing the ‘world state’ (the sum of all past events/transactions up until now). According to BigChainDB, the Burn operation means that you throw away the encryption keys, so that you can no longer Append new transactions and change the world state of this asset any further. Instead of forgetting your own key, you can also assign the transaction to a completely random public key, whose private key is effectively “unsolvable”, thereby locking yourself and everyone else out. When applying the CRAB terminology to other blockchain technologies, you could also interpret Burn as, for example, throwing away the encryption keys so that nobody can decrypt the actual data written on the blockchain. But we will come back to this.
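As a rough illustration of the four CRAB operations, here is a toy in-memory ledger in Python. It is not BigChainDB's API: the class and method names are hypothetical, ownership is a plain string instead of a key pair, and Burn is modelled by handing the asset to a random owner id that nobody stores.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Asset:
    owner: str
    history: list = field(default_factory=list)  # append-only transaction log


class CrabLedger:
    """Toy ledger offering Create, Retrieve, Append and Burn (no Update, no Delete)."""

    def __init__(self) -> None:
        self._assets: dict[str, Asset] = {}

    def create(self, asset_id: str, owner: str, data: dict) -> None:
        if asset_id in self._assets:
            raise ValueError("asset already exists")
        self._assets[asset_id] = Asset(owner=owner, history=[dict(data)])

    def retrieve(self, asset_id: str) -> dict:
        """World state: every past transaction folded together, oldest first."""
        state: dict = {}
        for tx in self._assets[asset_id].history:
            state.update(tx)
        return state

    def append(self, asset_id: str, signer: str, tx: dict) -> None:
        asset = self._assets[asset_id]
        if signer != asset.owner:
            raise PermissionError("only the current owner can append")
        asset.history.append(dict(tx))

    def burn(self, asset_id: str, signer: str) -> None:
        """Hand the asset to a random owner id that is never returned to anyone."""
        asset = self._assets[asset_id]
        if signer != asset.owner:
            raise PermissionError("only the current owner can burn")
        asset.owner = secrets.token_hex(32)


ledger = CrabLedger()
ledger.create("car-1", owner="alice", data={"colour": "red"})
ledger.append("car-1", signer="alice", tx={"colour": "blue"})
state = ledger.retrieve("car-1")

ledger.burn("car-1", signer="alice")
try:
    ledger.append("car-1", signer="alice", tx={"colour": "green"})
    append_after_burn = True
except PermissionError:
    append_after_burn = False
```

Note that after the burn the history is still there and can still be retrieved. Burn freezes the world state; it does not erase anything.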
The GDPR part
The complete official GDPR document can be consulted freely on the internet. The terms ‘erasure’ and ‘erase’ appear 12 times in this document at the time of writing (the website offers an easy search function).
An important aspect of GDPR for blockchain is that personal data may not leave the EU unless specific safeguards are in place. This is a major problem with public blockchains, since there is no control over who hosts a node. It is less of an issue with private or permissioned blockchains. To tackle this problem, IPDB set up a foundation that could ensure data stays in the EU, because its blockchain is publicly accessible (for clients) but hosted by permissioned parties (the nodes).
There is also a separate section, Art. 17, on the ‘Right to be Forgotten’. This is clearly an important concept when it comes to ‘erasure of data’. However, nowhere in the document, not even in the definitions in Art. 4, is there any explanation of what the term ‘erasure of data’ actually means.
The drafters of GDPR probably had only CRUD in mind (“you are always able to Delete information”) when thinking about basic operations on persistent storage. The fact that this does not match blockchain technology creates some friction. Because GDPR does not currently define “erasure of data”, you probably need to interpret the term strictly, which means that throwing away the keys that encrypt personal data on a blockchain is not acceptable as ‘erasure of data’ under GDPR.
Of course, this has consequences for what we can store on a blockchain. Under this interpretation, storing personal data on a blockchain is no longer an option. A popular way around this problem is very simple: you store the personal data off-chain and store a reference to this data, along with a hash of the data and other metadata (such as claims and permissions about the data), on the blockchain. To see how this works in a permissioned blockchain, consider the picture below. There are 2 companies (called BlueCompany and GreenCompany), each with their own back-end, both connected to a blockchain.
Suppose GreenCompany wants to read the MyAddress value. It has to take the following steps:
- Since GreenCompany does not know where MyAddress is stored, it sends a request to the blockchain layer to fetch the specific data.
- The blockchain verifies whether the requestor (GreenCompany) has the necessary access rights to read this data. If the requestor has the proper authorisation, it gets the link and the hash of the requested data. (The link can be anything, like an API endpoint including access tokens, or a database connection string, …) So here, your blockchain acts as an “access control” medium.
- Based on the link, the requestor can fetch the data directly from BlueCompany’s back-end without going through the blockchain again.
- Upon receiving the data from BlueCompany’s back-end, GreenCompany can verify that the data has not been tampered with by calculating the hash of the retrieved data and comparing it with the hash given by the blockchain. If they match, the data has not been tampered with.
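The steps above can be sketched in Python as follows. Everything here is a simplified assumption: the two classes are in-memory stand-ins for BlueCompany's back-end and for the blockchain layer, the link format `blue://records/MyAddress` is made up, and the access check is a plain set lookup rather than anything cryptographic.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class OffChainBackend:
    """Hypothetical stand-in for BlueCompany's back-end: holds the personal data."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}

    def store(self, link: str, data: bytes) -> None:
        self._records[link] = data

    def fetch(self, link: str) -> bytes:
        return self._records[link]  # raises KeyError once the record is erased

    def erase(self, link: str) -> None:
        self._records.pop(link, None)


class Blockchain:
    """Toy ledger: keeps only link, hash and permissions, never the data itself."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}
        self.access_log: list[tuple[str, str]] = []

    def register(self, name: str, link: str, data_hash: str, readers: set) -> None:
        self._entries[name] = {"link": link, "hash": data_hash, "readers": set(readers)}

    def request(self, name: str, requestor: str) -> tuple[str, str]:
        entry = self._entries[name]
        if requestor not in entry["readers"]:
            raise PermissionError(f"{requestor} may not read {name}")
        self.access_log.append((requestor, name))
        return entry["link"], entry["hash"]


# BlueCompany stores the data off-chain and registers link + hash on-chain.
blue_backend = OffChainBackend()
chain = Blockchain()
address = b"1 Example Street, Brussels"
link = "blue://records/MyAddress"
blue_backend.store(link, address)
chain.register("MyAddress", link, sha256_hex(address), readers={"GreenCompany"})

# GreenCompany asks the blockchain, which checks its access rights.
link_from_chain, expected_hash = chain.request("MyAddress", "GreenCompany")
# GreenCompany fetches the data directly from BlueCompany's back-end.
data = blue_backend.fetch(link_from_chain)
# GreenCompany verifies the data against the on-chain hash.
untampered = sha256_hex(data) == expected_hash

# Erasure happens off-chain; the on-chain link and hash stay behind.
blue_backend.erase(link)
try:
    blue_backend.fetch(link_from_chain)
    still_available = True
except KeyError:
    still_available = False
```

Notice that the `access_log` only records the request for the link. The actual fetch from `blue_backend` never touches the blockchain, which matters for the discussion that follows.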
As you can see, this workaround has increased the overall complexity of fetching and storing data on a blockchain. Next, we’ll discuss the consequences of this approach.
A compromise is rarely good for business
I specifically used the term ‘workaround’ above to emphasise that this approach is never as effective as storing and retrieving personal data straight from the blockchain (GDPR concerns aside). Let’s cover the pros and cons of this approach.
- The approach described above is a 100% GDPR compliant solution, because it makes it possible to completely erase the data in the off-chain storage, thereby rendering the links and hashes stored on the blockchain completely useless.
- In this scenario, you use the blockchain primarily as an ‘access control’ medium, where claims are publicly verifiable. This would give someone the means to prove that some node should not store the data after an opt-out. Of course, this benefit can also be present when personal data is indeed stored on a blockchain.
Oh boy, here we go…
- The benefit of transparency with blockchain is reduced. By storing your data off-chain, you have no way of knowing for sure who accessed your data, and who has access to your data. Once GreenCompany has the link to retrieve the data, it is no longer obliged to go through the blockchain. Maybe the link gets stored in their own database so they don’t have to pass by the blockchain again for new retrievals, the link gets stolen, <insert other doom-scenarios here>, …
- The benefit of data-ownership with blockchain is reduced. Once your data has been stored off-chain, who owns it? The company that owns the database your data is in? At least with blockchain technology, it is the data-owner who has all the encryption keys to administer his own data.
- You still need a point-to-point integration between all the participating parties. After getting the link from the blockchain, you need a way to get data from BlueCompany to GreenCompany. For every new party added to the system, you will need to add new point-to-point integrations with each existing member, as well as provision a secure PKI.
You are reducing your blockchain to a mere lookup table, thereby throwing away a lot of the benefits that come with this technology.
(As a side-note: using your blockchain only as a lookup table is how a lot of companies get in the media with their ‘solution’, which — depending on the use case — is not enough.)
- More attack vectors. Each company has its own infrastructure and application landscape. By spreading the personal data over these different companies, you increase the risk of a breach in which part of this personal information is stolen.
- Added complexity. Added complexity means an increased risk of unintended errors, resulting in less secure systems.
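To put a rough number on the point-to-point integration cost from the list above: if every party has to be able to reach every other party directly (my assumption, a full mesh, which is the worst case), the number of integrations grows quadratically with the number of parties. A small Python sketch, with hypothetical function names:

```python
def point_to_point_links(parties: int) -> int:
    """Pairwise integrations needed when every party connects to every other."""
    if parties < 0:
        raise ValueError("number of parties cannot be negative")
    return parties * (parties - 1) // 2


def links_added_by_newcomer(existing_parties: int) -> int:
    """A new party needs one integration with each existing member."""
    return existing_parties


# (number of parties, number of integrations)
growth = [(n, point_to_point_links(n)) for n in (2, 5, 10, 20)]
# growth == [(2, 1), (5, 10), (10, 45), (20, 190)]
```

With two companies, as in the BlueCompany and GreenCompany example, that is a single integration. With twenty participants it is already 190, each of which needs its own secured channel.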
And here is the paradox: The goal of GDPR is to “give citizens back the control of their personal data, whilst imposing strict rules on those hosting and ‘processing’ this data, anywhere in the world.” Also, one of the things GDPR states is that data “should be erasable”. Since throwing away your encryption keys is not the same as ‘erasure of data’, GDPR prohibits us from storing personal data on the blockchain itself. We thereby lose the ability to enhance control over our own personal data.
Now, I know that sounded harsh. In defense of GDPR, you could optimise the solution proposed above to counter some of its disadvantages, or choose a totally different solution to tackle the problems surrounding immutability of transactions.
However, whichever solution you go with, the overall increased complexity will still be a serious disadvantage.
With blockchain technologies emerging, we have new ways to further strengthen data-ownership, transparency and trust between entities (to name a few). The way GDPR is formulated, we cannot store personal data directly on the blockchain since in GDPR terms ‘it is not erasable’. This prohibits us from using this technology to its full potential, so we need to rely on ‘older’ systems for storing data which simply cannot guarantee the same benefits as most blockchain technologies:
- Who owns the data in your off-chain storage?
- Is the off-chain data even encrypted?
- Who can access this data?
- Where is it stored? Is it already copied to other systems?
A lot of stuff to think about…