On May 25th, 2018, the General Data Protection Regulation (GDPR) took effect. Known in the Netherlands as the “Algemene verordening gegevensbescherming” (AVG), the regulation gives citizens more control over how their data is used, as described on the website of the Dutch government.
“Citizens now have more rights when it comes to knowing which data is collected from them, how long it is kept and to which third parties it is provided. They have the option of having their data changed or removed. Governments, organizations, and companies that do not comply with the rules can be fined for this. Careful handling of personal data of citizens and employees is a characteristic of open and transparent organizations that work with trust and are prepared to account for their relationships and their environment.” — GDPR intent, Dutch government website (translated)
Before the GDPR took effect, citizens in some cases had no way to control the data they shared with companies or organizations. This could lead to situations in which data was used without the citizen's (active) consent.
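In the field of public health, the GGD (the Dutch municipal public health services), the RIVM (National Institute for Public Health and the Environment), and CBS (Statistics Netherlands) work together to monitor the health of citizens. They run surveys to build a snapshot of the population's health, and the data generated by these surveys can currently be used by all three organizations.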
Although collaboration between these parties is permitted under current legislation, it lacks transparency: participants are insufficiently aware of the purposes for which their data is used. The GGD was therefore looking for a way to store this data securely and in a decentralized manner. The solution should give citizens full transparency and control over which of their personal data is used and for what purpose, with the option to withdraw access to that data at any time. The resulting project, HealthChain, was developed as a proof of concept (POC) and focuses on two groups: citizens who fill in surveys, and partners (organizations or companies) that want to use the resulting survey data.
Before introducing blockchain into a system, it is important to understand why the technology offers benefits over a traditional database. A blockchain can be seen as a distributed digital logbook, or ledger, to which participants can write data. Once data is written to the ledger in a transaction, it is visible to all participants and cannot be altered afterward.
When new transactions are written to the blockchain, they are grouped into a block, and a hash (the block hash) is calculated over that block's contents together with the hash of the preceding block. Each block therefore points to its predecessor, forming a chain of blocks. Because every block hash depends on the previous block hash, and thus transitively on all earlier blocks, changing data anywhere in the ledger invalidates the block hash of the modified block and of every block that follows it. This is what makes a blockchain effectively immutable: any tampering is immediately detectable.
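The chaining idea can be illustrated with a minimal Python sketch. It assumes a toy, in-memory chain: the names `Block`, `append_block`, `first_invalid_block`, and the all-zero genesis "previous hash" are hypothetical, and real platforms such as Hyperledger Fabric use their own block formats plus signatures and consensus, none of which are modeled here.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional

GENESIS_PREV_HASH = "0" * 64  # hypothetical placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str
    hash: str = ""


def compute_hash(index: int, data: str, prev_hash: str) -> str:
    # The block hash covers the block's own contents and the previous block's hash,
    # so it transitively depends on every earlier block.
    payload = json.dumps({"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], data: str) -> Block:
    prev_hash = chain[-1].hash if chain else GENESIS_PREV_HASH
    block = Block(index=len(chain), data=data, prev_hash=prev_hash)
    block.hash = compute_hash(block.index, block.data, block.prev_hash)
    chain.append(block)
    return block


def first_invalid_block(chain: List[Block]) -> Optional[int]:
    """Return the index of the first block that fails validation, or None if the chain is intact."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].hash if i > 0 else GENESIS_PREV_HASH
        if block.prev_hash != expected_prev:
            return i  # the link to the previous block is broken
        if block.hash != compute_hash(block.index, block.data, block.prev_hash):
            return i  # the block contents no longer match its stored hash
    return None


chain: List[Block] = []
for entry in ["survey A", "survey B", "survey C"]:
    append_block(chain, entry)
print(first_invalid_block(chain))  # None: the chain is intact

chain[1].data = "tampered"  # modify data in the middle of the ledger
print(first_invalid_block(chain))  # 1: block 1 no longer matches its hash

# Even if block 1's hash is recomputed, block 2 still points at the old value
chain[1].hash = compute_hash(1, "tampered", chain[1].prev_hash)
print(first_invalid_block(chain))  # 2
```

Recomputing one hash only pushes the inconsistency one block further along, which is why a change anywhere invalidates everything after it.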
This immutability is what makes blockchain technology suitable for the GGD's goal: a secure and decentralized storage system in which the end user (the citizen) has full transparency and control over their personally identifiable information (PII). Data in a regular database can be modified by a system administrator and is exposed to intruders. Data on a blockchain is far better protected against these issues, because any change to it would be detected. As a result, anyone reading data from a blockchain can be confident that it has not been tampered with.
The same immutability, however, conflicts with the GDPR, because a user must be able to have their data removed from the system completely. Since data on a blockchain cannot be modified, it cannot be deleted either. With the right architecture, though, a GDPR-compliant blockchain solution is still possible.
The solution is to store PII off-chain in a regular database, generate a hash of that data, and store the hash together with the data's metadata on the blockchain. This is known as data linking: no PII is stored on the blockchain, yet the blockchain can be used to verify the PII held in the database.
As an example, imagine that the string “This is private data” contains information that should not be placed on a blockchain. Instead, the string is stored in a database, and a SHA-256 hash is calculated from it.
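In Python, for example, such a digest can be computed with the standard `hashlib` module. The snippet assumes the string is encoded as UTF-8 before hashing; the digest depends on the exact bytes, so a different encoding would produce a different hash.

```python
import hashlib

private_data = "This is private data"

# SHA-256 over the UTF-8 bytes of the string; hexdigest() returns the 64-character hex form
data_hash = hashlib.sha256(private_data.encode("utf-8")).hexdigest()
print(data_hash)
```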
This hash is stored on the blockchain. Anyone who wants to verify the data in the database can recalculate its hash and compare it to the hash stored on the blockchain. If the two match, the data has not been tampered with. HealthChain stores sensitive survey data in the same way.
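The data-linking pattern can be sketched as follows, assuming two in-memory dictionaries stand in for the off-chain database and the blockchain (a real ledger would be append-only, unlike a dict). The canonical-JSON hashing of records and the names `store_response`, `verify_response`, and the metadata fields are illustrative assumptions, not HealthChain's actual implementation.

```python
import hashlib
import json
from typing import Dict


def hash_record(record: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so the same record always hashes identically
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


off_chain_db: Dict[str, dict] = {}     # stand-in for the database holding the PII
on_chain_ledger: Dict[str, dict] = {}  # stand-in for the blockchain: record id -> metadata + hash


def store_response(record_id: str, record: dict, metadata: dict) -> None:
    off_chain_db[record_id] = record                                    # PII stays off-chain
    on_chain_ledger[record_id] = {**metadata, "hash": hash_record(record)}  # only metadata + hash on-chain


def verify_response(record_id: str) -> bool:
    record = off_chain_db.get(record_id)
    anchor = on_chain_ledger.get(record_id)
    if record is None or anchor is None:
        return False
    return hash_record(record) == anchor["hash"]


store_response(
    "r1",
    {"survey_id": "s1", "question": "q1", "answer": "yes"},
    {"survey_id": "s1", "type": "survey_response"},
)
print(verify_response("r1"))  # True

off_chain_db["r1"]["answer"] = "no"  # someone edits the database directly
print(verify_response("r1"))  # False
```

Sorting keys before hashing matters here: two logically identical records must serialize to the same bytes, or verification would fail for reasons unrelated to tampering.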
Hyperledger Fabric and MongoDB are part of the tech stack used by the development team at TheLedger. Hyperledger Fabric, a private and permissioned blockchain platform, is well suited to fast storage and retrieval of data, while MongoDB serves as the off-chain store for sensitive data that should not be placed on an immutable data store such as a blockchain. For user account management, HealthChain integrates the Identity Hub platform already in use at the GGD.
HealthChain has two user roles: partners and citizens. Partners can create new surveys, select the intended audience, and access the response data for the surveys they created. They can also request permission to access response data collected by other organizations; each citizen must agree before their data becomes available. Citizens can fill in the surveys sent to them and edit their previous responses.
We implemented a way to validate private data via the blockchain without placing the data on the blockchain itself, which gives citizens who answer surveys full control over who has access to their data. Because HealthChain was built with user-centric privacy and control in mind, citizens can modify their data, change who can access it, and, if they choose, remove all data related to them from the system completely. The data hashes remain on the blockchain, but once the private data has been removed from the off-chain database, the “data link” is broken and the PII is no longer accessible to anyone. Both partners and citizens can view a detailed log of all events relating to their accounts; this log is stored on the blockchain, providing full transparency.
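The consent and erasure behavior can be sketched in Python, assuming in-memory stand-ins for the off-chain store and the on-chain event log, and assuming citizen and partner ids are pseudonymous identifiers rather than PII. The class `HealthChainSketch` and its methods are hypothetical; for simplicity, every partner (including a survey's creator) is modeled as needing the citizen's consent.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class HealthChainSketch:
    # Off-chain store (stand-in for the database): response id -> answer, response id -> citizen id
    responses: Dict[str, str] = field(default_factory=dict)
    owners: Dict[str, str] = field(default_factory=dict)
    # (citizen id, partner id) pairs for which the citizen has given consent
    grants: Set[Tuple[str, str]] = field(default_factory=set)
    # Append-only event log (stand-in for the blockchain); holds ids and hashes, never answers
    ledger: List[Tuple[str, str, str]] = field(default_factory=list)

    def submit(self, citizen: str, response_id: str, answer: str) -> None:
        self.responses[response_id] = answer
        self.owners[response_id] = citizen
        self.ledger.append(("response_stored", response_id, sha256_hex(answer)))

    def grant(self, citizen: str, partner: str) -> None:
        self.grants.add((citizen, partner))
        self.ledger.append(("access_granted", citizen, partner))

    def revoke(self, citizen: str, partner: str) -> None:
        self.grants.discard((citizen, partner))
        self.ledger.append(("access_revoked", citizen, partner))

    def read(self, partner: str, response_id: str) -> Optional[str]:
        owner = self.owners.get(response_id)
        if owner is None:
            return None  # no off-chain record: erased (data link broken) or never existed
        if (owner, partner) not in self.grants:
            return None  # the citizen has not consented to this partner
        return self.responses[response_id]

    def erase_citizen(self, citizen: str) -> None:
        # Remove every off-chain record and consent for this citizen; on-chain hashes remain
        for response_id in [r for r, owner in self.owners.items() if owner == citizen]:
            del self.responses[response_id]
            del self.owners[response_id]
        self.grants = {g for g in self.grants if g[0] != citizen}
        self.ledger.append(("citizen_erased", citizen, ""))


hc = HealthChainSketch()
hc.submit("citizen-1", "resp-1", "I exercise twice a week")
print(hc.read("partner-A", "resp-1"))  # None: no consent yet
hc.grant("citizen-1", "partner-A")
print(hc.read("partner-A", "resp-1"))  # I exercise twice a week
hc.erase_citizen("citizen-1")
print(hc.read("partner-A", "resp-1"))  # None: the off-chain data is gone
```

After erasure the ledger still contains the original `response_stored` hash, but with no off-chain record left to match it, that hash no longer leads to any personal data.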
One interesting problem to solve was allowing citizens to edit their survey responses at a later time. Updating values in the database is simple, but any change to the data also changes its hash. Because the hash of the database record no longer matches the one on the blockchain, validation fails. To solve this, a new transaction is written to the blockchain with a reference to the previous transaction, in essence “updating” the hash to reflect the changed data. The old hash value remains on the blockchain, but verification uses the latest related transaction. This solves the problem: data can still be validated, GDPR compliance is maintained, and responses can be updated.
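The update scheme can be sketched in Python, assuming an in-memory list stands in for the append-only ledger and a dict for the database. The `Transaction` fields and the helpers `save_response` and `verify_latest` are hypothetical names; the point is only that each new version references the previous transaction and verification uses the latest one.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Transaction:
    tx_id: int
    response_id: str
    data_hash: str
    previous_tx_id: Optional[int]  # None for the first version of a response


ledger: List[Transaction] = []  # append-only stand-in for the blockchain
database: Dict[str, str] = {}   # off-chain store: response id -> current answer


def latest_tx(response_id: str) -> Optional[Transaction]:
    # Scan from the newest transaction backwards; the ledger is only ever appended to
    for tx in reversed(ledger):
        if tx.response_id == response_id:
            return tx
    return None


def save_response(response_id: str, answer: str) -> Transaction:
    database[response_id] = answer  # overwrite the off-chain value
    prev = latest_tx(response_id)
    tx = Transaction(len(ledger), response_id, sha256_hex(answer), prev.tx_id if prev else None)
    ledger.append(tx)  # old transactions (and hashes) stay on the ledger
    return tx


def verify_latest(response_id: str) -> bool:
    answer = database.get(response_id)
    tx = latest_tx(response_id)
    return answer is not None and tx is not None and sha256_hex(answer) == tx.data_hash


save_response("r1", "Yes")  # original response
save_response("r1", "No")   # later edit: a new transaction pointing at the first one
print(verify_latest("r1"))  # True
```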
Although public perception of blockchain may still be colored by distrust of cryptocurrencies, the technology's growing adoption is hard to dispute. Clear use cases for blockchain continue to appear across a wide range of sectors, gradually turning hype into realized value. The HealthChain proof of concept can move on to a pilot phase and, eventually, production, putting data control in the hands of citizens and offering further proof that blockchain products can be developed with GDPR compliance in mind.