Utilizing Mutable and Immutable Data Stores with Compliance

How can industries thrive in a world that needs immutable data BUT must allow for data modification and deletion?

Jay Wall
Fluree PBC
11 min read · May 13, 2019


Keeping Data in Its Place

This article serves as a call to use the appropriate data store for mutable and immutable data. In our current landscape of compliance and regulation, temporary or private data may require changes or deletion — in turn requiring a location that allows for mutation.

Examples include name changes, address changes, or removal of former customer data. This function is usually managed in traditional relational databases. But when capturing trends or keeping a traceable history is critical, an immutable data store is the best option. Selecting the appropriate storage for data is critical in today’s data-centric world.

Traditional Databases

With traditional mutable database structures, data changes replace the previous record data. To maintain database history, detailed, business-specific backup policies must be deployed. The plan might include a full backup on a weekly basis, a differential backup on a daily basis, and transaction log backups every 30 minutes. These potentially redundant procedures create a management burden and storage challenges. Additionally, reviewing historical data requires effort to reconcile it with the current state.

There are many different “traditional” databases, but the most popular are relational databases. Relational databases rely upon schemas that bind Primary Key to Foreign Key relationships. These relationships facilitate data queries using joins via Structured Query Language (SQL). Relational databases are present and popular across most of today’s industries.
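As a minimal sketch of the two behaviors described above (Primary Key to Foreign Key joins, and updates that replace the previous record data), the following uses Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: prescription.patient_id is a foreign key to patient.patient_id.
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address    TEXT NOT NULL
    );
    CREATE TABLE prescription (
        prescription_id INTEGER PRIMARY KEY,
        patient_id      INTEGER NOT NULL REFERENCES patient(patient_id),
        drug            TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO patient VALUES (1, 'A. Example', '12 Oak St')")
conn.execute("INSERT INTO prescription VALUES (10, 1, 'donepezil')")

# A mutable update: the new address replaces the old one in place.
conn.execute("UPDATE patient SET address = '98 Elm Ave' WHERE patient_id = 1")

# The join follows the primary key / foreign key relationship.
rows = conn.execute("""
    SELECT p.name, p.address, rx.drug
    FROM patient AS p
    JOIN prescription AS rx ON rx.patient_id = p.patient_id
""").fetchall()

# No trace of the old address remains in the live table.
old_address_count = conn.execute(
    "SELECT COUNT(*) FROM patient WHERE address = '12 Oak St'"
).fetchone()[0]

print(rows)
print(old_address_count)
```

After the update, the only way to recover the earlier address would be from a backup or a transaction log, which is the management burden noted above.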

Relational databases have been around since the 1970s. More recently, data stores built on NoSQL (“Not only SQL”) databases are in wide use. These databases do not require the formal structure of relational databases. NoSQL databases include: key-value, graph, wide-column, and document databases. In most cases, these are mutable databases.

Blockchain-backed Data Stores

Blockchain technology provides a full, trusted, and traceable history of all historical transactions. This immutable history includes all:

  • New transactions,
  • Updates made to existing entries, and
  • “Deletions” of existing entries.

This is the append-only nature of the blockchain ledger. All transactions create a new record, while the previous information remains unchanged. The blockchain structure also extends this storage facility to build a “trustless” system. Each block stores the cryptographic hash of the previous block’s contents, so altering any earlier block breaks the chain of hashes that follows and makes unauthorized data manipulation immediately detectable. This provides a sequential, time-ordered record of trusted transactional operations.
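The hash linking described above can be illustrated with a short sketch. This is not how any particular blockchain or Fluree itself is implemented; it is a simplified, single-process model using Python's standard hashlib, and the function names and block fields (`append_block`, `verify`, `prev_hash`, `payload`) are assumptions made for illustration.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append-only write: earlier blocks are never modified."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def verify(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        recomputed = block_hash(block["index"], block["prev_hash"], block["payload"])
        if (block["index"] != position
                or block["prev_hash"] != prev_hash
                or block["hash"] != recomputed):
            return False
        prev_hash = block["hash"]
    return True


# New entries, updates, and "deletions" are all recorded as new blocks.
chain = []
append_block(chain, {"op": "insert", "id": "rx-1", "drug": "donepezil"})
append_block(chain, {"op": "update", "id": "rx-1", "dose_mg": 10})
append_block(chain, {"op": "delete", "id": "rx-1"})
print(len(chain), verify(chain))  # 3 True

# Rewriting history in a copy is detected on verification.
tampered = copy.deepcopy(chain)
tampered[0]["payload"]["drug"] = "something else"
print(verify(tampered))  # False
```

Note that the "delete" is itself a new record: the original insert remains in the chain, which is exactly what makes the history traceable.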

See: Blockchain Immutability — Why does it matter?

The most promising industries for blockchain application are those requiring data exchange and compliance. Many industries fit this bill, but the prime candidates include insurance, healthcare, and supply chain. I chose the healthcare industry as the focus for this article because personal data are exchanged and modified there, while patient health data integrity and the management of these data are critical (and highly regulated).

Healthcare Industry Opportunities and Challenges

One can only imagine the volume of healthcare data exchange that occurs every day in the world. Many stakeholders take part in this exchange: providers, partners, patients, and payers. This quantity of participants makes for a complex ecosystem of data management. In the US, these data are protected by the Health Insurance Portability and Accountability Act of 1996 (HIPAA). HIPAA mandates that patient data be protected from disclosure and misuse.

In the EU, the General Data Protection Regulation (GDPR) provides for the same protection of EU citizens’ data. GDPR protection extends far beyond just healthcare, but personal health data remains a key protection provided. Most legacy systems in use today are relational databases — mutable data sources that can be overwritten. This lack of history can lead to misinformation and misinterpretation of data — leading to issues not only in compliance but in interoperability.

Thus, the healthcare industry provides a great opportunity for blockchain technology adoption. With the deployment of immutable data structures, people will realize the benefit of increased engagement and ownership of their personal healthcare data. Furthermore, clinicians will be better able to deliver appropriate care based on a full and trusted history of patient information. An immutable ledger of all health events would provide:

  • Historical/current medications
  • Historical evaluations
  • Historical diagnoses
  • Historical treatments

With this data, patients can track their health with full trust, and clinicians can treat patients in an efficient and effective manner. Also, keeping the full repository of data will make future training and application of Artificial Intelligence (AI) and Machine Learning technologies much more efficient. Finally, higher-quality analytical insights may be gleaned from the de-identified aggregate historical data set (for research and public health analysis).

The Immutable Conundrum

This is wonderful, right?! Well, emerging updates to “personal data” compliance regulations present a challenge. Most significantly, Article 17 of the EU GDPR includes “the right to be forgotten.” Article 17 requires that the “data subject has the right to request erasure of personal data related to them.” This rule applies to any third party handling EU citizens’ personal data, anywhere in the world. This presents an issue for immutable data storage because, by design, the full transaction history is present, so “forgetting” data is not possible.

How then can industries thrive in a world that needs immutable data BUT must allow for data modification and deletion?

One Solution — Crypto-shredding

One popular option for managing personal data in an immutable data store is crypto-shredding. A Key Management System (KMS) is deployed to manage the encryption, control, and safekeeping of unique keys. In this system, an encrypted key list is constructed within the personal data store. Each key is also present in the non-identified/non-identifiable immutable data store. The burden is then on the application layer to pass requests between the two data stores. Once implemented, decisions are possible with full information, based on the exposure provided by the application.

When a request for data to be “forgotten” occurs, the relevant encryption key or keys are overwritten. This breaks the link to the associated underlying immutable data: the transactional data integrity in the immutable data store is maintained, yet the reference to the individual to whom the data pertain is removed. This satisfies GDPR requirements, as there is no longer a record tied to the EU citizen’s personal data.
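The idea can be sketched in a few lines. The example below is an illustration only, under several loudly stated assumptions: the cipher is a toy built from Python's standard hashlib and hmac modules so that the sketch is self-contained, and it must not be used to protect real data (a real deployment would use a vetted encryption library and an actual KMS). The `KeyStore` class, the `ledger` list, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key, nonce, length):
    """TOY keystream (SHA-256 in counter mode). Illustration only, not for production."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()  # detects a wrong key
    return nonce + tag + body


def decrypt(key, blob):
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


class KeyStore:
    """Hypothetical mutable key store standing in for a KMS."""

    def __init__(self):
        self._keys = {}

    def key_for(self, subject_id):
        # One unique key per data subject, created on first use.
        return self._keys.setdefault(subject_id, secrets.token_bytes(32))

    def get(self, subject_id):
        return self._keys.get(subject_id)

    def shred(self, subject_id):
        # "Forgetting" the subject means destroying the key, not the ledger entry.
        self._keys.pop(subject_id, None)


keys = KeyStore()
ledger = []  # stands in for the immutable store; entries are only ever appended


def record(subject_id, personal, event):
    sealed = encrypt(keys.key_for(subject_id), personal.encode("utf-8"))
    ledger.append({"subject": subject_id, "personal": sealed, "event": event})


def read_personal(entry):
    key = keys.get(entry["subject"])
    if key is None:
        return None  # key was shredded; the ciphertext can no longer be read
    return decrypt(key, entry["personal"]).decode("utf-8")


record("subj-001", "Alma, 12 Oak St", "prescribed donepezil")
before = read_personal(ledger[0])  # personal data is readable while the key exists
keys.shred("subj-001")
after = read_personal(ledger[0])   # None once the key is gone
print(before, after, len(ledger))
```

The ledger entry itself is never altered or removed, so transactional integrity is preserved; only the ability to read the personal portion disappears with the key.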

It is possible to implement key encryption at specific levels of data granularity. As an example, if a key is established for just a person’s name or ID, shredding that key removes the link between that person and all associated data. At a higher level, a full “user profile” that includes address, gender, and other semi-private information could be keyed separately; if only the identifying key were deleted, there would still be a record of some pertinent details for the subject. This would facilitate the availability of aggregate data based on unidentified real patients for research efforts. Additionally, this granularity can be applied in different data sources within a network, and the same data can be assigned a different key in each linked system, regardless of how specifically the keys are assigned. This can help in shredding the more specific data in isolation.

Apple’s iOS “effaceable storage” is an example of crypto-shredding in practice. This is the technology that is used on iOS devices when you “Erase All Content and Settings.” Apple achieves this by reserving a dedicated area of NAND storage to hold the cryptographic keys, which is wiped, leaving all personal information on the device inaccessible.

Crypto-shredding Model

A Better Solution: Separate, Linked Data Sources

In a recent Fluree podcast the challenge of storing personal data was highlighted. The point was made that this is really an application architecture struggle, not a blockchain struggle. Data need to be separate on the back end, and only queried (not managed) at the application tier. The app simply pulls data either from mutable or immutable databases based on the specific data required. Data system architects should not store personal data in the immutable data store.

The recommended option is to deploy one or more separate mutable databases for personal data, keeping data that may need to be changed or deleted apart from the immutable data.

This multi-source approach allows the blockchain structure to maintain full historical integrity while leaving personal data maintenance in a mutable location. In practice, accomplishing this configuration at scale involves assigning an anonymous, cross-join identifier to bind the two sources. Because this common key anchors the associated personal data in the mutable database, all references to the personal data (key and personal data) can be destroyed there. This would have no effect on the immutable database, where the key would remain as a binding to the historical transactions that occurred. This method allows each database to perform its intended function, and specifically protects the “right to be forgotten” in compliance with the GDPR.
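A minimal sketch of this split, assuming in-memory Python structures in place of real databases, might look like the following. The names `personal_store`, `ledger`, `enroll`, `patient_view`, and `forget` are hypothetical, and a random UUID stands in for the anonymous cross-join identifier.

```python
import uuid
from types import MappingProxyType

personal_store = {}  # mutable store: link_id -> personal data
ledger = []          # append-only store: events carry the link_id and nothing personal


def enroll(name, address):
    """Create the anonymous identifier that binds the two stores."""
    link_id = uuid.uuid4().hex  # random, so it reveals nothing about the person
    personal_store[link_id] = {"name": name, "address": address}
    return link_id


def record_event(link_id, event):
    # Read-only mapping: a recorded event cannot be edited afterwards.
    ledger.append(MappingProxyType({"link_id": link_id, "event": event}))


def patient_view(link_id):
    """Application-tier query that pulls from both stores and merges the result."""
    return {
        "personal": personal_store.get(link_id),
        "history": [e["event"] for e in ledger if e["link_id"] == link_id],
    }


def forget(link_id):
    """Right to be forgotten: destroy the key and personal data in the mutable store only."""
    personal_store.pop(link_id, None)


pid = enroll("Alma", "12 Oak St")
record_event(pid, "prescribed donepezil")
record_event(pid, "urology consult")

personal_store[pid]["address"] = "98 Elm Ave"  # ordinary mutable update
view_before = patient_view(pid)

forget(pid)
view_after = patient_view(pid)

print(view_before)
print(view_after)
```

After `forget`, the history is still intact and still grouped under the same identifier, but nothing remains that ties that identifier to a person. No key management system is involved; the separation itself does the work.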

By intentional design, the Fluree data stack fully supports this approach by embracing the semantic web and cross-dataset queries. Fluree utilizes W3C Resource Description Framework (RDF) subject-predicate-object structures and has integrated support for SPARQL, a protocol and RDF query language. With SPARQL, queries can be issued seamlessly against multiple data sources on the back end, while presenting a unified result set to the middle tier for processing. The graphic below outlines this structure, showing: (1) cross-dataset queries, (2) application layer consumption of data via SPARQL, and (3) a seamless user output as if data were pulled from a single source.

Separate Data Stores, Queried from the Application Tier (no KMS required)

Opportunity for Application

As a healthcare industry example, let’s start with a patient receiving care at multiple different healthcare facilities.

  • This patient, Alma, is 72 years old and is in relatively good health. Yet, she has recently started to experience mild symptoms of dementia.
  • Her primary care physician prescribed Aricept (donepezil) to aid her condition. Aricept belongs to a class of drugs known as acetylcholinesterase inhibitors (AChEI). These drugs increase the level of acetylcholine through inhibition of cholinesterase — an enzyme that breaks down acetylcholine.
  • A few months later, Alma expresses concern about bladder incontinence to her family. As such, Alma’s family arranges for her to see a prominent urologist in the area. The urologist does not have access to Alma’s medication list — as this physician practices with the competing health system in the area. Alma’s family also did not bring her medication list as they assumed “it would all be in the computer.”
  • With the available patient data about Alma, the urologist proposes that Alma trial a popular incontinence medication, Ditropan (oxybutynin), before proceeding with any possible invasive studies. However, if the urologist had known that Alma was actively taking Aricept for dementia, this physician likely never would have prescribed Ditropan.
  • The mechanism of action for Ditropan is anticholinergic in nature — it will in effect cancel out much of the benefit of Aricept.

Unfortunately, this situation is quite common in our current healthcare systems.

A recent Clinical Interventions in Aging article covers the prescribing of drugs that interact with or counteract other drugs when patient data are ignored or are not available to the prescribing physician. The authors’ summary highlights the broad scope of drugs and conditions that come into play:

“One of the most important pharmacodynamic interactions, based on their opposing mechanisms of action, is represented by the concomitant use of AChEIs and anticholinergic drugs that results in pharmacological antagonism. The anticholinergic effect is a feature of several drugs such as antipsychotics, antidepressants, antihistamines, bronchodilators, and drugs for urinary incontinence that are frequently prescribed to Alzheimer patients, especially to those with behavioral and psychotic symptoms.”

Further to this, many of the active ingredients are components of over-the-counter drugs available throughout the US. Scary, huh? If Alma’s full patient history were available for review, this would not have happened. Having a full history would have included ALL her medications, treatments, and evaluations. This is a perfect use case example for immutable blockchain-backed data sources.

Obviously, this is dangerous. In a split mutable/immutable design, Alma’s personal data would have been held apart from the immutable history, which would track all of her conditions and medications so that she could be treated appropriately and with full knowledge. This is important because, as in this case, historical events are pertinent to her present treatment. Also, if historical trending were needed to treat her, these data would have been available to support the present situation.
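To make the scenario concrete, here is a small sketch of how an application might use a full medication history to flag the conflict before a new prescription is written. It is an illustration of the data flow only, not clinical decision support: the `DRUG_CLASS` lookup is a hypothetical two-entry table containing only the drugs named in this article, and the event model and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a recorded event cannot be changed afterwards
class MedEvent:
    link_id: str  # anonymous identifier, not the patient's name
    drug: str
    action: str   # "start" or "stop"


# Hypothetical lookup limited to the two drugs discussed above.
DRUG_CLASS = {"donepezil": "AChEI", "oxybutynin": "anticholinergic"}
# Pairs of drug classes with opposing mechanisms of action.
OPPOSING = {frozenset({"AChEI", "anticholinergic"})}


def active_drugs(history, link_id):
    """Replay the full history in order to find what the patient takes today."""
    active = []
    for event in history:
        if event.link_id != link_id:
            continue
        if event.action == "start" and event.drug not in active:
            active.append(event.drug)
        elif event.action == "stop" and event.drug in active:
            active.remove(event.drug)
    return active


def conflicts(history, link_id, proposed_drug):
    """Return active drugs whose class opposes the proposed drug's class."""
    proposed_class = DRUG_CLASS.get(proposed_drug)
    return [
        drug for drug in active_drugs(history, link_id)
        if frozenset({DRUG_CLASS.get(drug), proposed_class}) in OPPOSING
    ]


# The primary care physician's prescription is already in the shared history.
history = (MedEvent("anon-42", "donepezil", "start"),)

flagged = conflicts(history, "anon-42", "oxybutynin")
print(flagged)  # ['donepezil']
```

With the history available, the urologist's system could surface the opposing prescription at the moment of prescribing, rather than relying on a medication list that the family assumed "would all be in the computer."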

Conclusions

  • Separation of data appropriate to the data type and context is critical in determining and implementing a functional data management solution
  • Personal data should not be held in an immutable data store
  • Blockchain ledger-backed databases provide a full and trusted history when it is critical to have these data for pertinence, trending, or auditing purposes
  • Fluree provides a product that fills this need — more information about the offering is listed below.

More on Fluree

Fluree is a data-centric data management platform for modern applications. Fluree treats data as a first-class identity, and these data reside in all forms, including: the database schema construction history, the database access authorization engine, and all statements of fact at any given time. Fluree has two components: FlureeDL, an immutable distributed ledger, and FlureeDB, a graph database optimized to build applications on top of FlureeDL. In both systems, Fluree developers use sophisticated logic (SmartFunctions™) to enforce custom read/write permissions and rules.

Fluree uses the W3C RDF format and supports the SPARQL query language. This facilitates the deployment of semantic web-aware application architectures and the use of queries bridging multiple data sources — whether mutable or immutable. Fluree Community edition is free — as both a cloud-hosted DBaaS and as a downloadable Java executable for local deployment.

Technical documentation is available here and includes training — videos, examples, and walk-throughs. Check it out!

Website: www.flur.ee

Docs: http://docs.flur.ee

Twitter: https://twitter.com/FlureePBC

LinkedIn: https://www.linkedin.com/company/fluree-pbc/

Email: support@flur.ee

I offer a special thanks to Kevin Doubleday for his help in bringing this article to life!

Disclaimer: This article is not legal advice on compliance with HIPAA or GDPR. Additionally, this article is not medical guidance for treatment of any disease.


Latest rabbit holes: blockchain technology, data-centric ontologies, the semantic web, data exchange interoperability