Crypto shredding: How it can solve modern data retention challenges

Crypto shredding is the concept of destroying data through the destruction of the cryptographic keys protecting the data. Without the decryption keys, the encrypted data is unusable — like a safe without the combination.

The relevance of crypto shredding stems from advances in technology and changes to the political environment. Distributed technologies such as blockchain depend on data being immutable, which makes data destruction difficult: deleting data would conflict with the append-only requirement and violate the integrity of the chain. From a political perspective, new regulations set specific requirements for data retention and for consumers’ rights over their personal information. On the surface, the technology and political landscapes are incompatible, and this incompatibility is what crypto shredding aims to resolve.

ThoughtWorks recently identified crypto shredding as a technique worth trialling in their biannual Technology Radar. This has become particularly relevant with the European Union’s (EU) General Data Protection Regulation (GDPR) being enforced from 25th May 2018. The impact of GDPR is wide-reaching, and even if it does not currently apply to an organisation, similar legislation is likely to develop across the world in the coming years.

How can crypto shredding help?

What if your customer asks you to remove their personal data from your system? Does your system allow this? What’s the impact on backup and archiving systems?

GDPR and its right to be forgotten will reshape how we approach system design and data architecture. The right to be forgotten gives individuals legal protection over their data such that they may request an organisation to erase all data stored about them.

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…

GDPR: Chapter III, Article 17, Point 1

“Data subject” refers to the end user (i.e. your customer), and “controller” refers to the organisation holding personal data (i.e. your organisation).

The technical implementation of this may seem straightforward. If we have a database of customers, we simply delete the record of the customer exercising their right to be forgotten (assuming your legal environment allows this).

The challenge of backups and archives

Figure 01: Delete a customer record from a database.

If Mary Brown wishes for her personal data to be removed, we delete Mary’s record from the database. The challenge is that our backups still contain the deleted data.

For most backup solutions it’s infeasible or impossible to remove specific records from a backup. Even where it is possible, doing so would hinder integrity checks and increase storage and archiving costs, as mutable storage media would be required. In large-scale environments, deletion requests may arrive faster than backup media can be read and rewritten.

In this example, two backups are taken before Mary’s record is deleted. While the primary store and the latest backup no longer contain Mary’s personal data, historic backups will retain a copy of the since-deleted data for an extended period.

Figure 02: Deleted records exist in database backups.

This is where crypto shredding can help.

Encryption key per customer

We know that we
· need to destroy certain personal data from our primary store, and all historic backups;
· cannot feasibly alter backup media (or dispose of them);
· must destroy the data without undue delay.

If each customer’s record is encrypted with its own key, the key for a specific customer can be destroyed. This renders their data effectively destroyed in both the primary store and all historical backups: without the customer’s key, their encrypted data cannot be decrypted or processed.

Figure 03: Separate data and key stores.

Keys must be stored separately from the customer data and should not form part of the customer data backups. However, it’s extremely important to still back up the key store; without it, all customers’ personal data is unrecoverable.
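To make the per-customer key idea concrete, here is a minimal Python sketch using only the standard library. It assumes two in-memory dicts standing in for the separate key store and customer data store; `save_customer`, `load_customer`, and `forget_customer` are hypothetical names. Because the standard library has no block cipher, the sketch uses a toy authenticated cipher (an HMAC-SHA256 keystream plus an HMAC tag) purely for illustration; a real system should use a vetted AEAD such as AES-GCM from a maintained cryptography library.

```python
import hashlib
import hmac
import secrets

# Toy authenticated cipher: HMAC-SHA256 keystream plus an HMAC tag.
# Illustration only; in practice use a vetted AEAD such as AES-GCM.
def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, b"enc" + nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ct = bytes(p ^ s for p, s in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed authentication")
    return bytes(c ^ s for c, s in zip(ct, _keystream(key, nonce, len(ct))))

# Two separate stores: keys never sit alongside the customer records.
key_store: dict[str, bytes] = {}       # backed up with a short retention period
customer_store: dict[str, bytes] = {}  # backed up with a long retention period

def save_customer(customer_id: str, record: bytes) -> None:
    # One key per customer, created on first write and reused afterwards.
    key = key_store.setdefault(customer_id, secrets.token_bytes(32))
    customer_store[customer_id] = encrypt(key, record)

def load_customer(customer_id: str) -> bytes:
    return decrypt(key_store[customer_id], customer_store[customer_id])

def forget_customer(customer_id: str) -> None:
    # Delete the record and its key together; data-store backups keep only ciphertext.
    customer_store.pop(customer_id, None)
    key_store.pop(customer_id, None)
```

Copying `customer_store` before calling `forget_customer` mimics a historic backup: the copy still holds the customer’s ciphertext, but once no key-store backup holds that key, nothing remains that can decrypt it.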

Figure 04: Using a short key store retention period enables the effective destruction of data from backups.

To achieve the right to be forgotten, the backup retention period of the key store needs to be significantly reduced. This ensures deleted keys are removed promptly. From a transactional data retention perspective, we can now maintain long term backups of customer data. When deleting a customer record, both the record and the key should be deleted at the same time. After the last backup containing the customer key is removed, the customer data is unrecoverable.
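The effect of a short key-store retention period can be worked through with a small calculation. This Python sketch assumes each key-store backup is a full snapshot, taken once a day and purged exactly a fixed retention period after it was taken; the function name `unrecoverable_from` and that purge policy are hypothetical.

```python
from datetime import date, timedelta

def unrecoverable_from(key_deleted_on: date, backup_dates: list[date],
                       key_store_retention: timedelta) -> date:
    """Return the date from which no key-store backup still holds a deleted key.

    Hypothetical policy: every key-store backup is a full snapshot purged exactly
    `key_store_retention` after the day it was taken. Backups taken before the
    deletion day still contain the key.
    """
    holding_key = [d for d in backup_dates if d < key_deleted_on]
    if not holding_key:
        return key_deleted_on  # the key never reached a backup
    # The newest backup holding the key is the last one to expire.
    return max(holding_key) + key_store_retention

# Daily key-store backups from 1 to 20 June 2018, each kept for 30 days.
backups = [date(2018, 6, 1) + timedelta(days=i) for i in range(20)]
print(unrecoverable_from(date(2018, 6, 10), backups, timedelta(days=30)))  # 2018-07-09
```

Under these assumptions, customer-data backups can be kept for years, yet the ciphertext they hold for a customer deleted on 10 June becomes unreadable once the 9 June key-store backup expires on 9 July.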

These same principles apply to a variety of data storage implementations, such as tabular databases, blockchains, and Command Query Responsibility Segregation (CQRS) stores. The solution is always to protect each customer’s data with a unique per-customer key and to separate the key store from the transactional data store.

Application implementation

Crypto shredding should be considered when designing the data and system architecture. As system components interacting with persistence stores must understand how the store encrypts and decrypts data, crypto shredding cannot simply be “bolted on” to an existing solution. However, if writes can be isolated, and read stores used as a cache (i.e. no long-term persistence), a middle ground can be achieved for existing systems.

Trivial implementations of crypto shredding

In a simple application, the application reads the keys from the key store and uses them to encrypt and decrypt records in the transactional data store. Keys exist only in the key store and in application memory; the keys and the transactional data never exist alongside each other on a persisted medium.

Figure 05: A simple application that uses keys from the key store to encrypt/decrypt data.

For scenarios requiring a high level of assurance of key security, keys could be loaded by an intermediary service that encrypts and decrypts data on behalf of the application. The business application never receives the keys, which gives greater assurance that they have not been compromised or leaked. The cost of this approach is the compute and network overhead of serialising, transferring, and deserialising every piece of protected data, which makes it unsuitable for all but trivial systems.
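The intermediary approach can be sketched as a small service that owns the keys and exposes only encrypt, decrypt, and shred operations, so callers only ever handle plaintext or ciphertext. This Python sketch is illustrative: `DataProtectionService` and its methods are hypothetical names, keys live in an in-memory dict rather than a real key store, the cipher is a toy HMAC-SHA256 construction standing in for a vetted AEAD such as AES-GCM, and each key use is recorded in a simple audit log.

```python
import hashlib
import hmac
import secrets

class DataProtectionService:
    """Owns per-customer keys; callers only ever see plaintext or ciphertext."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}                 # never returned to callers
        self.audit_log: list[tuple[str, str, str]] = []  # (caller, operation, customer)

    @staticmethod
    def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
        # Toy cipher: HMAC-SHA256 in counter mode, standing in for AES-GCM.
        out, counter = b"", 0
        while len(out) < n:
            out += hmac.new(key, b"enc" + nonce + counter.to_bytes(8, "big"),
                            hashlib.sha256).digest()
            counter += 1
        return out[:n]

    def encrypt(self, caller: str, customer_id: str, plaintext: bytes) -> bytes:
        key = self._keys.setdefault(customer_id, secrets.token_bytes(32))
        self.audit_log.append((caller, "encrypt", customer_id))
        nonce = secrets.token_bytes(16)
        stream = self._keystream(key, nonce, len(plaintext))
        ct = bytes(p ^ s for p, s in zip(plaintext, stream))
        tag = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
        return nonce + ct + tag

    def decrypt(self, caller: str, customer_id: str, blob: bytes) -> bytes:
        key = self._keys[customer_id]  # KeyError once the key has been shredded
        self.audit_log.append((caller, "decrypt", customer_id))
        nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("ciphertext failed authentication")
        return bytes(c ^ s for c, s in zip(ct, self._keystream(key, nonce, len(ct))))

    def shred(self, customer_id: str) -> None:
        self._keys.pop(customer_id, None)
```

In a real deployment this service would sit behind a network API, which is where the serialisation and transfer overhead described above comes from.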

Figure 06: A simple application that abstracts key operations for improved security of keys.

Data and keys have equally significant value

Both the decrypted data and the decryption key are equally valuable. Exposure of decrypted data usually carries financial, reputational, and legal risk. If a decryption key is exposed, it becomes impossible to apply crypto shredding with confidence: if a copy of the key exists somewhere, the encrypted personal data is recoverable and the requirements of the right to be forgotten have not been met.

Both the personal data and the decryption keys are highly valuable and need to be protected. Since the application already handles decrypted personal data, it’s therefore practical to share the decryption keys with the application, avoiding the compute and network overhead of the intermediary service in the previous example.

However, keys should not travel beyond internal processing. When interacting with third parties, the decrypted data should be shared, as customer keys should not leave the organisation. The third party may implement crypto shredding itself, with its own key store. If GDPR is applicable, it holds organisations responsible for ensuring that third parties also adhere to GDPR requirements.

Figure 07: The organisation is the boundary of keys and encrypted data. Third parties should receive decrypted data; they may also implement crypto shredding, but with their own keys.

The data protection service is optional, but in larger applications it provides a layer of abstraction and auditing over key use.

Querying data

With each record individually encrypted with its own key, to which the persistence layer has no access, querying data sets becomes extremely difficult and slow. To find all customers born in the 1980s, it would be necessary to decrypt the date-of-birth field of every record, which requires loading as many keys as there are records in the database.

The purpose of crypto shredding is to allow us to effectively erase personal data without having to alter historical archives. It’s not trying to secure data at rest or in transit — there are other technologies that achieve this objective.

It’s reasonable to keep data in unencrypted form if it’s not persisted to any form of long-term backup or archive. We may therefore copy the data in unencrypted form to a caching layer of the system, against which queries can be executed efficiently. The cache layer may or may not use the same storage technology as the primary store.

Figure 08: A decrypted copy of customer data. This should not be persisted on long term storage.

The key differences between the caching layer and the primary store are mutability and persistence. The application should not write directly to the cache. The cache should only be updated when the data has been successfully written to the primary store in encrypted form. This ensures the cache can be re-built from the primary store, so long-term persistence is not necessary. If the cache fails, it can be rebuilt without any data loss. For availability purposes, the caching layer should be treated with the same high-availability requirements as the primary store.
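The cache rules above (write only after the primary store commits, rebuild from the primary store, never persist) can be sketched as a small in-memory read model. In this Python sketch `QueryCache` and its methods are hypothetical, and decryption is passed in as a callable that returns `None` for customers whose key has been shredded; the usage below substitutes a plain dict for real decryption so the focus stays on the cache flow.

```python
from typing import Callable, Optional

Record = dict  # decrypted customer fields, e.g. {"name": "Mary Brown", "born": 1984}

class QueryCache:
    """Decrypted, in-memory read model; never written to long-term storage."""

    def __init__(self) -> None:
        self._rows: dict[str, Record] = {}

    def on_primary_write_committed(self, customer_id: str, record: Record) -> None:
        # Call only after the encrypted write to the primary store has succeeded,
        # so the cache never holds data the primary store does not.
        self._rows[customer_id] = dict(record)

    def on_customer_forgotten(self, customer_id: str) -> None:
        self._rows.pop(customer_id, None)

    def rebuild(self, customer_ids: list[str],
                decrypt_record: Callable[[str], Optional[Record]]) -> None:
        # Rebuild entirely from the primary store; shredded customers decrypt to None.
        rows: dict[str, Record] = {}
        for customer_id in customer_ids:
            record = decrypt_record(customer_id)
            if record is not None:
                rows[customer_id] = record
        self._rows = rows

    def query(self, predicate: Callable[[Record], bool]) -> list[str]:
        return [cid for cid, row in self._rows.items() if predicate(row)]

# Usage: "li" has been shredded, so the stand-in decryption yields None for that ID.
decrypted = {"mary": {"name": "Mary Brown", "born": 1984},
             "john": {"name": "John Smith", "born": 1975}}
cache = QueryCache()
cache.rebuild(["mary", "john", "li"], decrypted.get)
print(cache.query(lambda row: 1980 <= row["born"] < 1990))  # ['mary']
```

Because the cache can always be rebuilt this way, losing it costs time but no data, which is why it does not need long-term persistence.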

Implementing the cache will likely result in an eventually consistent architecture. Building the cache requires access to the keys to decrypt the data.

Figure 09: The flow of data between components in an architecture that contains a decrypted data store for query operations.

For applications with a low tolerance for eventual consistency, it may be necessary to read from the primary store most of the time. Queries would still be performed against the cache layer, but the matching records may then be loaded from the primary store.

In cases where there is zero tolerance for eventual consistency, the application could write to the primary store and the cache in a single distributed transaction. This adds considerable storage and transactional complexity to the application layer, and modern systems should aim to avoid it.

Analytical processing

Most organisations will require analytical processing of personal data. In these cases, the data will need to be decrypted to build the data warehouse. To honour the right to be forgotten, the data warehouse population process should detect and remove deleted personal data. As the data warehouse stores decrypted personal data, it should be subject to retention practices similar to those of the key store, and it should avoid storing any data that cannot be rebuilt from the primary stores.

Figure 10: Process to populate the data warehouse with decrypted customer data.

For transactional processing, a cache may already be used to facilitate querying of data; this requires the cache to be fully populated. The same cache can then be used to rebuild the data warehouse, instead of an ETL process that must also load the keys and decrypt records. This reduces processing time, as well as coupling to the key store.
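Populating the warehouse from the cache can be sketched as a sync that mirrors the cache and drops anyone no longer present, which is how deleted personal data gets detected and removed. This Python sketch assumes the cache is fully populated, so a customer missing from it has been forgotten; `sync_warehouse` is a hypothetical name and plain dicts stand in for both stores.

```python
def sync_warehouse(cache_rows: dict[str, dict], warehouse: dict[str, dict]) -> set[str]:
    """Make the warehouse mirror the decrypted cache; return IDs that were removed.

    Assumes a fully populated cache: customers missing from it have been
    forgotten, so their rows are removed from the warehouse as well.
    """
    removed = set(warehouse) - set(cache_rows)
    for customer_id in removed:
        del warehouse[customer_id]
    for customer_id, row in cache_rows.items():
        warehouse[customer_id] = dict(row)  # copy rather than share the cache's dicts
    return removed

warehouse = {"mary": {"born": 1984}, "john": {"born": 1975}}
cache_rows = {"john": {"born": 1975}}  # Mary has exercised her right to be forgotten
print(sync_warehouse(cache_rows, warehouse))  # {'mary'}
```

No keys are touched here: the only component that needs the key store is whatever builds the cache.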

Figure 11: Using the cache to populate the data warehouse, reducing processing.

Additional processing considerations

Legal hold and restricted processing

There may be scenarios where it is not viable to effectively erase personal data, such as legal proceedings in which an organisation is legally required to retain the personal data, irrespective of the right to be forgotten. In some cases, it may be necessary to prevent any further processing of personal data without erasing it.

GDPR acknowledges a few cases in which the right to be forgotten may not be exercised, such as:

…shall not apply to the extent that processing is necessary
(b) for compliance with a legal obligation…
(e) for the establishment, exercise or defence of legal claims.

GDPR: Chapter III, Article 17, Point 3

GDPR also allows customers to request that processing of their data be restricted, in which case the data is retained but not otherwise processed.

The data subject shall have the right to obtain from the controller restriction of processing…

GDPR: Chapter III, Article 18, Point 1

To address these scenarios, the key store can be modified to track whether the right to be forgotten is on hold, or the processing is restricted.

Figure 12: Example data structure for tracking restricted processing and data retention holds.

In this implementation, Restricted Processing means the key should not be used to decrypt customer data for normal day-to-day processing; it should only be used to resolve the restriction. On Hold means the key cannot be deleted from the key store, and consequently the customer data should not be erased from the primary store. Restricted Processing and On Hold operate independently; that is, one does not imply the other.
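The two flags can be enforced at the key store, since every decryption and every shred goes through it. This Python sketch is a hypothetical shape for such a store: `KeyRecord`, `KeyStore`, the field names, and the `"resolve_restriction"` purpose string are all assumptions, and keys are held in memory only.

```python
from dataclasses import dataclass

@dataclass
class KeyRecord:
    key: bytes
    restricted_processing: bool = False  # key may only be used to resolve the restriction
    on_hold: bool = False                # key (and customer data) must not be deleted

class KeyStore:
    def __init__(self) -> None:
        self._records: dict[str, KeyRecord] = {}

    def put(self, customer_id: str, record: KeyRecord) -> None:
        self._records[customer_id] = record

    def key_for(self, customer_id: str, purpose: str = "processing") -> bytes:
        record = self._records[customer_id]  # KeyError once the key is shredded
        if record.restricted_processing and purpose != "resolve_restriction":
            raise PermissionError(f"processing is restricted for {customer_id}")
        return record.key

    def shred(self, customer_id: str) -> None:
        if self._records[customer_id].on_hold:
            raise PermissionError(f"erasure of {customer_id} is on hold")
        del self._records[customer_id]

store = KeyStore()
store.put("mary", KeyRecord(b"k" * 32, restricted_processing=True))  # restricted, not held
store.put("john", KeyRecord(b"j" * 32, on_hold=True))                # held, not restricted
```

The flags are checked in different methods, so setting one never implies the other, matching the independence described above.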

Figure 13: The limitations of processing and retention.

Recovering deleted data

While the right to be forgotten entitles individuals to have their data removed by an organisation, GDPR also requires the organisation to protect the integrity of data.

“Personal data shall be: processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’).”

GDPR: Chapter II, Article 5, Point 1f

It’s possible to achieve the right to be forgotten while also enabling the user to recover their data. This may be suitable for cases where customer accounts are compromised, and a malicious deletion request submitted, or simply to give the customer a time window to change their mind.

When a right to be forgotten request is received, the organisation must effectively erase the data without undue delay. The organisation could achieve this by encrypting the customer’s key with a phrase known only to the customer. After a set period, the organisation could delete the encrypted key, removing any possibility of recovering the data.

After a deletion request, the key store may look like this:

Figure 14: Example of a data structure for tracking recoverable deletion requests.

The plaintext key has been removed and the record is marked for deletion. The key is stored in encrypted form, with a decryption phrase known only to the user. To protect against a malicious deletion request, the phrase should be generated and sent to the user over an existing trusted channel (e.g. a verified email address). To restore the user’s data, the encrypted key would be decrypted and restored, allowing the application to read the customer’s data again. If the key is not restored by the deletion date, the encrypted key should be deleted, along with the customer’s data.
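A minimal sketch of this recoverable-deletion flow, in Python with only the standard library. It assumes a 32-byte customer key and wraps it by XOR with a pad derived from the recovery phrase via PBKDF2 (`hashlib.pbkdf2_hmac`), plus an HMAC tag so a wrong phrase is rejected. The names `PendingDeletion`, `request_deletion`, `restore_key`, and `purge_expired` are hypothetical and the iteration count is illustrative; a production system would more likely use a standard key-wrapping scheme (such as AES key wrap) from a vetted library.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import date

PBKDF2_ITERATIONS = 200_000  # illustrative; tune for your hardware and threat model

@dataclass
class PendingDeletion:
    salt: bytes
    wrapped_key: bytes  # customer key XOR a phrase-derived one-time pad
    tag: bytes          # HMAC over wrapped_key, used to reject a wrong phrase
    delete_on: date

def _derive(phrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # 64 bytes of key material: a 32-byte pad and a 32-byte MAC key.
    material = hashlib.pbkdf2_hmac("sha256", phrase.encode(), salt,
                                   PBKDF2_ITERATIONS, dklen=64)
    return material[:32], material[32:]

def request_deletion(key: bytes, delete_on: date) -> tuple[PendingDeletion, str]:
    """Wrap the key; the caller then deletes the plaintext key and sends the phrase."""
    if len(key) != 32:
        raise ValueError("expected a 32-byte customer key")
    phrase = secrets.token_urlsafe(16)  # send over a trusted channel; never store it
    salt = secrets.token_bytes(16)      # fresh salt, so the pad is never reused
    pad, mac_key = _derive(phrase, salt)
    wrapped = bytes(k ^ p for k, p in zip(key, pad))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return PendingDeletion(salt, wrapped, tag, delete_on), phrase

def restore_key(pending: PendingDeletion, phrase: str, today: date) -> bytes:
    if today >= pending.delete_on:
        raise LookupError("the recovery window has closed")
    pad, mac_key = _derive(phrase, pending.salt)
    expected = hmac.new(mac_key, pending.wrapped_key, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, pending.tag):
        raise ValueError("incorrect recovery phrase")
    return bytes(w ^ p for w, p in zip(pending.wrapped_key, pad))

def purge_expired(pending: dict[str, PendingDeletion], today: date) -> list[str]:
    # Deleting the wrapped key makes the data permanently unrecoverable;
    # the caller should delete the customer's encrypted record at the same time.
    expired = [cid for cid, p in pending.items() if today >= p.delete_on]
    for cid in expired:
        del pending[cid]
    return expired
```

Throughout the recovery window the organisation holds only the wrapped key and cannot unwrap it without the customer’s phrase, which is what allows the data to be treated as effectively erased from the moment of the request.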

From the moment of the original deletion request, the organisation has effectively erased the customer’s data, in that it has no way to process the data. While logical controls can achieve similar outcomes, they do not meet the GDPR requirement to erase data without undue delay.

Real-world crypto shredding

Crypto shredding is a relatively new concept driven by changing technological and political landscapes. A crypto shredding approach to data retention addresses several otherwise technically difficult requirements of GDPR, but it also brings a broad range of challenges in how an application consumes and distributes personal information. Each technical solution will have its own merits and will meet regulatory compliance differently. Because crypto shredding adds significant technical complexity, its adoption should balance legal requirements against technical capabilities. As implementations emerge and evolve, crypto shredding and similar solutions will likely become commonplace, just as password hashing and credit card data encryption did in previous years. Crypto shredding is another technique for protecting data in modern systems and societies.

Disclaimer: You should review your legal obligations, and how your technical implementation adheres to them, with a legal professional. This article is not legal advice on how to achieve GDPR compliance.

Written by

Technologist with a focus on automation, security, and scalable architecture.
