An Architecture for the Control and Secure Processing of Personal Data

Dimitris Mitropoulos
Nov 1 · 6 min read

Online personal data are rarely, if ever, effectively controlled by the users they concern. If you think about it, any control users may have over their personal data is often too coarse (typically an all or nothing proposition), confusing (as it varies with each organization that holds and processes the data), and manually applied according to the willingness or even whim of each organization. Worse, as demonstrated by the numerous leaks reported each week, the organizations that store and process personal data fail to adequately safeguard the required confidentiality.

The importance of privacy in the digital world has led entire countries to reassess the ways that organizations handle personal data. A prominent example is the General Data Protection Regulation (GDPR) adopted by the European Union (EU).

The aim of this post is to introduce PDGuard, a framework that defines an architecture for the control and secure processing of personal data. To do so, PDGuard combines different concepts including applied cryptography, access control models, an API, authorization protocols, and taint tracking. In essence, the architecture provides technical and organizational measures to support fundamental data protection principles. It can be easily incorporated into business processes that handle personal data so that the data are not available without informed consent.

First, we present an overview of the architecture and describe its benefits. Then, we provide links to a corresponding implementation and a number of applications (including The Guardian newspaper’s website identity application). Finally, we discuss future directions and explain further work that needs to be done. Throughout the post we use the term “data subject” to refer to a person associated with personal data, and “data controller” to refer to a public or private organization holding and processing those data. Note that the terms are aligned with the ones introduced by GDPR.

Overview

Within the context of PDGuard, personal data are always stored encrypted as an opaque object. Decryption (and encryption) can only be performed through the PDGuard API, under data- and action-specific authorizations supplied by a third party, an escrow agent, which is an entity trusted by both the data subject and the controller. By interacting with escrow agents, data subjects can reliably authorize and audit how data controllers use their personal data.

When data subjects establish a relationship with a data controller, they supply the controller with the address of the escrow agent of their choice for the specific relationship. For data subjects not interested in setting up a relationship with an external escrow agent, PDGuard can use a default internal escrow agent (running on behalf of the data controller), which offers the same functionality and implements the data controller’s personal data policy. The data controller will send data storage and access requests to the escrow agent according to its business needs, its personal data protection policy, as well as legal and regulatory requirements. In turn, data subjects will authorize the requests as they see fit.

The data controller’s software applications perform data encryption and decryption on demand, with keys supplied each time through the escrow agent’s authorization service in response to authenticated entity requests. Each escrow agent allows data subjects to associate permissions with specific data types and data uses. For example, a data subject, Mary, can allow a data controller, Acme, to use her postal address (data type) for labeling (data use) a gadget she ordered, but not for sending her advertisements (data use). The data regarding all authorizations granted by an agent are made available to the data subjects, so that they can review them or revoke future uses.
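
The permission model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class PermissionSketch, its enums, and its methods are hypothetical names invented for this post and are not part of the PDGuard library. The sketch denies every request by default, grants only the (data type, data use) pairs that the data subject has allowed, and keeps a log of granted requests that the data subject could later review.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these names are illustrative, not the PDGuard API.
class PermissionSketch {
    enum DataType { POSTAL_ADDRESS, EMAIL }
    enum DataUse { LABELING, ADVERTISING }

    // Uses that the data subject has allowed, per data type.
    private final Map<DataType, Set<DataUse>> allowed = new HashMap<>();
    // Every granted request, kept so the data subject can review it.
    private final List<String> grantedLog = new ArrayList<>();

    void allow(DataType type, DataUse use) {
        allowed.computeIfAbsent(type, t -> EnumSet.noneOf(DataUse.class)).add(use);
    }

    // Revocation only affects future requests; past grants stay in the log.
    void revoke(DataType type, DataUse use) {
        Set<DataUse> uses = allowed.get(type);
        if (uses != null) {
            uses.remove(use);
        }
    }

    // Deny by default: a request is granted only if it was explicitly allowed.
    boolean request(DataType type, DataUse use) {
        Set<DataUse> uses = allowed.get(type);
        boolean granted = uses != null && uses.contains(use);
        if (granted) {
            grantedLog.add(type + " for " + use);
        }
        return granted;
    }

    List<String> grantedLog() {
        return new ArrayList<>(grantedLog);
    }

    public static void main(String[] args) {
        PermissionSketch maryAndAcme = new PermissionSketch();
        maryAndAcme.allow(DataType.POSTAL_ADDRESS, DataUse.LABELING);
        System.out.println(maryAndAcme.request(DataType.POSTAL_ADDRESS, DataUse.LABELING));    // true
        System.out.println(maryAndAcme.request(DataType.POSTAL_ADDRESS, DataUse.ADVERTISING)); // false
        maryAndAcme.revoke(DataType.POSTAL_ADDRESS, DataUse.LABELING);
        System.out.println(maryAndAcme.request(DataType.POSTAL_ADDRESS, DataUse.LABELING));    // false
        System.out.println(maryAndAcme.grantedLog()); // [POSTAL_ADDRESS for LABELING]
    }
}
```

In this toy version the permissions live in memory next to the caller. In the architecture described here they would be held by the escrow agent, outside the data controller’s systems.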

A static code verification tool can be employed to identify accidental framework misuses within an application’s software code. In particular, developers can review data controller applications and check whether the decrypted data retrieved through the PDGuard API are used as intended.

The figure above illustrates a data processing system supported by PDGuard. Observe (at the top) that personal data are stored encrypted. Every time a database action (1) involving personal data is performed, data controller applications invoke the PDGuard API (2). In turn, the API sends authorization requests (3) to the corresponding escrow agent. If a request conforms to the rules set by the data subject, the agent responds with the required decryption key (4).
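
To make the four steps more concrete, here is a minimal, self-contained Java sketch of the flow. It is not the PDGuard implementation. The names EscrowFlowSketch, EscrowAgent, Stored, store, and read are hypothetical, the escrow agent is an in-process object rather than a remote service, and the choice of AES-GCM from the standard javax.crypto package is an assumption made for illustration. The sketch also assumes that storing new data is always authorized.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of the flow; not the PDGuard implementation.
class EscrowFlowSketch {

    // Stand-in for an escrow agent: holds the key and one rule set by the data subject.
    static class EscrowAgent {
        private final SecretKey key = newKey();
        private final String permittedUse;

        EscrowAgent(String permittedUse) {
            this.permittedUse = permittedUse;
        }

        // Assumption for this sketch: storing new data is always authorized.
        SecretKey keyForStorage() {
            return key;
        }

        // Steps (3) and (4): release the key only if the request matches the rule.
        Optional<SecretKey> authorize(String requestedUse) {
            return permittedUse.equals(requestedUse) ? Optional.of(key) : Optional.empty();
        }

        private static SecretKey newKey() {
            try {
                KeyGenerator generator = KeyGenerator.getInstance("AES");
                generator.init(128);
                return generator.generateKey();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Encrypted form kept in the database: a random IV plus the AES-GCM ciphertext.
    static class Stored {
        final byte[] iv;
        final byte[] ciphertext;

        Stored(byte[] iv, byte[] ciphertext) {
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    // Step (1), write path: personal data are encrypted before they reach the database.
    static Stored store(EscrowAgent agent, String plaintext) {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] input = plaintext.getBytes(StandardCharsets.UTF_8);
        return new Stored(iv, crypt(Cipher.ENCRYPT_MODE, agent.keyForStorage(), iv, input));
    }

    // Step (2): the application asks for plaintext and states the intended use.
    static Optional<String> read(EscrowAgent agent, Stored stored, String use) {
        return agent.authorize(use)
                .map(key -> crypt(Cipher.DECRYPT_MODE, key, stored.iv, stored.ciphertext))
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    private static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EscrowAgent agent = new EscrowAgent("LABELING");
        Stored stored = store(agent, "12 Example Street");
        System.out.println(read(agent, stored, "LABELING"));    // Optional[12 Example Street]
        System.out.println(read(agent, stored, "ADVERTISING")); // Optional.empty
    }
}
```

The point of the sketch is that the application never holds a long-lived key: without a matching authorization from the agent, the stored value remains an opaque object.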

Advances

Following the “security by design” approach (recommended by GDPR), PDGuard improves the effectiveness of personal data protection in multiple ways. First, it decentralizes trust and control by having numerous escrow agents of diverse implementations control the decryption keys. This empowers data subjects with fine-grained control of how their personal data are used. In addition, it replaces a patchwork of manually enforced security policies with a standardized API that can be uniformly applied, monitored, and audited across all data controllers and software applications. It also reduces the attack surface and the risk of individual vulnerabilities of the systems it is deployed on, because personal data are decrypted only at the time of their use. Finally, it increases transparency in the handling of personal data. This allows data subjects to choose among data controllers based on the concrete protection that they offer (adoption of PDGuard and actual data use), provides regulators with an easy way to establish whether personal data are effectively protected, and allows market-based mechanisms to spread adoption of competing PDGuard implementations, escrow agents, and auditing services.

Applications

We have demonstrated the framework’s applicability through a reference implementation, by building a PDGuard-based e-shop, and by integrating PDGuard into The Guardian newspaper’s website identity application. Note that this application is directly related to personal user data because it handles the Guardian profile functionality. Choosing The Guardian as one of our use cases was not random: protecting its readers’ personal data reinforces the newspaper’s values of being free from commercial or political interference.

Using the PDGuard library is straightforward. Specifically, there are two methods, encryptData and decryptData, that are part of a general class which we call DataProtection. Consider the case where Alice provides her surname to a news and media website when she signs up for the first time. The code running in the background should invoke the encryptData method in the following manner (assume that dp is a DataProtection object and that surname is a string variable which has Alice’s surname as its value):

dp.encryptData(surname,
               DataType.SURNAME,
               DataProvenance.DATA_SUBJECT_EXPLICIT,
               false);

Observe that the data provenance (DATA_SUBJECT_EXPLICIT) indicates that this piece of data comes from the data subject herself. The final argument informs the escrow agent that this action is not an update. Assuming that the website needs to send a weekly digest via email to Alice, per her request, it needs to retrieve her email address (stored encrypted in the database). In this case, the decryptData method should be invoked in a similar manner.

Through our work we realized that little developer effort is required to protect a wide range of confidential data. For instance, the number of source lines of code increased by 4.8% in the case of the Guardian’s identity application.

Steps Forward

Can PDGuard actually make a difference in our everyday lives? Getting there requires a lot more work, ranging from scientific and technical effort to evangelism. Important elements include: the design of performance optimizations at the architectural level; the running of large-scale trials; the operation of an escrow agent in a production setting; the initial adoption by significant data controllers; the establishment of a community and a governance structure; as well as the education of developers, data controllers, and data subjects. Achieving these objectives seems like a tall order, but this is the way in which technology changes our lives.

Prototype Availability and Further Details

An extended description of PDGuard can be found in an article published in the “International Journal of Information Security”. A reference implementation and deployment guidelines can be accessed at the following URL: https://github.com/AUEB-BALab/PDGuard
