What IS a Personal Data Store?

Alan Mitchell
Mydex
Published Mar 13, 2024 · 15 min read

This blog is the first in a series explaining how Mydex’s personal data infrastructure works. It explains how our platforms help deliver our mission of empowering individuals with their own data, enabling them to use this data to manage their lives better and assert their human rights in a practical way on a daily basis.

Summary

  • Mydex CIC’s personal data infrastructure is live and operational 24/7/365
  • The personal data store (PDS) lies at the heart of this infrastructure. PDSs enable individuals to collect, receive, store and use their own data under their own control, independently of the organisations that collect data about them.
  • PDSs can hold any and all types of data and are designed to last the individual’s lifetime.
  • PDSs are designed to protect people’s privacy. The data held in a person’s PDS can only be seen or accessed by them. Not even Mydex can see or access this data, because each individual holds their own key to their own PDS.
  • PDSs are designed to be safe and secure, using many different methods and approaches, all designed to work together to maintain the system’s integrity and continuity.

Introduction

Over the years, we have blogged many times about developments and issues as they relate to personal data. We’ve talked about the economic opportunities that are opened up by empowering individuals with their data, related human rights issues, or the design principles needed to operate a safe, secure, benign and efficient personal data ecosystem.

But we haven’t talked much about what we (Mydex CIC) actually do. After working on it for 17 years now, it’s time we did!

All this time we’ve been building, building, building. Not just a ‘product’ here or a service or app there, but an entire personal data infrastructure capable of providing every citizen nationally (and internationally) with the means to collect, store, manage, use and share any and all of their data, from birth to death, in their own safe, private database, independently of the organisations that collect and use data about them.

This infrastructure can and will change how our entire economy and society works. By giving individuals as much power over their data as organisations currently have, and by enabling individuals to build up the richest and most powerful personal data assets ever seen (data assets that provide individuals with a complete picture of their life), this infrastructure will transform what both people, and the services, researchers and governments they deal with, can do with data.

Building this infrastructure — infrastructure that works safely and securely, day in, day out, 24/7/365, at national/international scale — hasn’t been easy. It has taken time. But it’s live and operational and it’s time we talked about it in a little more detail.

If you are a software developer and you really want to dig deep into the nuts and bolts of how this infrastructure works, there is a separate website for you: dev.mydex.org. There are pages of detail here about, for example, how our APIs work, our data schema (describing what data we can currently collect and hold), or how to access and use our Identity or Master Reference Data services. But this blog will describe what we do at a higher level — the core functionality and design requirements of a Personal Data Store.

An overview

Three core platforms make up the foundations of our infrastructure. They are akin to a plumbing system with a tank that holds the water, pipes that let water flow into and out of this tank, and taps and valves which control these flows. Similarly, our infrastructure provides:

  • a Personal Data Store (PDS): the ‘tank’ that actually holds the data
  • a ‘Personal Data eXchange’ (PDX) system: the ‘pipes’ which enable data to flow into and out of the PDS
  • interfaces which enable individuals to control this flow of data: the ‘taps and valves’. Internally, we call this interface the Member Experience Layer or MEL.

Another analogy is that of a digital filing cabinet. The PDS acts as the filing cabinet itself, keeping each person’s data safely for them under their own lock and key. It is linked to a postal service that enables them to send and receive data as and when they wish (the PDX). Then there are administrative tools that let them decide what data they wish to send or receive, to and from whom, for what purposes (the Experience Layer or MEL).

Now let’s answer some basic questions.

Who?

At its most expansive, our mission potentially embraces every individual in the world. We believe every individual in the world should have the means to collect, receive, store and use their own data for their own purposes — to manage their lives better.

They can (and should) be equipped with a PDS even before they are born, one that starts gathering data about them, for them, in pregnancy. Their PDS should still be working for them after they die, for example to help manage probate and digital wills. In between, a person’s PDS creates a lifelong digital record of that person’s life for use by that person.

What data?

Our PDSs can hold any sort of data you like. Here are some examples.

  • Transactional activity data: This is data that has a time series component. It keeps a record of things that have happened and that you have done. This could include bank transactions, itemised bills, browsing history, secure messages and interactions with service providers and other people. It could also include self-monitoring activity data, for example from wearable devices and sensor data. Huge amounts of data can be involved here. For example, right now, we are collecting billions of rows of data from smart energy meters that are making new readings of electricity in people’s homes every 30 seconds.
  • Lists, including assets of all types (documents, contracts, agreements, warranties, bookmarks, actions, calendars and events).
  • Verified State data: This is usually data about you that has been generated by an organisation in the course of providing you with a service (for example blood pressure readings, test results or diagnoses in the case of a health service). Or it could be data that has been ‘conferred’ upon you by an organisation. This is potentially a very long list including, for example, educational attainments, training and professional qualifications, employment history, tenancy, tax status, National Insurance Number, passport details, driving licence details, Disclosure and Barring Service details, approved entitlements (e.g. to a Government benefit) or verified identity. And so on!

These are all ‘verified attributes’, where the data point in question comes wrapped with additional metadata (data about the data) by which the originating organisation confirms its validity. In this way, trust that this data is reliable can travel seamlessly with it wherever it goes.

Using the PDX, it is possible to keep track of any ways in which this data might change. For example, if you have a driving licence, the data about your driving licence in your PDS can be linked to the data in the DVLA’s systems so that any changes it makes to its records are instantly registered in your PDS. Thus, if you have just passed your Heavy Goods driving test or have been barred from driving, your PDS will log this change as soon as it is logged by the DVLA. In this way your PDS works for you ‘while you are asleep’, keeping your records up-to-date.

  • About me data: This includes additional data that you decide to add to your PDS. This can be information about you that nobody else would necessarily know, for example your preferences, goals and plans. This is another potentially vast set of data, and it can be organised into different modules and dimensions. It is the sort of information that can be critically important to service providers wanting to understand you and your needs in the context of the services they provide. Because it is modular in nature (i.e. broken down into a series of separate details about you) you can choose to share different bundles of this data with different service providers, who often need different bundles of information. This helps you gain access to services and configure them to your needs.
  • Metadata about relationships and connections: Metadata is data about the data. In this case, this metadata includes account level records of your relationships with service providers and other people that you share data with or receive data from.
  • Immutable logs: These record all activity and interactions with your PDS and your use of Mydex APIs (links between your systems and others’ systems) and the applications connected to those APIs. They log who undertook what activity, via what route. Where relevant, they keep a before-and-after copy of the record, to show how it has been updated. All of this is crucial for audit and accountability. Should anything go wrong in your relationship with a service, you have proof of what has been transacted under your own control, like an audit trail which removes doubt.
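To make the idea of an immutable log more concrete, here is a minimal sketch in Python. It assumes a hash-chained, append-only list, which is one common way of making tampering detectable; this post does not say how Mydex actually implements its logs, and every name here (append_entry, verify_log, the field names and the example values) is hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the very first entry

HASHED_FIELDS = ("timestamp", "actor", "route", "action", "before", "after", "prev_hash")


def entry_hash(entry: dict) -> str:
    """Hash the entry's content together with the previous entry's hash."""
    payload = json.dumps({k: entry[k] for k in HASHED_FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, route: str, action: str,
                 before=None, after=None) -> dict:
    """Append an entry recording who did what, via what route."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "route": route,
        "action": action,
        "before": before,  # copy of the record before the change, if any
        "after": after,    # copy of the record after the change, if any
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry["hash"] = entry_hash(entry)
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """True only if every entry matches its hash and links to its predecessor."""
    prev = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(entry):
            return False
        prev = entry["hash"]
    return True


# Invented example activity: one update by a service provider, one read by the member.
log = []
append_entry(log, actor="energy-supplier", route="api", action="update",
             before={"tariff": "standard"}, after={"tariff": "economy"})
append_entry(log, actor="member", route="web-app", action="read")
```

Because each entry's hash covers the previous entry's hash, quietly editing an earlier entry breaks the chain and verify_log reports it. That is the property an audit trail needs.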

To keep on top of each new piece of data that comes into or goes out of the PDS, and to keep it in its right place, we have created what we call the ‘Personal Data Store Master Schema’. It is this Schema that structures and scopes the personal data held in the PDS, enabling it to be used in endlessly different ways in different combinations. The Schema enables interoperability between the individual’s PDS and the subscribers who deliver and collect personal data via the PDX.

You can see a list of the data sets we are currently handling here. This list is forever growing and will continue growing. Mydex can automatically deploy updates to the PDS Master Data Schema as new datasets become available, using open standards where applicable and available. Some examples in the current list are: bank account details (including each bank transaction), browsing history, energy consumption data, health and social care records, intentions and notes made by members using the PDS, telephone calls and SMS messages, and details of voluntary activities.

Over an individual’s lifetime their PDS can collect and hold a potentially vast amount of data. Even so, not all the data that is generated or collected by an organisation will necessarily be deposited into an individual’s PDS — only the data that is needed.

We stress this point because some people get the wrong end of the stick about PDSs, thinking they are meant to replace the traditional organisation-centric database. They are not. They are meant to complement it, adding to the ecosystem by enabling individuals as well as organisations to use this data, and enabling new uses of this data via eased data sharing.

Organisation-centric databases continue operating as they did before, with two changes. They can now deposit verified copies of some of their data in the individual’s PDS if the individual asks for it. They can also receive additional information from the individual’s PDS if they ask for it and the individual is prepared to provide it.

One crucial change made possible by the PDS is the ability to create new combinations of data. Currently, all your data is dispersed across the hundreds of different organisations that collect data about you (often the same data in many areas). Each one collects their own little slivers of data that really matter to them. PDSs enable individuals to become the ‘point of integration’, by which all this data from a vast range of sources can be gathered together and combined to create a complete picture of the individual’s life — all directly under the individual’s exclusive control.

How?

For us as a PDS provider, keeping on top of all the above takes a lot of work. As you can imagine, providing this service involves a number of technical issues. Amongst these issues, interoperability is key.

It is irritating but true that different organisations use different formats to hold data, and if you jumble up these formats you end up with multiple data sets that cannot ‘talk’ to each other. A simple example is date formats, where Americans and Europeans switch days and months around. To an American, 9/11/2001 means the eleventh of September 2001. To a European it means the ninth of November, 2001.
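The date ambiguity is easy to reproduce. The short Python sketch below parses the same string under both conventions and converts each result to the unambiguous ISO 8601 form. The DATE_FORMATS table and the to_iso_date function are made up for illustration; they are not part of any Mydex API.

```python
from datetime import datetime

# Hypothetical mapping from a data source's convention to its date format.
DATE_FORMATS = {
    "US": "%m/%d/%Y",  # month first
    "EU": "%d/%m/%Y",  # day first
}


def to_iso_date(raw: str, source_convention: str) -> str:
    """Parse a raw date using the source's convention and return ISO 8601."""
    parsed = datetime.strptime(raw, DATE_FORMATS[source_convention])
    return parsed.date().isoformat()


print(to_iso_date("9/11/2001", "US"))  # 2001-09-11
print(to_iso_date("9/11/2001", "EU"))  # 2001-11-09
```

The string itself cannot tell you which reading is right. The convention has to be known from the source that delivered it.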

To cope with these endless sources of confusion, we have designed systems by which the PDS can store data that is delivered in the originating organisation’s raw delivery format but which also parse that data into the right part of the Master Schema (making appropriate links to ensure the integrity of the data is maintained at all times). The data is always watermarked with its source and verification status.

This means the data can be used in many different contexts, including being combined with other data sets that were originally in completely different formats. By attaching metadata to the data (for example, who originally minted it, when, and when it was last checked and updated), trust can travel with each data point.
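As a rough illustration of this approach, the Python sketch below keeps the raw value exactly as delivered, stores a normalised copy beside it, and attaches provenance metadata. The class names, field names, schema path and example values are all invented for this sketch; the real Master Schema is documented on the developer site and will differ.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Provenance:
    source: str         # organisation that originally minted the data
    minted_on: date     # when the data was first issued
    last_checked: date  # when it was last checked or updated
    verified: bool      # whether the source confirms the value's validity


@dataclass(frozen=True)
class StoredRecord:
    schema_path: str    # hypothetical location within the Master Schema
    raw: str            # the value exactly as the source delivered it
    value: str          # the normalised value used for combining data sets
    provenance: Provenance


def ingest_date(raw: str, date_format: str, schema_path: str,
                provenance: Provenance) -> StoredRecord:
    """Keep the raw value and store a normalised ISO 8601 copy beside it."""
    normalised = datetime.strptime(raw, date_format).date().isoformat()
    return StoredRecord(schema_path, raw, normalised, provenance)


# Invented example: a day-first date delivered by a made-up issuer.
record = ingest_date(
    raw="9/11/2001",
    date_format="%d/%m/%Y",
    schema_path="driving_licence.issue_date",
    provenance=Provenance("Example Licensing Agency", date(2001, 11, 9),
                          date(2024, 3, 1), True),
)
```

Keeping raw alongside value means nothing is lost in translation: the normalised copy can be recomputed or audited against the original at any time.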

Where?

All the data held in our members’ PDSs is held in the Cloud, independently of and separate to any systems that organisations use to collect data about individuals.

This means that Mydex personal data stores are NOT just a different name or version of a traditional organisational ‘MyAccount’ service. With a MyAccount service, an individual can log in to an organisation’s systems (not their own personal system), see some of the data held by that particular organisation, tweak settings, and so on.

Mydex personal data stores have nothing to do with such MyAccount systems (even if these systems are sometimes misleadingly called ‘personal data stores’ or ‘personal online data stores’ — i.e. where an organisation is storing personal data about an individual.)

Using the Cloud as the storage medium is important because it provides maximum possible flexibility with security. Mydex personal data stores are not held in distributed ledgers or blockchains. These are too slow and unwieldy. Nor are they held on the individual’s device such as a smartphone. That’s too risky. (What happens if your device is lost or stolen?) Our PDSs are not the same as phone-held ‘wallets’ either, because these wallets hold only a tiny proportion of the data held by a PDS.

This doesn’t mean we are against any of these things. Mydex PDSs can and do connect to organisations’ MyAccount systems. They can work with specific blockchain applications. Mobile phones (and ‘wallets’ held on the mobile phones) are one of the main ways that individuals access and use their PDS — including via different types of web app. They are an interface. Part of the experience layer.

But the Cloud is the place for the PDS itself because of its scale, flexibility and security. The simple fact that the Cloud is ‘always on’ means data deliveries and collections can operate 24hrs a day, following the rules the member has set for their relationships with others.

Of course, the Cloud is provided by physical infrastructure sitting in physical places. Our PDSs sit in data centres rented from specialist service providers, with the servers operated in highly secure locations in the individual’s country of residence (to comply fully, not only with the letter of GDPR data protection regulations but with their spirit too). Because they are ‘always on’, we operate them across ‘availability zones’. This means that if one location goes off-line we keep running. We also back them up continually. This means we always have a means of recovery even in the highly unlikely scenario of the Cloud being lost and all the availability zones being lost.

Who is in control?

One key feature of our PDSs is that we cannot and do not want to see the data they hold or exert any control over it. The only person able to see the data held in a PDS is the person that the data relates to (see ‘But is it safe?’ below). We provide individuals with the tools by which they can exercise control over the sharing and use of this data, but these tools are designed to empower the individual, not us. We act for the individual, as their agent, carrying out their instructions. We explain more about how this works in our blog on the PDX (personal data exchange).

This issue of control raises a further important question about guardianship. The baby in the womb isn’t up to managing and controlling their data. Nor is an ageing person with dementia. Processes for safe, ethical guardianship are key to the design of what we do: we have designed auditable, easy-to-manage ways by which one individual can act for another in managing that individual’s data, including powers of attorney, when needed. Again, we’ll say more about this when we look at the PDX. These arrangements may include one individual acting on another’s behalf e.g. sending messages, reviewing referrals, calendar invites and any number of interactions with their service providers. Every such activity is tracked and audited transparently. This helps protect both the individual and the service provider interacting with that individual.

But is it safe?

Given the vast range and potential sensitivity of the data that can go into a PDS, security has to be paramount. (After all, this is a hugely powerful personal asset that will grow in value over your lifetime.)

To maintain security at all times, all data held in a Mydex PDS is encrypted both at rest and in motion. From the moment it is entered into your PDS, your personal information is encrypted using AES 512-bit encryption, the highest standard available today.

The company as a whole is independently accredited and certified for Information Security and Management under ISO27001 (and has been for the last ten years). As part of this, Mydex operates extensive security and threat vector protections and monitoring to ensure the stability, availability and performance of its data infrastructure. This is constantly monitored with specific triggers and alerts and automated responses and mitigations.

To get an idea of the detailed work that goes into operating our encryption processes, go to this page on our developer website. There, you can see the instructions we give to developers wanting to deposit data in a PDS or receive data from it. This page provides a higher level overview of our security model.

Our platforms and all data held in our PDSs have triple availability, i.e. they are stored in three different availability zones across a scalable infrastructure, all in undisclosed locations within the UK (for UK citizens). These locations are monitored 24x7 and cannot be accessed without security clearance, including photo ID and pre-notification. Meanwhile, as part of our ISO27001 certification, we operate multi-layer backup services, which means that if any data should be lost in some sort of outage or physical damage, it can be recovered (as described above).

Many companies make a big fuss about how privacy protecting they are because, like us, all the data they hold is encrypted. What they don’t point out is that while such encryption protects the data from external prying eyes, it doesn’t do anything to protect it from internal prying eyes, i.e. the company itself. If the company holds the encryption key, it can access this data any time it likes. Which means that their much-vaunted encryption-based privacy protection amounts to very little.

Our PDSs are different. First off, each PDS is a standalone database and each is separately encrypted. This means we have not built one single huge database holding millions of records, but a separate personal database for each individual. The distributed nature of this infrastructure means it is impossible for a hacker to do one, single hack to access a million records in one fell swoop (the sort of thing that is reported daily, in growing numbers, across the globe). To access a million people’s records on the Mydex platform, a hacker would have to conduct a million separate, successful hacks. This turns the economics and incentives of hacking on their head, maximising the hacker’s costs while minimising their rewards.

In addition, every individual has their own private encryption key to their PDS. Basically, this is their own additional passphrase (which can be up to 128 characters in length). Mydex doesn’t have access to this passphrase. This means not even Mydex can ‘see’ the data held by its members in their PDSs.
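One way to picture a key that only the individual holds is to derive it from their passphrase on demand, so the operator never stores the passphrase or the key. The Python sketch below does this with PBKDF2 from the standard library. It is an assumption made for illustration, not a description of Mydex’s actual mechanism: the derive_key function, the salt handling, the iteration count and the example passphrase are all hypothetical.

```python
import hashlib
import secrets

ITERATIONS = 600_000  # illustrative work factor, not a Mydex setting


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


# Each store gets its own random salt, so even identical passphrases
# produce different keys for different stores.
salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)

key_a = derive_key("correct horse battery staple", salt_a)
key_b = derive_key("correct horse battery staple", salt_b)
```

In a design like this, the operator can keep the salt alongside the encrypted store, but without the passphrase it cannot rebuild the key. It also means every store has to be attacked separately, which fits the ‘million separate hacks’ point above.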

There are no hidden ‘back doors’. The entire system is designed NOT to have such back doors. (This isn’t without its problems. For example, many Governments and security services don’t like services that don’t keep secret back doors open for them.)

Because only the individual knows their particular Private Key, if they forget this passphrase we cannot reset it for them — because we never knew what it was in the first place. The downside of this is that if they lose this Private Key — if they forget this additional passphrase — they lose direct access to their data.

We have ways of addressing this problem. In a nutshell, we have created a system whereby individuals can break their Private Key (their additional passphrase) into ‘shards’ or fragments. Each of these shards is cryptographically secured so it can only be accessed by the person who set it up. These shards are then distributed among the PDSs of friends and family, who are invited to act as trustees of just one particular shard. The shards are thus held securely by friends and family on behalf of the individual. (Because each trustee holds only a shard, even if a friend or family member were to turn evil and, in the extremely unlikely event, manage to break the encryption, they would only be able to access one part of the Private Key, which is useless on its own.)

If the individual forgets their Private Key passphrase, they can then ask friends and family to return the cryptographic tokens to them, and our systems will help the member put them back together and decrypt them. This is a bit of hassle for both us and the individual. But it’s needed if the whole system is to be genuinely secure.
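The shard idea can be sketched with a very simple splitting scheme. The Python example below uses XOR-based splitting, in which every shard is needed to rebuild the secret and any single shard on its own reveals nothing about it. This is an illustrative stand-in, not Mydex’s actual scheme, which this post does not specify. The function names and the example passphrase are hypothetical, and the extra encryption of each shard described above is left out.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, n: int) -> list:
    """Split a secret into n shards; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two shards")
    # n - 1 shards are pure random bytes of the same length as the secret.
    random_shards = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The final shard is the secret XORed with every random shard.
    final_shard = reduce(xor_bytes, random_shards, secret)
    return random_shards + [final_shard]


def combine_shards(shards: list) -> bytes:
    """XOR all shards together to recover the original secret."""
    return reduce(xor_bytes, shards)


passphrase = "my additional passphrase".encode("utf-8")
shards = split_secret(passphrase, 3)       # e.g. one shard per trustee
recovered = combine_shards(shards)
```

A scheme like this is all-or-nothing: if one trustee loses their shard, the secret cannot be rebuilt. Threshold schemes such as Shamir’s secret sharing relax this by allowing recovery from a chosen subset of shards.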

Conclusion

Mydex has built citizen-empowering personal data infrastructure that enables individuals to safely and easily collect, store, manage, use and share their own data in their own database independently of all organisations that collect data about them. This helps remove friction, effort, risk, cost and micro-stress in their lives. It is core to our mission to equip people with easy-to-use tools that allow them to exercise their human rights easily, safely and securely and to take an active part in the workings of the digital economy and society.

The personal data store itself lies at the heart of this infrastructure, and it works as described above. But to be really useful, it needs to enable data sharing. That’s the subject of our next blog.
