The Perils of Pre-Copernican Data Strategy

Alan Mitchell
Mydex
May 31, 2022

It’s an oft-told story, but it has a new relevance. Back in the Middle Ages, people believed that the earth was the centre of the universe and that everything else, including the sun, circled round it. When astronomers tried to track the movement of the planets through the sky, they were presented with a puzzle. The planets didn’t trace simple orbits. At certain times they seemed to go into reverse, creating a complex jungle of ‘epicycles’ that astronomers struggled to explain. Their mappings of the planets’ movements, as seen from the earth, are shown on the left-hand side of Figure 1: an extremely complex picture that is very hard to fathom.

Then came Nicolaus Copernicus. He argued that the sun was the centre of our universe, and that the earth orbited the sun. A simple switch of perspective, from earth-centric to sun-centric, produced the enormously simplified picture of the planets’ movements shown on the right-hand side of the figure.

Figure 1. Pre- and Post-Copernican views of the planets’ movements

Today, something similar is happening with data sharing. Back in the days before Copernicus, ‘everyone’, including the great, the good, the clever and the powerful, ‘just knew’ that the earth was the centre of the universe and that everything else, including the sun, revolved around it. Today, ‘everyone’, including powerful actors and decision-makers like the UK Government, ‘just knows’ that organisations are the centre of the personal data universe, and that everyone else, citizens included, revolves around these organisations.

That’s why the UK Government is pressing ahead with organisation-centric plans for the future of data sharing. If implemented, these plans will create a picture far more complicated than the one on the left-hand side of Figure 1. The result would be a complexity catastrophe on multiple fronts: costs, data security, interoperability, governance and trust.

The complexity catastrophe

Let’s take a look at how this data sharing complexity catastrophe will unfold.

There are basically two ways to share personal data. The first is organisation-centric: data flows directly from organisation to organisation and is never handled by the citizen it relates to.

The second is person-centric: organisations deposit copies of the details they hold about a person in that person’s personal data store, so that the citizen can share this data onward as and when needed.

Figure 2: Different models of data sharing

What is the difference between the two? Looking at Figure 2, which shows how the two models work, you may not think there is much. On the left-hand side, with three organisations involved in data sharing, three connections are required, one between each pair of organisations. Simple!

The person-centric approach shown on the right-hand side of the diagram also requires three connections. So why bother adding a completely new entity, the citizen’s personal data store, into the equation? Doesn’t that just add cost and complexity?

Figure 3: What happens as data sharing scales

Now look at Figure 3, which shows what happens when eight organisations share an individual’s data. On the organisation-centric left-hand side, the number of connections has grown to 28, whereas if the data is shared via the individual’s personal data store, it has grown to just eight. Instead of each organisation having to connect with every other organisation involved in data sharing (many of which it has never done business with before), the person-centric approach requires only one connection per organisation: with the citizen’s personal data store. And that connection is with someone the organisation already has a relationship with, namely the citizen.

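Figure 3 shows that as the organisation-to-organisation approach to data sharing scales, its complexity grows quadratically: the number of connections rises roughly with the square of the number of organisations, while the person-centric approach grows only in step with it. And eight organisations is just the beginning.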

How many Government services currently collect and use personal data? You can reach a list of eight very quickly: the DVLA (driving licence data), HM Passport Office, the Money and Pensions Service, the Disclosure and Barring Service, the Department for Work and Pensions, HMRC (tax data), the Ministry of Justice (for services such as Lasting Powers of Attorney) and the Home Office (residency and citizenship status). That’s eight, without drawing breath, and without even thinking about all the different parts of the National Health Service, care services, local authorities and education authorities that also collect and use citizens’ data. And that completely ignores the third and private sectors.

Figure 4: The complexity catastrophe sets in

If we increase the number of organisations sharing data about individuals to fifty, the number of connections needed by the organisation-centric approach rises to 1,225 (see Figure 4). If the data is shared via the individual’s personal data store, it rises only to 50, still tracking the number of organisations involved.
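The numbers in Figures 2 to 4 follow from a simple count. With n organisations, the organisation-centric model needs one connection for every pair of organisations, while the person-centric model needs one connection per organisation, to the personal data store platform. The notation below is ours, added for illustration:

$$
C_{\text{org}}(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad C_{\text{pds}}(n) = n
$$

$$
C_{\text{org}}(3) = 3, \quad C_{\text{org}}(8) = 28, \quad C_{\text{org}}(50) = 1225
$$

Because the organisation-centric count grows with n squared, doubling the number of organisations roughly quadruples its connections, while the person-centric count merely doubles.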

Which sort of system do we want to create: the organisation-centric approach, with its picture of pre-Copernican complexity, or the person-centric approach, with its picture of post-Copernican simplicity?

Perhaps pre-Copernican complexity would be justifiable if it brought tremendous benefits while the person-centric approach brought tremendous risks and harms. But the opposite is true.

Where the catastrophes lie

The pre-Copernican organisation-centric approach to data sharing doesn’t create just one complexity catastrophe. It creates many, involving costs, security, interoperability, governance and trust.

Costs. Let’s assume for the sake of argument that the cost of an organisation sharing data with another organisation is the same as the cost of sharing it with an individual’s personal data store. (As we show below, this is a bad assumption, because sharing data between organisations is actually much more expensive.) But keeping to that assumption for a moment, if 50 organisations are sharing data under the pre-Copernican approach, the quadratic rise in connections (1,225 against 50) means total costs across the system are around 25 times higher than with the person-centric approach. That’s because each organisation has to manage dealings with 49 other organisations instead of just one: the personal data store provider.
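Under the simplifying assumption stated above (every connection costs the same), the system-wide cost ratio reduces to the ratio of connection counts, which works out to (n - 1)/2. The minimal Python sketch below illustrates this; the unit cost is a hypothetical placeholder that cancels out of the ratio.

```python
def org_centric_connections(n: int) -> int:
    # One connection for every pair of organisations: n choose 2.
    return n * (n - 1) // 2


def person_centric_connections(n: int) -> int:
    # One connection per organisation, to the personal data store platform.
    return n


def system_cost(connections: int, unit_cost: float) -> float:
    # Simplifying assumption from the text: every connection costs the same.
    return connections * unit_cost


UNIT_COST = 1.0  # hypothetical cost per connection; it cancels out of the ratio

for n in (3, 8, 50):
    org = system_cost(org_centric_connections(n), UNIT_COST)
    pds = system_cost(person_centric_connections(n), UNIT_COST)
    # Each organisation also maintains n - 1 links itself, versus 1 with a PDS.
    print(f"{n} organisations: {org / pds:.1f}x the person-centric cost, "
          f"{n - 1} links per organisation versus 1")
```

For 50 organisations the ratio printed is 24.5, the ‘around 25 times’ figure above; for three organisations it is 1.0, matching Figure 2.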

Security. The databases of organisations like the Department for Work and Pensions and Her Majesty’s Revenue and Customs (taxes) were designed to operate like moated and walled data castles, working on the principle of perimeter-based protection. They were designed to keep outsiders out, and rightly so, because their job is to protect the data of the people held inside: citizens.

But with data sharing, organisations have to open their systems up, and the more connections they create, the greater the security risks. Once they are sharing data with 50 or more other organisations, their carefully built castle walls begin to look like Swiss cheese: full of holes. When sharing data with a personal data store (PDS), by contrast, they only have to create one carefully managed and scrutinised API connection with the PDS platform.

Interoperability. Because of the way our data systems have evolved, every organisation has its own software systems, formats, languages, standards and so on. For one organisation to become adept at sharing data with 49 others, it needs to become an expert not only in its own data systems but in those of the other 49 organisations too.

It’s not going to happen. And if it does, it’s going to be extremely costly and time-consuming, because this interoperability problem doesn’t have to be solved just once. Each of the 50 organisations involved in data sharing has to solve it again for itself, separately and independently, reinventing the same wheel 50 times and once again multiplying total system costs.

By contrast, when sharing data with a personal data store, an organisation hardly has to think about software systems, formats, languages and standards at all, because it simply shares the data using the systems it already has. It is then up to the personal data store provider to manage these interoperability challenges. The big benefit is that the PDS provider only has to solve these problems once for the solution to work with all 50 organisations.
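One way to picture this is the familiar hub-and-spoke (canonical data model) pattern. The Python sketch below is illustrative only: the class `PdsHub`, the organisations `org_a` and `org_b`, their record formats and the common schema are all hypothetical, and it assumes that data consumers read the common schema rather than each organisation’s native format.

```python
from datetime import datetime
from typing import Callable, Dict

# Hypothetical common schema used inside the PDS: field name -> value.
CommonRecord = Dict[str, str]
Adapter = Callable[[dict], CommonRecord]


class PdsHub:
    """Illustrative hub: the PDS provider maintains one adapter per organisation."""

    def __init__(self) -> None:
        self.adapters: Dict[str, Adapter] = {}

    def register(self, org: str, adapter: Adapter) -> None:
        self.adapters[org] = adapter

    def ingest(self, org: str, native_record: dict) -> CommonRecord:
        # The organisation sends data in its own native format;
        # translation into the common schema happens here, in one place.
        return self.adapters[org](native_record)


def org_a_adapter(record: dict) -> CommonRecord:
    # Hypothetical format: a single name field and day/month/year dates.
    dob = datetime.strptime(record["DOB"], "%d/%m/%Y").date().isoformat()
    return {"name": record["FullName"], "dob": dob}


def org_b_adapter(record: dict) -> CommonRecord:
    # Hypothetical format: split name fields and ISO 8601 dates.
    return {"name": f"{record['forename']} {record['surname']}", "dob": record["birth_date"]}


def point_to_point_translations(n: int) -> int:
    # Without a hub, every organisation must read every other organisation's format.
    return n * (n - 1)


def hub_translations(n: int) -> int:
    # With a hub, one adapter per organisation into the common schema.
    return n


hub = PdsHub()
hub.register("org_a", org_a_adapter)
hub.register("org_b", org_b_adapter)
a = hub.ingest("org_a", {"FullName": "Jane Doe", "DOB": "01/02/1980"})
b = hub.ingest("org_b", {"forename": "Jane", "surname": "Doe", "birth_date": "1980-02-01"})
print(a == b)  # True: both land in the same common record
```

Under these assumptions, 50 organisations need 2,450 point-to-point format translations but only 50 adapters at the hub.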

Governance. Who decides what data should be shared with whom, and for what purposes? Under the Government’s current proposals, this is left to ‘senior leaders’ (e.g. civil servants) within Government departments. Will these ‘senior leaders’ ask citizens whether they want their data to be shared? If not, what right do they have to make the decision?

With PDS-based data sharing, most requests for data to be shared come directly from individuals seeking to access or use a service. The governance, consents and permissions challenges of data sharing are addressed almost automatically as a by-product of the process.

Trust. For all of these reasons, the pre-Copernican approach to sharing personal data is highly likely to generate a knock-on trust catastrophe of its own. But that’s just the beginning. Under the current system, when individuals share data with an organisation, they have at least some idea of what data they are sharing, with whom, and for what purposes (unless something underhand, devious or illegal is going on).

With the pre-Copernican organisation-to-organisation approach to data sharing, it becomes practically impossible for citizens to keep track of who has access to their data, and for what purposes. The ideal of citizens being able to exercise control over their data goes out of the window. With a personal data store, by contrast, citizens get their own consents and permissions dashboard, which lets them see what data they have shared with whom, and for what purposes, and lets them exercise control easily by changing those consents and permissions from within the dashboard.
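The dashboard described above boils down to a simple data structure: a register of who holds which data, for which purpose, that the citizen can inspect and change. The Python sketch below is a hypothetical model, not any PDS provider’s actual API; the class names, methods and example organisations are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Consent:
    recipient: str          # organisation the citizen shared data with
    data_items: List[str]   # which pieces of data were shared
    purpose: str            # what the recipient may use them for
    active: bool = True     # False once the citizen withdraws consent


@dataclass
class ConsentDashboard:
    consents: List[Consent] = field(default_factory=list)

    def grant(self, recipient: str, data_items: Iterable[str], purpose: str) -> Consent:
        # Sharing starts from the citizen's own request, so the consent
        # record is created as a by-product of the share itself.
        consent = Consent(recipient, list(data_items), purpose)
        self.consents.append(consent)
        return consent

    def revoke(self, recipient: str, purpose: str) -> int:
        # Withdraw every active consent for this recipient and purpose.
        revoked = 0
        for consent in self.consents:
            if consent.active and consent.recipient == recipient and consent.purpose == purpose:
                consent.active = False
                revoked += 1
        return revoked

    def may_share(self, recipient: str, item: str, purpose: str) -> bool:
        # A share is allowed only if an active consent covers it.
        return any(
            c.active and c.recipient == recipient and c.purpose == purpose and item in c.data_items
            for c in self.consents
        )

    def summary(self) -> List[Tuple[str, Tuple[str, ...], str]]:
        # What the dashboard shows: who holds what data, for what purpose.
        return [(c.recipient, tuple(c.data_items), c.purpose) for c in self.consents if c.active]


dashboard = ConsentDashboard()
dashboard.grant("Local council", ["address", "residency status"], "parking permit")
dashboard.grant("Bank", ["address"], "account opening")
dashboard.revoke("Bank", "account opening")
print(dashboard.summary())
# [('Local council', ('address', 'residency status'), 'parking permit')]
```

Because every share passes through `grant`, the record the citizen sees and the permissions actually enforced by `may_share` are the same data, which is the point the governance and trust arguments rest on.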

Less than half the picture

The above analysis of pre- and post-Copernican data sharing only addresses operational issues, and these are less than half the picture. The organisation-centric approach focuses only on what organisations need to do with the personal data they hold. It ignores the challenges citizens face when applying for, accessing and using services: challenges which personal data stores address by making citizens’ data available to citizens themselves.

It also ignores the immense potential benefits of making citizens the point of integration of data about themselves. What would happen if 50 or more different organisations each deposited data they hold about the citizen in that citizen’s personal data store? A totally new-to-the-world personal data asset would be created: one which starts to generate a complete picture of that individual. These new personal data assets could become the engines of most if not all data-driven innovation of the future — a driver of economic growth.

But if the Government persists with its organisation-centric approach to data sharing, this opportunity will never be created, because citizens’ data will remain dispersed across multiple service providers.

Conclusion

In Bayesian statistics and machine learning, the initial beliefs a model starts from before it sees any evidence are called ‘priors’. Deep assumptions such as ‘the sun orbits the earth’ or ‘organisations are the only entities that collect and use personal data’ act as priors for people: the initial set of beliefs they bring to the experiences and challenges they face. They are the lenses through which we see the world.

Pre-Copernican astronomers studied the workings of the heavens as closely as those who came after them. But because it never entered their heads that the earth might orbit the sun, every conclusion they drew was wrong. Today’s data strategies and policies are being made by a Government with the wrong, pre-Copernican prior belief that organisations are the centre of the personal data universe. As a result, its policies are way off the mark.

Organisations are not the centre of the personal data universe. Citizens are. If Government put citizens at the centre of its data policies, the simplicity and power of the post-Copernican perspective would shine through everything it did.
