Governing the data layer

James Plunkett
Published in Predict
Apr 5, 2022

A few reflections from a gathering I attended last week at Ditchley Park. The topic was how we govern data in democracies. This is all unattributed, but I can say with confidence that all the most interesting points weren’t made by me.[1] In other words, this post is more a curation of other people’s ideas than a fully worked-out essay.

Historical analogies (and dissimilarities)

At the start of the session I was asked to share some reflections to tee up the discussion. I focused on the analogy that runs through my book, End State, between the way we should govern the transition to a digital economy and the way we rebuilt the state in response to the Industrial Revolution.

I started with a similarity between the two time periods: it’s now clear that the information revolution is a discontinuity in economic history, much as the Industrial Revolution was. It marks a qualitative shift in the way our economy and society function. The logic of how we live and work, and the way we create value and social meaning, is changing.

A lot of this qualitative shift is explained by the role data now plays in our economy. Because data has unique properties, quite unlike those of the tangible goods and services that were the base material of previous economic eras, it acts as a kind of vector through which a new economic and social logic spreads. One example is the way big tech companies secure monopoly positions in large part because they have the best data, and because data generates more data in a self-amplifying loop. This means that, in the absence of an appropriate regulatory regime (i.e. one designed for a digital world), data weakens competition, changing how markets work.

When it comes to public policy, this concept of a qualitative break is useful to keep in mind. It means our institutional and regulatory mechanisms are being left behind at the level of their logic. And this means we can’t just tweak the policies we have; we need to build new institutional settlements and wholly new domains of public policy. Examples include new ways to regulate competition, a new policy architecture to collaboratively build/regulate digital infrastructure, and today’s topic: a new legal and institutional regime for data.

There is also a big difference between the digital revolution and the Industrial Revolution. This is the obvious but easily under-appreciated point that the core material of the digital revolution is intangible, while the core material of the Industrial Revolution was tangible. This changes the nature of the public policy response.

The Industrial Revolution was a revolution you could touch. It was a revolution of iron and steam, and it’s not a coincidence that the fallout was tangible too; you could feel and smell the social implications. Rivers thickened with sewage, children lost limbs in factories, and railway tracks didn’t line up. So while we faced big problems, we could at least get our hands on them. We built sewers, banned kids from factories, and realigned railway tracks so that they all ran to a standard gauge.

Today we’re navigating an intangible revolution, and the social fallout is intangible as a result. Lots of today’s biggest social problems, including the ones that stem from the central role of data and information in today’s economy, are hard to put your finger on. We live in a cloud of unease, mental ill-health, burnout, populism, and concerns about privacy, trust, and truth.

This is the thing I find most daunting about governing a digital economy. It’s not just that the logic of our policy settlement needs to change; it’s that the state also needs to learn to work with unfamiliar materials. And intangible materials aren’t ones that democratic states find comfortable to work with.

What do we do when society is polluted by toxic ideas online? What lever do you pull to build trust? How do you even measure the performance of the economy in an intangible world, when concepts like GDP and inflation miss so much?[2]

I call the big, intangible social challenges we’re facing today ‘soft problems’ as opposed to ‘hard problems’. By this I don’t mean they’re easier to solve, or less important, just that we can’t get our hands on them. They’re often more about culture and behaviour, or concepts like trust, than they are about physical things in the world.

So when it comes to how we govern data, I find it helpful to keep these two lessons from history in mind:

  1. Incrementalism won’t be enough. Our new policy settlement will need to be unrecognisable from the old one in its form; if it doesn’t function to a new logic, it’s unlikely to be the answer.
  2. We need to work with new materials, and this means we’ll need to be imaginative. How does public policy solve soft problems? The answer will feel different to the answers we grew used to as policymakers in the 20th century.
[Image: Data processing centres, a physical manifestation of an intangible economy]

How should we govern data infrastructure?

I couldn’t stay for the second day’s discussion; I only attended a session on the first day exploring how we govern data infrastructure. Even this conversation was incredibly wide-ranging, so rather than trying to distil one argument I’ll just call out some interesting points.

Describing infrastructure. If we want to know how to govern data/digital infrastructure, we need to know what we mean by that term. What does data/digital infrastructure look like? And how is it different from/similar to physical infrastructure? A useful starting point might be to describe the layers of the digital stack. One description could be:

  • A base layer of protocols and transport
  • A platforms and cloud layer
  • A data layer
  • An app layer of consumer-facing applications

(TBC whether or not this last layer should really count as infrastructure.) These layers might then merit different conceptual and regulatory treatments.

Could this layered classification act as a framework for the way we regulate digital/data infrastructure? And could we also complement these layers with a vertical taxonomy within each layer? For example, could/should we distinguish between different classes of data in the data layer? Health data, for example, might justify its own regulatory approach. Likewise, we might want to describe different types of platform and treat them differently, like social networks vs. digital marketplaces.
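To make this layered-plus-vertical classification more concrete, here is a minimal sketch of how it might be expressed as a data model. It is purely illustrative: the layer names, the data classes, and the idea of mapping a (layer, data class) pair to a regulatory treatment are my own assumptions, not anything agreed in the discussion.

```python
from dataclasses import dataclass
from enum import Enum, auto


class StackLayer(Enum):
    """Horizontal layers of the digital stack (illustrative labels only)."""
    PROTOCOLS_AND_TRANSPORT = auto()
    PLATFORMS_AND_CLOUD = auto()
    DATA = auto()
    APPLICATIONS = auto()  # debatable whether this layer counts as infrastructure


class DataClass(Enum):
    """A possible vertical taxonomy within the data layer."""
    HEALTH = auto()
    FINANCIAL = auto()
    LOCATION = auto()
    GENERAL = auto()


@dataclass
class RegulatoryTreatment:
    """Placeholder for whatever rules attach to a given slice of the stack."""
    description: str


# A hypothetical lookup: different (layer, data class) combinations
# might merit different conceptual and regulatory treatments.
REGIMES: dict[tuple[StackLayer, DataClass], RegulatoryTreatment] = {
    (StackLayer.DATA, DataClass.HEALTH): RegulatoryTreatment(
        "stricter consent, access, and stewardship rules"
    ),
    (StackLayer.PLATFORMS_AND_CLOUD, DataClass.GENERAL): RegulatoryTreatment(
        "utility-style obligations for strategically significant platforms"
    ),
}
```

The code itself is beside the point; the value of a sketch like this is that it forces the classification to be explicit before any regulatory treatment can be attached to it.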

This all raises a challenge: the layers of the digital/data stack blur together, arguably much more so than the layers of physical infrastructure. Lots of tech companies are vertically integrated, i.e. their whole offer is that they let consumers access the power of the stack by running integrated services. They might run an app as a user interface so that people can interact with a platform that draws its power from the data layer. This complicates the task of governing digital/data infrastructure.

How different is this to physical infrastructure? Railways might be an interesting contrast to think about. It’s at least technically possible to separate out the tracks from the trains in the way we regulate railways. Likewise with electricity: many countries, including the UK, have a different regulatory regime for electricity networks (the underlying platform, which in the UK is run by regulated monopolies) versus the way we regulate customer-facing energy retailers (the equivalent of the app layer, which in the UK is a competitive market, or at least it was until it fell apart).

I’m not saying these models work well. The point is just that it’s conceptually and practically possible to distinguish the layers of physical infrastructure in a relatively stable way. It feels instinctively like this would be much harder with digital infrastructure. (Or would it? Maybe that’s an instinct to stress-test.)[3]

How should we think about ownership in the context of data? This question triggered a fascinating debate. The point that stuck with me was this one: maybe we shouldn’t think about ownership at all when it comes to data. That is, maybe ownership of data is an unhelpful and outmoded concept, one that obfuscates more than it clarifies.

Here are two examples of ‘ownership’ being an unhelpful concept when it comes to data:

  • If I have your phone number, do I own it? That is, can I legitimately give it away? At the moment, I legally can, in that I can give a tech company access to ‘my’ contacts, including ‘your’ phone number. Am I handing them ownership? Am I ‘giving’ them ‘your’ number?
  • If I legally ‘own’ my banking data but in practice I can’t port the data to another platform, or if porting is technically possible but really difficult in practice, in what sense do I own this data? What does it mean to ‘own’ something if you don’t have sovereignty over the thing that is owned?

What this suggests is that, when it comes to data, ownership is neither necessary for the things we care about (e.g. you don’t need to ‘own’ data in order to access it) nor sufficient (because without portability, for example, what does it matter that you ‘own’ the data?).

What use, then, is ownership? Maybe it’s a red herring: a legacy concept inherited from a pre-digital world, one that, in a digital world, only distracts and confuses. And maybe this also means that the conceptual frameworks associated with ownership, like public goods vs. private goods, or legal/regulatory models derived from property rights, are unhelpful too.[4]

Where would this leave us? What should we talk about if we shouldn’t talk about ownership? Maybe we should be more specific. That is, maybe we should talk about the things we want to do with data. Maybe we should use words like access, steward, maintain, and modify, or phrases like ‘facilitate access to’. And maybe this would clarify an often confusing debate. So institutions would not ‘sell data’; they would sell ‘access to data’. And they would not ‘own data’; they would ‘steward data’ or ‘maintain data’.
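As a thought experiment rather than anything proposed at the gathering, here is a minimal sketch of what that vocabulary might look like if you modelled it directly. The names (DataSteward, AccessGrant, facilitate_access) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessGrant:
    """Permission to do something specific with a dataset; deliberately not 'ownership'."""
    grantee: str
    actions: set[str]  # e.g. {"read"} or {"read", "modify"}


@dataclass
class DataSteward:
    """An institution that maintains a dataset and facilitates access to it."""
    dataset: str
    grants: list[AccessGrant] = field(default_factory=list)

    def facilitate_access(self, grantee: str, actions: set[str]) -> AccessGrant:
        # The steward never 'sells the data'; it grants (and can later revoke) access.
        grant = AccessGrant(grantee=grantee, actions=actions)
        self.grants.append(grant)
        return grant

    def revoke_access(self, grantee: str) -> None:
        self.grants = [g for g in self.grants if g.grantee != grantee]


# Usage: a (hypothetical) health body stewards a dataset and facilitates read access.
registry = DataSteward(dataset="anonymised-health-records")
registry.facilitate_access("research-institute", {"read"})
```

Nothing turns on the specifics; the point is that verbs like ‘facilitate access’ and ‘revoke’ translate into concrete obligations far more cleanly than ‘own’ does.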

How do we deliver radical reform? The deeper you get into these debates, the more obvious it is that we need a whole new settlement for regulating data and digital infrastructure. It’s also clear that the new settlement doesn’t just need to be different, it also needs to be worked through with real care and sophistication. If it’s not, there’s a genuine risk that we lose the progress we made in building the current settlement. We might make things worse.

Stepping back, the task we face is to move, fairly quickly, from one sophisticated and mutually complementary set of institutional arrangements to a whole new sophisticated and mutually complementary set of institutional arrangements. Which reminds me of that old metaphor: how do you rebuild a boat while you’re sailing in it? Except here it feels more like we’re trying to transform an oil tanker into a cruise ship as we sail it through the Suez Canal.

So how do we do that? I don’t really know, but here are four pertinent points that came up in the discussion.

First, technocratic thinking won’t be enough. If we want to transition to a new regulatory settlement, we need to bring in ethics and questions of political economy. A systems-level transition won’t submit to incremental/marginalist reasoning. Some of the questions we face with respect to data are irreducibly questions of political economy, or even of ethics. In other words, they’re political.

This again reminds me of the Industrial Revolution, when lots of the biggest policy debates of the day ultimately came down to ethical arguments about the type of society and economy we wanted to build. From child labour to free healthcare and education, these were ethical choices, not economic ones. Some of the big questions we face today in relation to data feel similar. For example, does data belong in the commons?

Second, how do we stop complexity from paralysing progress? When a topic is as complicated as data infrastructure, policymakers often want the reassurance of a blueprint. Sadly, they can’t have one; we’re going to have to learn as we go. And this brings us to another metaphor: what we need is a compass, not a map. That is, we need to be confident about our direction of travel, but we don’t need a precise destination or a perfectly worked-out route in order to get going.

This speaks to a third point: the role of government is to lead, not just to legislate and implement a new settlement. At historic transitions like this, an important role for public intellectuals and policy/political leaders is to point out where we’re headed, or to sketch the contours of a new settlement. Again, my mind jumps to a historical analogy: when we created the two-day weekend in the late 19th and early 20th centuries, and when we later introduced paid holidays, lots of progress was made without needing to mandate the new practices through legislation. What was needed was a combination of leadership from the state, changing norms among business and civil society, leadership from forward-thinking companies, and pressure from trade unions. This led ultimately to a change in social norms about working time, some of which was then codified in law. So can the state do more to show leadership on data and digital infrastructure, even if we’re not yet ready to legislate?

This takes us to a fourth point: is there a representation gap when it comes to data? When you look at the history of social progress, representatives have played an important role, most obviously trade unions as representatives of working people. Unions serve a function that is vital to social change; they aggregate and amplify people’s voices, and they synthesise an inchoate set of views into policy asks.

Who represents data users? Sure, there are charities that work on privacy, but do they represent data users in the round? These questions feel particularly tricky with data because it’s such a complicated, low-salience issue. Speaking bluntly, most people don’t care about data. So how do you build pressure for radical reform, or establish a democratic mandate for it, on an issue that no one really understands or cares about? When we’re talking about incremental changes, it feels OK to delegate policy decisions to technocratic regulators. But when we’re making big political judgments about the future of our society and economy, this feels uncomfortable. So do we need new institutions to fill the data representation gap?

Who pays? We also talked about money. Or rather: how do we reward investment in data, and innovation with data, and especially the types of innovation and investment that add the most value?

There are lots of ways to govern investment, including of course a free and competitive market, which is itself an institutional settlement. No doubt the answer will be a mix of models. Maybe the data stores that have the most significant public value (e.g. health data) should end up being stewarded centrally by arm’s-length public bodies, funded by charging people for use or access. Maybe some deep capabilities like identity management should be taxpayer-funded, for the same reason we publicly fund some physical infrastructure like sewers. Maybe some parts of the data/digital infrastructure should become regulated monopolies, like electricity networks in the UK, so that companies are allowed to make a certain return while agreeing to innovate and invest. Maybe other parts of the infrastructure should be developed and owned collaboratively, using new models of collaborative ownership with associated decentralised reward structures. And no doubt the majority of data/digital infrastructure will remain regulated through the standard institutions of a free and open market, supported by an active competition regime.

Finally, what about Web 3.0? After all, most of the points above speak to the question: how do we govern Web 2.0? The discussion is mostly premised on a world of big, centrally-owned platforms. So is this all just another example of policymakers being late to the party? Just as we get serious about governing Web 2.0, its successor, the decentralised model of Web 3.0, is already arriving.

For what it’s worth, I don’t think this last point is much of a concern. Who knows if Web 3.0 will change the world and, even if it does, the basic dynamics of Web 2.0 still have a long way to run (i.e. decades). I guess the useful point is just that policymakers should stay alert to the potential inherent in Web 3.0 in case it proves useful. Maybe as these technologies and practices take hold there will be ideas and capabilities we can draw on for the wider regulatory response to Web 2.0; a good example might be how we incentivise decentralised groups of people to contribute to shared infrastructure.

As I said at the start of this post, this is really just a list of issues to explore, and this is just my (no doubt biased) gloss on a group discussion. There was also a longer debate on day two that I was sorry to miss. In general, though, I still left with a sense of optimism. The discussion had a great mix of deep expertise and openness to big ideas, which I always think is precisely what we need when grappling with the implications of digital: deep insight and open minds. And at a push, I’d even say the rough outline of a new settlement for data and digital infrastructure is starting to emerge, even if it’s still pretty blurry.

This post is part of a year-long series exploring how we govern the future. To read along, you can follow me on Medium here or support the project for £3 a month on Substack here. For the big story behind all this, from Victorian sewers to digital dragons, you can buy my book End State.

Footnotes

  1. The discussion was held under ‘Ditchley rules’ (similar to the Chatham House Rule), which is why I’ve not attributed any of the comments.
  2. If you’re interested in the rise of intangibles, Capitalism without Capital by Jonathan Haskel and Stian Westlake is excellent on the economic aspects of this challenge.
  3. In the discussion I was in, we didn’t get too deep into the regulatory models themselves, but you can see how it might be possible to apply different regulatory approaches to different layers of the stack. For example, maybe in the case of the base layer, the state’s role is to facilitate the agreement of protocols and standards, and sometimes to empower arm’s-length bodies to enforce standards. In the platform layer, maybe some of the most strategically significant platforms will end up being regulated as public utilities, emulating the model used historically for the platforms that underpin energy or telecoms, e.g. by applying common carrier or non-discrimination requirements and even in some cases price settlements to stop exploitative pricing. Maybe in rare cases platforms will be built and run by the state; an example might be identity verification. All of this is of course fraught with difficulty, in particular because (a) the layers are so intertwined and (b) the whole thing is so complicated and fast-changing. And also because, unlike with physical infrastructure, there’s so much value created by ongoing investment that the last thing we want to do is stifle innovation.
  4. This links to another question: is data ‘personal’? Lots of data clearly is, and with advanced machine learning techniques it’s increasingly hard to guarantee anonymity even in aggregated data. In a sense, the personal/public tension runs to the heart of data: at an individual level, data feels private and even deeply personal; at an aggregate level the same data becomes a source of huge social and economic value, so much so that it feels like it must surely belong in the commons. I suspect a lot of the difficulty of regulating data comes down to this split personality; how should we treat a substance that occupies twin states?
