Data Commons & Data Trusts

What they are and how they relate

Anouk Ruhaak
10 min read · May 15, 2020

How do data trusts relate to data commons? When trying to make sense of the many data governance models and data sharing arrangements in existence this question comes up a lot. Here I give some basic insights into how I tend to think about the relationship between a data commons and a data trust.


Common Pool Resources and their management

To understand a data commons, we need to first understand what defines a common good, or, more accurately, a common pool resource (CPR).

A CPR has two key characteristics: its use is rivalrous and it’s hard to exclude others from using the resource. What this means is that if I use the good, it reduces the ability of others to also use the good and it’s hard for me to physically keep you from using the good. For example, rivers are generally considered CPRs. It’s hard to keep anyone from using the river. In addition, if one fisher uses the river to get fish, another fisher may no longer be able to do the same. Or, if a factory pollutes the river, others may not be able to swim in it.


Figure 1: Types of goods

What’s considered a common pool resource can change over time. With new technologies we gain the ability to exclude others from resources that were previously non-excludable. The ability to build a fence around a piece of land, for instance, may turn that resource from a common pool resource into a private good (i.e. a good that is excludable and rivalrous).
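
To make the two dimensions explicit, here is a minimal sketch of the standard classification of goods by rivalry and excludability (my illustration, not part of the original figure); the examples in the comments mirror the ones used in this article.

```python
# Illustrative sketch: the classic two-by-two classification of goods.
def classify_good(rivalrous: bool, excludable: bool) -> str:
    """Return the standard economic category for a good."""
    if rivalrous and excludable:
        return "private good"            # e.g. fenced-off land
    if rivalrous and not excludable:
        return "common pool resource"    # e.g. a river's fish stock
    if not rivalrous and excludable:
        return "club good"               # e.g. a proprietary dataset
    return "public good"                 # e.g. broadcast radio

# The fence example from the text: enclosing land makes it excludable,
# moving it from common pool resource to private good.
print(classify_good(rivalrous=True, excludable=False))  # common pool resource
print(classify_good(rivalrous=True, excludable=True))   # private good
```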

Why is this definition of common pool resources important? Because of these specific characteristics of a CPR, there is a fear that if everyone just uses the resource as they see fit, it will quickly be overused and exhausted. This is known as the tragedy of the commons. However, as observed by Elinor Ostrom, such exhaustion does not need to happen, and it is not necessary for a state to step in and decide who gets to use the resource for what, or for a single company to control all the property rights. Instead, her research found that communities often find ways to decide on access to and use of the resource among themselves. These arrangements are commonly referred to as commons.

Here’s an example. To avoid overfishing, governments regularly impose top-down quotas, often combined with a marketplace of sorts in which quotas can be traded. Not so in Nova Scotia, where communities of fishers collectively manage their fish stocks, without the need for government involvement. Part of their success is explained by the strong social ties between the fishers and their ability to clearly distinguish between insiders (those who have fishing rights) and outsiders to the commons.

As I hope is clear by now, resources stewarded by a commons are not free for anyone to use as they please. They are therefore distinct from what is colloquially referred to as a ‘common good’. In fact, the ability to draw a clear boundary around the community that has access to a resource determines the success of this governance model. In Ostrom’s words, well-functioning commons require a “clear definition of the contents of the common pool resource and effective exclusion of external un-entitled parties.”

How does this relate to data?

Data is not a common pool resource. It’s easy to exclude others from using particular datasets, and one person using a piece of data does not prevent others from doing the same. This combination of characteristics (excludable and non-rivalrous) is usually referred to as a club good. So why would we need a data commons?

First of all, things don’t have to be common pool resources to be stewarded by a commons. We could easily imagine the collective stewarding of a car. But there’s a larger point to be made here. While data may be a club good, privacy — or the control over the appropriate flow of information — could be thought of as a CPR. Privacy is rivalrous: if I share information about you, you no longer have the chance to decide how to share that information about yourself. What is more, even when I share information about myself, that information may be used to infer things about people like me. Those people would no longer have control over whether or not they want that information shared. In a way, privacy is also non-excludable: it’s hard for me to stop you from sharing specific information about me.

Privacy is usually meant to cover information flows where the information describes people, but we can extend this to include what is commonly referred to as non-personal data. One of the problems with the open data movement is that it does not account for the fact that open data benefits different entities in different ways, and that once the data is open to all, we have little control over who will be able to make the most use of it.

Take agricultural data, for instance. This data describes not people, but agricultural processes, cows, or crops. Those most impacted by the use of the data are farmers. Presumably, they would want the data to be used to understand what to grow and when, how much pesticide or fertiliser to use for which crop, and so on. In order to achieve this, they need to share the data with trusted entities that can help them place it in the right context. However, by making it openly available, they risk hedge funds with deep pockets and lots of computing power using the data to manipulate markets. Alternatively, the companies they sell their crops to may use the data to set prices. Consequently, farmers collectively may want to have a say about who can and cannot access farm data. A similar argument can be made about other non-personal data sources such as ‘smart’ city data.

Here again, we could have a state step in and decide what data can be accessed and used by whom and for what purpose. And in some cases that is exactly what needs to happen. But as with the natural resource commons, there is an opportunity here for communities to come together and navigate this on their own. In fact, a lot of this is already happening.

One example of a data commons is the UK Biobank, which brings together medical data on 500,000 people, who consent to this data being used by researchers ‘to improve the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses’. Access to the data is available to any ‘bona fide researcher’ upon request. The Biobank has strong safeguards in place to ensure that the data does not fall into the wrong hands and is not used for purposes not consented to by the individual data donors.
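
Read this way, the Biobank model amounts to a simple access policy: access is granted only to vetted researchers and only for purposes covered by participants’ consent. The sketch below is purely illustrative; the function and field names are my own and do not describe UK Biobank’s actual procedures.

```python
# Illustrative only: a toy access policy in the spirit of the Biobank model.
from dataclasses import dataclass

# Assumed consent scope, paraphrasing the purposes quoted above.
CONSENTED_PURPOSES = {"prevention", "diagnosis", "treatment"}

@dataclass
class AccessRequest:
    researcher_vetted: bool  # has the applicant been approved as a bona fide researcher?
    purpose: str             # stated purpose of the proposed research

def grant_access(request: AccessRequest) -> bool:
    """Grant access only to vetted researchers with a consented purpose."""
    return request.researcher_vetted and request.purpose in CONSENTED_PURPOSES

print(grant_access(AccessRequest(researcher_vetted=True, purpose="treatment")))  # True
print(grant_access(AccessRequest(researcher_vetted=True, purpose="marketing")))  # False: outside consent
```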

Why data trusts?

Imagine a group of people living on and making use of a plot of land. For decades, they collectively decide how the land is to be used, in a way that is generally deemed appropriate and fair by all members. Their practices have thus far ensured that the ground they all rely on for their food production is healthy and fertile. Now, an external actor — be it a corporation or a state — comes in. They want control over the land to grow avocados. What is to stop this actor from grabbing the land? For instance, if property rights are controlled by members of the community, what is to stop them from selling to the highest bidder, thereby effectively ending the commons for everyone? What if no one holds property rights to the land?

One solution to this problem came in the form of Community Land Trusts. In a CLT, all the titles to the land are transferred to a board of trustees, who steward the land for a specific purpose (e.g. to provide affordable housing). The trustees have a fiduciary duty to look after the sole interest of the beneficiaries, made up of residents and the wider community. Trustees cannot go against the interests of the beneficiaries or the specific purpose of the trust. It’s therefore not possible for the trust to sell off the land titles under its control to the highest bidder. As such, CLTs provide legal protection for the continued existence of the commons and a way for the commons to be legally recognised by a higher authority.

Data trusts provide similar guarantees, but for data, or data rights. In a data trust, the community places its data or data rights under the control of a trustee, or board of trustees. As in a CLT, the trustees have a fiduciary duty to look after the sole interest of the beneficiaries, who range from data subjects to those who need protection from data abuse. Data trusts can have many different purposes. Some might exist to make data available to academic researchers trying to cure cancer, while others may ensure agricultural data is used for sustainable farming.
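
As a rough mental model (mine, not a legal specification), a data trust couples its assets with a purpose and a fiduciary test: trustees may only approve uses that serve both the trust’s purpose and the beneficiaries’ interest, so ‘sell to the highest bidder’ simply fails the test. The names below are hypothetical.

```python
# Illustrative sketch of the fiduciary logic described above; not a real
# data-trust implementation and not legal advice.
from dataclasses import dataclass

@dataclass
class DataTrust:
    purpose: str              # e.g. "make data available for cancer research"
    beneficiaries: list[str]  # those whose interests the trustees must serve

    def approve(self, proposed_use: str, serves_beneficiaries: bool) -> bool:
        """Approve a use only if it fits the trust's purpose AND the
        beneficiaries' interest; anything else breaches the fiduciary duty."""
        fits_purpose = self.purpose in proposed_use  # crude stand-in for a real purpose test
        return fits_purpose and serves_beneficiaries

trust = DataTrust(purpose="cancer research", beneficiaries=["data subjects", "patients"])
print(trust.approve("share data for cancer research", serves_beneficiaries=True))   # True
print(trust.approve("sell data to the highest bidder", serves_beneficiaries=False)) # False
```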

With a data trust in place, we have a legal mechanism to ensure that the objectives set out by a data commons are guaranteed over time. Data held by such a commons cannot be sold off to the highest bidder if that violates the purpose of the trust and the interests of the beneficiaries. Data trusts, therefore, should be seen as a legal relationship that allows for the protection of a data commons, rather than as a governance model that is distinct from a data commons. They may not always be needed or relevant, but they are a useful instrument to have in our back pocket!

When are data trusts appropriate?

Data trusts can be a great way to safeguard the privacy of one or many, but that does not mean they are always the right tool. In order for data trusts to be relevant, we need to have a ‘thing’ (an asset or a right) that we can hand over to a board of trustees. When it comes to data, that ‘thing’ is usually a right we have over the data.

Various data protection laws around the world grant individuals rights over their data, such as the right to decide what data is shared and for which purpose. In addition, some data may be subject to intellectual property rights. As explained by Sylvie Delacroix and Neil Lawrence, those rights make data trusts possible: when we have rights over data we can hand over those rights to data trustees, to be held in a data trust.

However, in many other cases, data does not have any rights attached to it. This is true, for instance, for a lot of the agricultural data described above. What do we do in those cases?

We might look at Open Source Software for answers. OSS often involves a community of software developers diligently contributing to a piece of software that is open to anyone and that can be copied and changed by anyone (with certain restrictions, depending on the license). Many of the rights one could hold over the software have essentially been released. But what if the project was initially created by an employee, whose company — while releasing the software as open source — holds rights over the branding (the name, the logo, etc.)? This may seem trivial, but it isn’t. The brand is the thing that’s known by users of the software, as well as by those dedicating their spare time to maintaining it. As a result, those who hold the brand often enjoy greater control over the direction the project takes. So how do we keep this corporation (which may have its own agenda) from controlling the project? The community of developers could of course decide to jump ship, copy the project and start over under a different name. This, however, takes time and resources. Instead, to avoid capture of the branding, many projects elect to have these assets held by a foundation (or a trust) that is established for the purpose of keeping the project under communal control.

We could imagine something similar for data, where rather than the data rights forming the asset under trust, it is the brand of the commons itself that is controlled by the trust, perhaps alongside other non-data assets that are relevant to the functioning of the data commons. This would allow the commons to establish and preserve a reputation for ‘good governance’ (essentially building up social credit) and encourage others to make their data available to it. Of course, this is an imperfect solution, as it does not legally protect the thing we care about most: the data.

Concluding remarks

First, I have approached data trusts here as a way to preserve the aims and ambitions set by a data commons. This perspective differs from one where we set up a data trust to act as an intermediary with the aim of preserving the privacy rights of the individual. Both perspectives are valuable, but they may result in slightly different approaches to governance under a data trust.

Second, we might ask when something can be called a data trust. Is a data trust, by definition, a trust that holds data rights? Or do we also include more hybrid forms?

Further reading

Elinor Ostrom (1990). Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press.

Madelyn Sanfilippo, Brett Frischmann and Katherine Strandburg (2018). Privacy as Commons: Case Evaluation Through the Governing Knowledge Commons Framework. Journal of Information Policy, Vol. 8, pp. 116–166.

David Bollier and Silke Helfrich (2012). The Wealth of the Commons: A World Beyond Market and State. Levellers Press.

Balázs Bodó (2019). Was the Open Knowledge Commons Idea a Curse in Disguise? — Towards Sovereign Institutions of Knowledge. Available at SSRN.

Sylvie Delacroix and Neil Lawrence (2019). Bottom-up data Trusts: disturbing the ‘one size fits all’ approach to data governance. International Data Privacy Law, Volume 9, Issue 4, pp. 236–252.

Anouk Ruhaak (2019). Data Trusts, What, Why and How.

Julie Cohen, William Martin (2001). Intellectual Property Rights in Data, in Information Systems and the Environment, The National Academies Press

For further reading on data trusts, here’s a list: https://airtable.com/shr35ZGBYqO7tuf2T
