Patterns of data institution that support people to steward data themselves, or become more involved in data stewardship

Jack Hardinges
7 min read · Feb 14, 2020


Data stewardship involves collecting, maintaining and sharing data, and in particular, determining who has access to it, for what purpose and to whose benefit.

[Figure: The activities that derive value and impact from data]

The way that data is stewarded is important, as it affects what it can be used for and how we make decisions.

Recent years have seen efforts to empower people to steward data themselves, or become more involved in the stewardship of data about them. Sometimes doing so will involve data institutions. Just as an institution is an organisation devoted to a particular cause, especially of a public, educational or charitable character, a data institution is an entity whose cause is primarily the stewardship of data in this vein.

While considering how existing concepts of governance — such as cooperatives and commons — can be applied to data is important, I was inspired by conversations with Mad back at MyData 2019 to think more inductively. In other words, what patterns of data institution can we already see emerging?

In this post I want to share the patterns of data institution that I’ve come across that interact with people and their rights over data. What individual examples are called — ‘data trusts’, ‘data cooperatives’, ‘data clubs’, ‘data commons’, etc — is less important here than how they actually steward data, and I think the term ‘patterns’ more accurately describes the fact they are observations of recurring behaviours rather than concrete types.

These are the patterns I’ve seen, with some examples of each. The list is by no means exhaustive, and in each pattern ‘it’ refers to the data institution:

  1. it enables people to contribute data about them to it and, on a case-by-case basis, people can choose to permit third parties to access that data. This is the pattern that many personal data stores and personal data management systems adopt in holding data and enabling users to unlock new apps and services that can plug into it. Health Bank enables people to upload their medical records and other information like wearable readings and scans to share with doctors or ‘loved ones’ to help manage their care; Japan’s accredited information banks might undertake a similar role. Other examples — such as Savvy and Datacoup — seem to be focused on sharing data with market research companies willing to offer a form of payment. Some digital identity services may also conform to this pattern.
  2. it enables people to contribute data about them to it and, on a case-by-case basis, people can choose whether that data is shared with third parties as part of aggregate datasets. OpenHumans is an example that enables communities of people to share data for group studies and other activities. Owners of a MIDATA account can “actively contribute to medical research and clinical studies by granting selective access to their personal data”. The approach put forward by the European DECODE project would seem to support this type of individual buy-in to collective data sharing, in that case with a civic purpose. The concept of data unions advocated by Streamr seeks to create financial value for individuals by creating aggregate collections of data in this way. Although Salus Coop asks its users to “share and govern [their] data together.. to put it at the service of collective return”, it looks as though individuals can choose which uses to put it to.
  3. it enables people to contribute data about them to it and decisions about what third parties can access aggregate datasets are taken collectively. As an example, The Good Data seeks to sell browsing data generated by its users “entirely on their members’ terms… [where] any member can participate in deciding these rules”. The members of the Holland Health Data Cooperative would similarly appear to “determine what happens to their data” collectively, as would drivers and other workers who contribute data about them to Workers Info Exchange.
  4. it enables people to contribute data about them and defer authority to it to decide who can access the data. A high-profile proposal of this pattern comes in the form of ‘bottom-up data trusts’ — Mozilla Fellow Anouk Ruhaak has described scenarios where multiple people “hand over their data assets or data rights to a trustee”. Some personal data stores and personal information management systems will also operate under this kind of delegated authority within particular parameters or settings.
  5. people entrust it to mediate their relationships with services that collect data about them. This relates more to decisions about data collection than to decisions about access to existing data, but it involves the stewardship of data nonetheless. For example, Tom Steinberg has described a scenario whereby “you would nominate a Personal Data Representative to make choices for you about which apps can do what with your data.. [it] could be a big internet company, it could be a church, it could be a trade union, or it could be a dedicated rights group like the Electronic Frontier Foundation”. Companies like Disconnect.Me and Jumbo are newer examples of this type of approach in practice.
  6. it enables people to collect or create new data. Again, this pattern describes the collection of new data rather than the re-use of existing data. For example, OpenBenches enables volunteers to contribute information about memorial benches, and OpenStreetMap does something similar at a much larger scale to collaboratively create and maintain a free map of the world. The ODI has published research into well-known collaboratively maintained datasets, including Wikidata, Wikipedia and MusicBrainz, and a library of related design patterns. I’ve included this pattern here because, to me, it represents a way for people to be directly involved in the stewardship of data, personal or not.
  7. it collects data in providing a service to users and, on a case-by-case basis, users can share that data directly with third parties. This pattern enables users to unlock new services by sharing data about them (such as via Open Banking and other initiatives labelled as ‘data portability’), or to donate data for broader notions of good (such as Strava’s settings that enable its users to contribute data about them to aggregate datasets shared with cities for planning). I like IF’s catalogue of approaches for enabling people to permit access to data in this way, and its work to show how services can design for the fact that data is often about multiple people.
  8. it collects data by providing a service to users and shares that data directly with third parties as provisioned for in its Terms and Conditions. This typically happens when we agree to Ts&Cs that allow data about us to be shared with third parties of an organisation’s choice, such as for advertising, and so might be considered a ‘dark’ pattern. However, some data collectors are beginning to do this for more public, educational or charitable purposes — such as Uber’s sharing of aggregations of data with cities via the SharedStreets initiative. Although the only real involvement we have here in stewarding data is in choosing to use the service, might we not begin to choose between services, in part, based on how well they act as data institutions?
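
As a rough illustration, the eight patterns above can be written down as a small data structure. This is only a sketch under stated assumptions: the names `Source`, `Decider`, `Pattern` and `numbers_where` are hypothetical, and the two axes (how data reaches the institution, and who decides on access) are one reading of the list rather than categories the post defines. Where a pattern does not say who decides, or whether data is shared in aggregate, the field is left as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Source(Enum):
    """How data reaches the institution (illustrative axis, an assumption)."""
    CONTRIBUTED = "people contribute data about them"
    MEDIATED = "it mediates collection by other services"
    CREATED = "people collect or create new data"
    SERVICE = "it collects data in providing a service"


class Decider(Enum):
    """Who decides on access (illustrative axis, an assumption)."""
    INDIVIDUAL = "each person, case by case"
    COLLECTIVE = "contributors, collectively"
    INSTITUTION = "the institution, on people's behalf"
    TERMS = "the institution, under its Terms and Conditions"


@dataclass(frozen=True)
class Pattern:
    number: int                       # position in the list above
    source: Source
    decider: Optional[Decider]        # None where the list does not say
    aggregate: Optional[bool] = None  # None where the list does not say


# One reading of the list; the boundaries are blurrier in practice.
PATTERNS = [
    Pattern(1, Source.CONTRIBUTED, Decider.INDIVIDUAL, aggregate=False),
    Pattern(2, Source.CONTRIBUTED, Decider.INDIVIDUAL, aggregate=True),
    Pattern(3, Source.CONTRIBUTED, Decider.COLLECTIVE, aggregate=True),
    Pattern(4, Source.CONTRIBUTED, Decider.INSTITUTION),
    Pattern(5, Source.MEDIATED, Decider.INSTITUTION),
    Pattern(6, Source.CREATED, None),
    Pattern(7, Source.SERVICE, Decider.INDIVIDUAL),
    Pattern(8, Source.SERVICE, Decider.TERMS),
]


def numbers_where(**criteria) -> List[int]:
    """Return the numbers of patterns whose fields match every criterion."""
    return [
        p.number
        for p in PATTERNS
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]
```

For example, `numbers_where(decider=Decider.INDIVIDUAL)` returns `[1, 2, 7]`: the patterns in which each person chooses, case by case, whether data is shared.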

These patterns focus on the way that data institutions that interact with people and their rights over data are emerging. They don’t address actors that may play other important roles in the data ecosystem, such as legislating for or enforcing rights, or developing standards and protocols. I’m looking at these examples in the context of data stewardship, and in particular how decisions about access to data are made, rather than analysing the ownership structures of the underlying legal entities, their business models or the technologies used to make them work.

I echo the point that Nesta recently made in their paper on ‘citizen-led data governance’, that “while it can be useful to assign labels to different approaches, in reality no clear-cut boundary exists between each of the models, and many of the models may overlap”. Although I’ve tried to avoid the popular names, terms and labels typically used to create categories, this collection of patterns comes with its own biases and inaccuracies. Phrases that read ‘contribute data’ should often be read as ‘contribute data or rights or control over it’, and references to ‘data about them’ may make more sense read simply as ‘data’. The rights that people have over data, and why they have them, are a more complex topic than this post can address.

Another caveat to add is that many of the examples referenced in this post seek to empower people to steward data themselves, or at least play a more active role in data stewardship, for different purposes. They will certainly not meet the needs of all people, they will likely require different enabling conditions or activities to work, and in some cases those purposes may be addressed more effectively through other means (such as by enforcing existing laws more effectively or building data skills and literacy). To use an example, I don’t think bottom-up data trusts alone will “rebalance the respective control that corporations and individuals have over personal data”.

This is a map of a messy space, so it will be limited. Writing down the patterns, though, has helped me to understand it better than I otherwise would, especially in cutting through the marketing and hype around (what are often presented as new) services or approaches. I’m sharing them in case they help others to do the same — let me know what I’ve missed.

Jack Hardinges

Policy Advisor at the Open Data Institute. Interested in open, the economics of data, platforms, AI, ethics, data portability, etc, etc. @jhardinges on Twitter.