Strata SE1 in 2012, with its seldom-used wind turbines (Colin Smith)

What Aristotle and architecture can teach us about sharing data; or, how to build a data institution

Jared Robert Keller
Published in Canvas
Oct 27, 2023


Building things is hard. Especially when the thing you’re trying to build is new, innovative or relatively untested. Even if you successfully build it and hold a ribbon cutting, what you’ve built might not stand the test of time, might not be fit for purpose or might have unforeseen limitations and side effects.

Case in point: the galloping bridges of the twentieth (and twenty-first!) centuries, or the innovative curved skyscrapers that have melted cars and burnt poolside holiday seekers.

In my research examining data sharing models, I’ve found the same holds true when trying to build new or innovative ways of sharing data. The last decade is full of examples of open data portals that have sat relatively unused and data marketplaces that have floundered.

The good news is that many other people have thought deeply about this challenge and have offered helpful guidance on how to deconstruct and make sense of things — be they buildings or novel data sharing models.

In this piece I draw on a bit of Aristotelian metaphysics to outline what I have found to be a useful framework and metaphor for conceptualising different models for sharing data — including what they’re composed of, how they’re structured, how they function to achieve a goal and how they’re built. I argue that when designing a data sharing model, people should start by defining the function before settling on anything else.

What does Aristotelian metaphysics have to do with architecture and data sharing?

At the ODI, a lot of my work touches in one way or another on something we call data institutions, which are essentially just organisations that steward data on behalf of others, often towards public, educational or charitable aims. (Others might use different language to describe these entities, such as ‘data intermediaries’ or ‘data initiatives’.) We see these organisations as playing a range of roles within their ecosystems, but in the end their main function is to help data get safely and responsibly from the people who generate it, collect it or hold it, to the people who want to use it in order to develop services, conduct research or make decisions.

Because the concept of data institutions is fairly new, it can be difficult to understand what they are, how they work and why they're important. That makes them hard to design and build, because you have to understand something before you can build it. For the same reasons, it is hard to describe their capabilities and importance to people in sectors or ecosystems that might benefit from them.

Thankfully, Aristotle also spent a good deal of time trying to understand how things work. Back in the 4th century BCE, when Aristotle wanted to understand something — be it a table, river or planet — he would strive to understand it through four ‘causes’: its material, form, function and creation. (I have adapted the translations of the four causes for the sake of simplicity and, hopefully, clarity.) According to him, an explanation which failed to invoke all four causes was hardly an explanation at all. The word ‘causes’ can cause a few misunderstandings; I tend to think of it as meaning different ‘lenses’ with which we can view something, or different ways of ‘describing’.

Moving to architecture, when we want to understand what makes a building a building, we can describe the materials that it’s made of, such as wood, stone, or mushrooms; we can describe its form, such as vaulted ceilings, stone pillars, or basket handles; we can describe its function, such as providing a place for people to sleep, a place to store grain, or a place to spoil your dog; and we can describe what went into its creation, including the work of people like architects, developers and corrupt city planners, as well as tools like hammers, saws and board stretchers.

A simple diagram explaining Aristotle’s four causes

Looking at the four causes in this way can help us understand the various components of a building. It can also help us understand and communicate how each of the four causes influences the others and why making suitable choices within each area is so important when constructing a building — or a data institution. A poor choice or mistake within one of these four areas could keep the building from achieving the goals of the builders.

  • Materials: It is possible to construct a building out of the wrong or unsuitable materials. Farnsworth House, designed by the renowned architect Mies van der Rohe, has been described as a masterpiece of modern architecture. However, because most of its walls are made of glass, its residents soon deemed the structure unliveable: temperature was difficult to control, and there was a (reportedly unforeseen) lack of privacy.
  • Form: It is possible to construct a building from suitable materials but in an unsuitable form. In December 2009, the Vdara hotel became the first tower to open as part of CityCenter, Las Vegas' much-touted development on the Strip. The following summer, however, poolside holiday seekers began complaining of severe burns caused by the building's concave shape, which reflected and focused sunlight directly onto the pool area.
  • Function: It is possible to construct a building of suitable materials, in a suitable form, but that ultimately does not function in the intended way, has unforeseen knock-on effects, or does not address the identified problem or need. When the building Strata SE1 in London (pictured above) was designed and built, the intention was that the three wind turbines installed on the roof would generate 8% of the building’s electricity needs. But when residents moved in, they complained that the turbines were too loud. As a result, the turbines are now seldom used and produce nowhere near the promised output of electricity.
The Ryugyong Hotel in 2009
  • Creation: It is possible to have plans to construct a building with appropriate materials, in a suitable form that is capable of functioning in a way that will address an identified problem or need, but nonetheless lack the necessary knowledge, skills, funding, tools or desire to actually bring about its creation. Although construction started in 1987, the Ryugyong Hotel in Pyongyang, North Korea, has never opened for business or hosted a guest, due in large part to an economic downturn and the lack of funding that followed the collapse of the Soviet Union. The Ryugyong still has quite a way to go, however, if it wants to unseat the current title holder for longest, most tortured construction project: Barcelona's Sagrada Família.

But how does this help me make sense of data institutions?

After applying Aristotle’s four causes to something tangible like buildings, we can begin to apply the framework and way of thinking to something less tangible like data institutions:

Okay, but how does this help me design a data institution?

Deconstructing data institutions in order to make sense of them is one thing, but where I've found Aristotle's thinking really helpful is in understanding how to go about actually designing and building a data institution.

While Aristotle felt that all four causes are important when trying to understand something, he believed that understanding its function ultimately took precedence over the others.

I believe this is also true for data institutions, especially when trying to design or build one. When working with organisations and sector initiatives to design and build data institutions, my colleagues and I often advise that the first step should be to agree on the function(s) of the data institution. In other words, agree on the role(s) it will play in helping data get safely and responsibly from the people who collect it or hold it to the people who want to use it, and on what they'll use it for. If you don't know which important or valuable datasets exist within your ecosystem, who holds them and who wants to use them in order to develop services, conduct research or make decisions, then you'll have a hard time deciding on the proper material or form of the data institution and how it can be built.

One way of deconstructing data sharing models: the five stages of the data-use journey and the four layers of a data institution

For this reason, when we speak to people who are exploring whether a data institution can help them generate value or solve a challenge, we advise them to start by identifying target datasets and how those datasets might be used to meet their needs, and then to work from there to identify what role(s) a data institution could play in supporting that use case. We caution against the temptation to start by building the infrastructure to connect datasets and/or pool them in a central location, in the expectation that you'll figure out who wants to use that data somewhere down the line. If you do, you risk repeating the mistakes of developers who build fancy 'vertical cities' filled with luxury flats, only to see them sit empty for years for lack of an actual market. We also caution against starting by identifying fancy new technologies, new institutional forms or new governance mechanisms that you think will solve your data sharing challenges. This type of myopic, solutionist thinking can lead to buildings that can't turn on their wind turbines and data institutions that don't actually meet the needs of their ecosystems.

But once the use case has been identified and the relevant parties have agreed on the function or role that the data institution will play in order to support that use case, it’s possible to start thinking about Aristotle’s other three ‘causes’ and how these will support or enable the data institution to play the agreed role(s).

  • Form: The design of the data institution should enable it to perform its intended role or function. So, if your goal is to empower users of a service to play a more active part in how data about them is accessed, used or shared, you will want to adopt a form that best supports that goal, such as a data cooperative, data trust or data commons. If your goal is to generate revenue from providing access to data, then you will want to explore forms like data marketplaces.
  • Materials: The building blocks or materials of the data institution should support the desired form, so that it can play its intended role. If your goal is to publish data openly so that anyone in your ecosystem can access, use or share that data, it will be important to choose the right open standards and licences to help people use the data you publish. To help people access the data, you might want to look into open-source open data portals such as CKAN, and you’ll want to put in place a business model that can account for the fact that the data institution won’t be able to generate revenue from the data it publishes.
  • Creation: If your goal is to facilitate safe access to sensitive data, you will need to ensure you have the relevant knowledge and skills in your team to develop technologies and governance processes that limit the exposure of sensitive information, such as privacy-enhancing technologies and rigorous auditing processes. And just as developers need to get planning permission from the local authority, you will need to ensure that you have the appropriate permission and legal authority to handle that sensitive data.

In my work I have found that thinking about data institutions through these four 'causes' or lenses can help make sense of what data institutions are made of, how they're structured, the roles they can play and how they're built. It can also help to establish a shared language with clients and stakeholders, both when designing data institutions to best suit their needs and when communicating their benefits and capabilities.

Generally, the process we work through with clients starts with defining the function, then discussing the general forms that can support that function, the materials or building blocks that can be combined to make the desired form, and then working through how to go about building it. This process also usually involves discussion of the ongoing operation, evaluation, improvement and potential retirement of the data institution.

It is a variable and iterative process that is highly context-dependent. It involves frequently pausing to examine and reflect on the emerging whole and to make changes where necessary. Without pausing to take this more holistic view, there is a risk that we focus too much on each individual aspect and fail to take account of how different components interact and/or counteract. Taking a more holistic view also makes it possible to reflect on how the data institution as a whole will interact with its surrounding ecosystem, rather than focusing exclusively on building the thing itself. After all, you can build a perfectly adequate model home, but if you don't also think about how to connect that home to local infrastructure like roads and sewage lines, then you're likely to end up with another Sudden Valley on your hands.

But if there is one thing that Aristotle’s four causes have helped me learn and communicate, it’s the importance of nailing down the function or role that a data institution will play in helping to get data where it needs to be so it can be used in the desired way. Yes, as the data institution develops and the ecosystem surrounding it evolves, new use cases will emerge and therefore the roles that the data institution plays will evolve. The same is true of buildings that are eventually repurposed as the needs of their community evolve. But until you’ve nailed down the primary use cases in your ecosystem and can explain how the data institution will function in order to facilitate that use, you probably shouldn’t break metaphorical ground. After all, as Plato’s most famous pupil said, an explanation that can’t invoke all four causes is no explanation at all.
