The importance of a data sharing enabling environment.

Mike Rose
Jan 28, 2022

Why a lack of data sharing might be a symptom of something else…

Many of the people I work with and the projects I work on will discuss ‘issues’ around data sharing that need to be resolved. Often these ‘issues’ are actually a result of more systemic obstacles that have been ignored in the past whilst a particular piece of work or activity was being developed.

I remember back in about 2001, when the Environment Agency was developing new flood maps. The project had been running for a couple of years and was nearing its conclusion, after several million pounds had been spent on data crunching using new state-of-the-art modelling algorithms and software, when somebody realised that in order to share the data as envisaged, they *might* need to check the licence agreements for the data that had been ingested.

Lo and behold, one of the ingested datasets (which was licensed for under £10k) had a small clause in its licence that prevented the publication, without permission, of anything that had used the licensed data. Sorting that kept me (and others) busy on and off for a decade.

Picture of Environment Agency flood map for London
Environment Agency Flood Map licensed under OGL v3

What this example illustrates is that around any ‘data sharing’ component there exists an “enabling environment” within which the various components need to be aligned with each other to ensure its success. This may be limited to ensuring that the permissions for data you ingest are appropriate; however, depending on what your activity is, there may be more. [You can see more about my Environment Agency story here]

Through the work I have been doing over the past few years (building on the experience of the past few decades) I have been working to develop a way of explaining this ‘enabling environment’ in simple terms.

Note — I am being deliberately vague by using the term ‘data sharing component’ as in my mind this could be the collection of a new dataset, the creation of a new modelled database, the development of a data sharing IT system, the drafting of a data sharing agreement and probably 100 other things I haven’t thought of at this point.

It is worth considering at this point that identifying and solving data sharing issues is probably not the driver for any activity in this space. There must be a reason that the data needs to be shared, for example to allow people to understand their own risk of flooding, so there will always be an underlying problem that data sharing is part of the solution to. This distinction becomes important as we start digging into the enabling environment.

What is the data sharing enabling environment?

A data sharing component will exist within a data sharing ecosystem along with other activities. In other words, a data sharing ecosystem is made up of a combination of data sharing components.

Sketch of data ecosystem

Each of the components of any data sharing ecosystem has its own requirements of that system that need to be met to ensure that it functions properly.

For example, if the key component we are thinking about developing is a new technological solution to share datasets, then within the ecosystem will be data creators and providers who are creating these new datasets. They will need to have in place rules and procedures for how that data should be shared with the new solution and a permissions framework to ensure that the datasets once gathered can legally be shared.

Therefore, if we want to develop and operationalise any one component within a data sharing ecosystem, for example by collecting new data, producing new data modelling tools or building a new database system to share data, we need to consider the way in which all the related components function to support this, rather than focusing narrowly on just this one component.

To illustrate how not doing this could cause issues, consider that the drafting of a data sharing agreement (the ‘data sharing component’) will influence the wider “data sharing ecosystem” by setting the rules data users need to follow, such as when sharing data through an IT database. However, if in-country regulations are not considered, they might contradict the data sharing agreement. This conflict would prevent the data sharing ecosystem operating as intended.

So, you can see that if we focus on just the component being developed rather than considering the overall ecosystem, factors that could affect the desired outcome are likely to be missed, making it less likely that the anticipated change will be achieved. It is like not checking that the licence for a dataset needed for the creation of my new flood data actually allows me to share that new flood data!
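To make the licence example concrete, here is a minimal sketch in Python of the kind of check that could be run before a derived dataset is published. Everything in it is hypothetical: the `SourceLicence` fields, the dataset names and the `blocking_licences` helper are assumptions made for illustration, not a description of how the Environment Agency or any real licensing system works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLicence:
    """Hypothetical summary of the licence terms for one ingested dataset."""
    dataset: str
    allows_derived_publication: bool   # may outputs built on this data be published?
    permission_obtained: bool = False  # has the licensor given explicit permission?


def blocking_licences(sources):
    """Return the ingested datasets whose licence would block publishing a derived product."""
    return [
        s.dataset
        for s in sources
        if not (s.allows_derived_publication or s.permission_obtained)
    ]


# Illustrative inputs only; these are made-up datasets, not real licences.
sources = [
    SourceLicence("river_levels", allows_derived_publication=True),
    SourceLicence("terrain_model", allows_derived_publication=False),
    SourceLicence("rainfall_records", allows_derived_publication=False,
                  permission_obtained=True),
]

blockers = blocking_licences(sources)
if blockers:
    print("Resolve licences before publishing:", ", ".join(blockers))
else:
    print("All ingested data can be shared as planned.")
```

With these made-up inputs, only `terrain_model` is flagged. The value of a check like this lies less in the code than in when it is run: at the start of the work, while the licences can still be renegotiated or the data swapped.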

Each of the components will have their own requirements / drivers. Combined, all the requirements and their drivers constitute the “enabling environment” within which that component, and in fact all of the components, exist.

To help consider the impact of the ‘enabling environment’ on any intervention within a data sharing ecosystem I think there are 5 key areas that make up that environment and therefore would need to always be considered. These are:

  • Infrastructure
  • Policy Environment
  • Expertise
  • Funding
  • Coordination

Infrastructure

Sketch to illustrate infrastructure

The data sharing ecosystem will be made up of many different components such as the dataset itself and the IT system that holds it, along with the documented rules and procedures that control the use of those things. These physical and nonphysical infrastructure components make up a data sharing ecosystem.

Other Infrastructure examples:

  • Documented procedures, rules and processes that guide and instruct how things should be done. For example, standard operating procedures to ensure surveillance data is collected consistently.
  • Data sharing agreements that outline how data can be used when shared. For example, a data licence giving permission to use flood data in a particular way.

Policy Environment

Sketch to illustrate Policy environment

For a data sharing activity to function within its ecosystem there needs to be specific mandates or authority given to the actors within that ecosystem; these will usually be documented.

For example, at different scales these could be:

  • Legislation, policies, laws and instructions, such as a national policy outlining a requirement to gather data on a problem
  • Organisational policies and agreements, such as a donor policy to invest in data that can be shared for multiple purposes
  • Project guidelines and project agreements that apply, for example, legislative and policy requirements in a structured way

From these examples you can see that the policy environment relates to both the data sharing activity and the underlying problem that is driving it. For example, there may be a legal requirement to inform people of their risk of flooding along with a policy requirement to share the data that provides this information as open data.

These dual requirements could be within the same legislation or completely different policies.

Expertise

Sketch to illustrate expertise

Expertise is needed to enable all components of a robust data sharing ecosystem. The expertise required will cover both the problem being tackled and the appropriate data skills.

It is critical not to fall into the trap of considering one more important than the other. A good working relationship between experts, with clear accountabilities and engagement, will help develop a robust shared understanding and, in turn, an effective operational ecosystem.

For example:

  • Deep understanding of the topic / area at hand providing broad input, for example flood and flood modelling experts working together to develop models
  • Understanding of how data is expected to be shared within the ecosystem and the specific requirements to allow data management experts to develop the appropriate tools and data licensing experts to ensure the required permissions are negotiated. For example, ensuring that data licensing experts know that the intention is to share flood model outputs as open data.

Funding

Sketch to illustrate funding

The different elements of a functional data sharing ecosystem need investment and ongoing funding to ensure the ecosystem's sustainability. The sources of funding may be different for the different components of the data sharing ecosystem.

For example, these could be:

  • Donor investment for research
  • Regional funding for remote sensing
  • Government budget allocated to train people to be able to undertake forecasting
  • Funding available to take the necessary action

Coordination

Sketch to illustrate coordination

A key issue for an effective and robust data sharing ecosystem is likely to arise from the way the different actors do or do not coordinate their work. Alignment of interests is critical to ensure that these actors can work together to deliver an agreed approach to data sharing. There is also the interaction of other factors, such as gender norms, culture, mindsets, ethics, and attitudes, that are not documented [so not within the policy environment] but do need to be coordinated.

For example:

  • Ensuring wide stakeholder engagement, allowing crossovers and common interests to be identified.
  • Identifying critical stakeholders and then ensuring that they are included in the development of any approach.
  • How individual mindsets and approaches influence how data is shared, for example dogmatic rather than pragmatic interpretation of data sharing agreements.
  • Ensuring there is ethical consideration of whether data should be shared, for example when it is legal to share data about an individual but not ethical to use it in the proposed manner.

The enabling environment…

The 5 enabling environment components described above do not exist within the data sharing ecosystem; rather, they overlap to create an environment that the ecosystem exists within.

The diagram below illustrates the policy environment overlapping with funding and expertise. Where these cross over, coordination is required, and only where they all overlap and coordination occurs will the data ecosystem thrive. The data sharing components make up the data sharing ecosystem at the centre of the Venn diagram.

Sketch venn diagram to illustrate the enabling environment
