Lasers, hedgehogs and the rise of the Age of Yoghurt: reflections on #OpenDefra

Last week the UK’s Department for Environment, Food & Rural Affairs (Defra) celebrated 12 months of #OpenDefra: its commitment to releasing 8,000 datasets as open data in one year, and to kickstarting a transition across the department to an open culture.

I experienced first hand the changes inside Defra initiated by #OpenDefra, as expert adviser on data to the Secretary of State, Elizabeth Truss. I was brought into Defra as part of the Secretary of State’s extended ministerial office (EMO, an acronym I have great affection for), on secondment from the Open Data Institute.

Now that my time in the EMO is over, I wanted to share some of my experiences — around data transformation, changing organisational culture, and being inside government — in case there’s something in here that resonates for others.

Also because the stories coming out of #OpenDefra are pretty wonderful. I was already an ‘open’ person and a ‘data’ person going into Defra, so thought I knew vaguely what to expect. The reality was much more complex and nuanced, and more inspiring and satisfying as a result.

This is a long post but I hope you’ll stick with it. For me, the #OpenDefra story is about more than publishing lots of open data in 12 months. It’s about openness — open collaboration, open networks, open data — as a way to profoundly transform the way an organisation works.

I can’t cover everything in this post. I’m sure I’ll think of stories and lessons I want to share as soon as this is published, and hopefully I’ll add them at some point. This post covers:

  1. An overview of #OpenDefra and Defra
  2. Publishing 8,000 open datasets (and changes behind that)
  3. Transitioning to an open culture

Context: Defra is big and sprawling. There are lots of silos.

Anyone looking to do data transformation at a government level can learn from Defra’s experiences. Defra is a kind of microcosm of the UK government: it comprises 34 public bodies of different sizes and histories, scattered across England. Some agencies, like the Environment Agency and Natural England, are much bigger than their parent department, the Department for Environment, Food & Rural Affairs. Some parts of the Defra group have historically been independent and are used to doing their own thing. Some organisations generate their own revenue; others are entirely publicly funded.

Even though there are shared priorities and policy overlap, lots of work takes place in silos.

This blog post and image from the Institute for Government give you a sense of Defra’s size and set-up.

Defra’s size and sprawl mean there are lots of legacy IT systems and databases that don’t talk to each other; data collection, services and new tech are being duplicated across organisations (often at great cost); and useful data and insights don’t spread across the organisations that make up Defra, let alone outside the network.

And this duplication can get really expensive. Take buying data from other organisations, for example. Each Defra organisation negotiates individual licence fees for the same dataset, with terms and conditions that restrict ways they can use the data, usually inside their own organisation. As a group, Defra ends up paying six or nine or twelve times for the same data, with incompatible restrictions on use. As ‘one Defra’, Defra has stronger negotiating power and can negotiate terms that reduce this internal data friction.

Making Defra stronger as a negotiator, as well as more connected, porous and cost-effective, are aims behind the Secretary of State’s reform agenda. Opening up Defra — both internally and externally — will in the long term reduce inefficiencies within the group itself, while accelerating new ideas and networked thinking in the sectors Defra cares about.

These kinds of ideas were also behind the creation of the extended ministerial office: bringing external expertise, new ideas and ways of working into Defra, and joining up what teams and agencies were doing.

Defra is data-rich! It is a data nerd’s paradise.

Defra organisations are data collectors, data maintainers, data sellers and purchasers, and data users. I can’t think of another department that rivals Defra for the sheer diversity of data it collects and uses, from satellite imagery to aerial photography to 3D height data, from live animal tracking to butterfly and hedgehog migration numbers. The list goes on: marine ecosystems, plant health, flooding, land boundaries, food and drink consumption, noise pollution, water quality, air pollution and more.

A big catalyst for #OpenDefra has been the realisation that there’s data Defra holds that has potential uses, and very engaged users, outside the Department. The Environment Agency’s 3D height data (LIDAR) is used by the EA for flood modelling. When it was released as open data during #OpenDefra it turned up in experiments modelling snowfall across a landscape, in urban planning, archaeology, agriculture and wine making, in resources for schools and even in the game Minecraft. Just this week the EA published its LIDAR point cloud data (extremely high resolution). Look how cool it can be!

The diversity of data also means there are different kinds of data expertise — geospatial expertise, statistics, social science, operational research, economics and more — across Defra.

This is great because it means Defra is already good or should be good at doing some data things. But it is also hard because different professions have different ways of talking about data and different conceptions of what ‘data’ is; they have their own professional networks and tools; different needs as data users, and preferences for how it should be presented; and they provide different services.

One big challenge was reconciling and navigating the diversity of data languages across Defra. Seemingly simple questions like “what data do you have?” have different meanings for different professions. The first time I asked a statistician for the ‘data’ behind their statistics on UK beehive locations, they looked at me in confusion and said, ‘you’re looking at it’.

This is how I discovered the term ‘micro data’.

On another occasion, a scientist dismissed my description of ‘data’ including things like a definitive list of every Defra organisation, or blueprints for Defra buildings, as not ‘data’ but ‘management information’.

Nobody is wrong.

They use different kinds of data for different purposes. But when you’re trying to join up how data is managed across a large organisation, or across a government, as well as encourage new kinds of uses, you need to be aware of your different data users.

I remember much howling and gnashing of teeth after receiving an email from another government department, attaching a table in a Word doc and instructing Defra to ‘list all the datasets it maintains’. No definition of data, no limitations on scope, and no instructions for describing the dataset in any standard way (no metadata standard, essentially).

I can’t imagine that ultimately being a very useful/usable dataset of datasets for that department.

Releasing 8,000 datasets as open data — what’s in a number?

Talking just about the number of datasets Defra has released over the last 12 months doesn’t really tell you much about how the department has changed. (In the end Defra exceeded its 8,000 dataset target, announcing last week that it had published 11,007 datasets as open data in the first 12 months.)

The number of datasets itself isn’t important — you can chop one dataset into 100 smaller datasets. There’s understandable wariness in the data community about open data publishers seeming to focus too much on data quantity, at the expense of quality and usefulness.

I was really torn going into Defra as to whether the large 8,000 dataset target was helpful or harmful for the department if it genuinely wanted to realise impact from open data and open collaboration. After seeing what happened inside Defra as a result of the target, I’ve come to think it was pretty essential.

If Defra was actually going to publish 8,000 datasets as open data in 12 months in a meaningful way, it had to quickly learn lots of things about itself — how data is collected across its 34 historically siloed organisations, what data each organisation holds, which teams and people have responsibility for data, who has access to it, and the kinds of problems preventing teams from doing data differently. It had to provide guidance for publishing, institute new governance structures and revise others, and come up with ways of mitigating ethics and personal data and third party IP issues.

The very public, very large 8,000 datasets target forced Defra organisations to effectively draw back the curtains on their data holdings and start talking to each other.

There was one meeting about hitting the 8,000 target, with data professionals from the Rural Payments Agency, the Animal and Plant Health Agency, Defra, the Joint Nature Conservation Committee and more, where we were trying to come up with solutions for publishing lots of datasets containing identifiers that could be linked back to people (no easy answers here). It dawned on me during the meeting that this was the first time all of these organisations had come together to talk about this specific issue, and that they had all developed their own ways of working through it.

Some had already trialled solutions, some had sought legal advice, some had done focus workshops with stakeholders in their sectors who could be affected by it, and others had just gone ahead and published the data. And we could use their experiences to develop a solution for the rest of Defra.

The target necessitated closer collaboration between teams; openness to sharing experiences made the solution better; and everyone benefited from being part of the same discussion.

At the end of 12 months, Defra has 11,000 datasets available as open data. But maybe more importantly for the department, it has a strong sense of its broader data holdings and skills level, a network of data professionals and senior leaders who will have responsibility for data, and ways of grappling with some of its deeper data challenges.

For external users of Defra data, there’s still lots to be done improving the quality and signposting of its open data, and making more useful data as open as possible. I hope to see the Defra Data Programme analysing the 11,000 datasets released as open data to get an idea of where gaps are in Defra data holdings (as open data, but also as internal/shared data resources), what there’s wider interest in, what open Defra data can support Defra’s other strategic priorities, and which open data should be maintained (and how) going forward.

Encouraging an #open Defra

I am obviously an open convert. I mean, I work at the Open Data Institute. It would be easy to dismiss what I’m saying as the words of an open ‘evangelist’, except that some of it is just common sense.

Being ‘open’ to new ideas and open to sharing your own experiences and work helps to make things better, faster. Someone, somewhere, has almost definitely experienced the problems you’re facing. They have tested solutions to that problem. There are almost certainly tools and tricks and techniques you could use that you just haven’t heard about yet.

Being open isn’t just about broadcasting or pushing out work you’re doing in case others benefit. It’s about being open to experiences, expertise, and tools that could help you improve your own work.

An open culture can also mean people venturing far out of their comfort zones.

The Food Statistics team in Defra were encouraged to publish the underlying diary data behind the long running Family Food Survey as open data — decades of information about what British families eat and drink. This was entirely new for them, and not risk-free, but they rose to every challenge.

They consulted a group of external data users about their needs, took a crash course in personal data risks and anonymisation techniques, and sought feedback from anonymisation experts and other organisations publishing similar data. They blogged and shared drafts of their thinking and proposed data specification, and in the end they even created a commentable version of their privacy impact assessment, in case there were any risks or ways of mitigating risks they’d missed.

The Family Food Survey release attracted news headlines across the UK, and around the world.

Now I see senior Defra civil servants having conversations in the open about questions they’re facing in their work. It makes Defra visible as a place where these kinds of conversations are taking place, and answers being nutted out.

I loved being part of #OpenDefra. There’s energy in Defra around data, and the desire to do things differently. It’s growing, even if it will take time to reach every corner of the department. The Defra Data Programme’s data cave was my happy place, with a warm, inclusive and creative team assembled by former Head of Data Alex Coley (a long time data champion for UK government, who has recently moved to Epimorphics).

Sometimes #OpenDefra could be messy and chaotic — there was trial and error, fits and starts, and lots of figuring out the roadmap as we went. But there were also civil servants being bold, forging new collaborations and networks, and getting out of their comfort zone.

This is just the beginning for #OpenDefra. I can’t wait to see where it goes from here.


Update 7 July 2016: Stefan Czerniawski was disappointed to get all the way through this article and not hear anything about yoghurt. Did you know that UK yoghurt consumption has increased by more than 450% since the 1970s?!

Image from ODI/Kiln visualisation British Diet Labs

Thank you Family Food Survey.