Government as a Platform, the hard problems: part 4 — Data infrastructure and registers

Aug 5, 2019

Government as a Platform is the approach of reorganizing the work of government around a network of shared APIs and components, open-standards and canonical data registers. The hope is that this will allow public servants, businesses, and others to deliver radically better services to the public, and to do so safely, efficiently, democratically, and in a more accountable way. This article is part of a series looking at some of the difficult design, policy and technology questions posed by the Government as a Platform concept.

The meme of ‘data sharing’ is too successful

The narrative about “better data sharing” in government is strong (in 2019, a search on the GOV.UK website for the term “data sharing” returns over 43,000 policy documents, news articles and research papers).¹ But it is often shorthand for the unaccountable duplication and joining of data: closer to photocopying or copy-and-paste. That’s not to say the intended outcomes are bad — things like saving money or meeting a short-term policy aim — just that it creates risks while failing to contribute in any meaningful way to the delivery of services beyond those immediate outcomes.²

In a world of “data sharing” data is duplicated. In a platform ecosystem data is accessed as needed, via APIs using agreed open-standards. Rather than being held in multiple places and multiple formats, data is stored in canonical “registers”. Registers are authoritative lists that are trusted and used by multiple services.³

Replacing “data sharing” in the minds of policymakers with a platform approach, where data is managed for the greater public good, will take leadership and the unlearning of past approaches to build a new culture around data. In Estonia, for example, the interoperability between different government agencies using the X-Road platform was not mandated by law; it was the product of a set of guiding principles, collectively held by civil servants, and of laws preventing the creation of duplicate databases.⁴

For this to happen, it may be necessary to split apart organizations that both offer public-facing services and act as custodians of critical datasets, so that the custodian role can be properly focused on serving all users across government (and beyond). This change in mindset and organizational structures will likely require political capital to achieve.

Data infrastructure and user centred design

It has long been a problem that design has not had a place in technocratic projects. This has begun to change as digital services units in governments around the world have adopted user centred design, user research and (most importantly) hired designers into government as civil servants.

However, the place of data infrastructure in user centred design practice is not well-formed. In the government context, this can be seen in the digital service standards that many digital service units have published. In a review of 13 such standards, only the U.S. Digital Service made a firm commitment to the development of APIs.⁵

While the long-term value of data infrastructure is rarely disputed, the short-term cost often is. Digital service units will need to identify ways of making thinking about data and APIs part of their practice and fund it appropriately.

Standardization processes that work

Government as a Platform implies that governments move from ad hoc uses of data (ones that are typically tied to the provision of particular services) to the adoption of registers that meet the needs of multiple services. This will require different parts of government to all trust the same source data, and know they are speaking the same “language” when they make use of it.

The register of addresses maintained by Etalab, the digital service unit of the French government, provides a good example of what this looks like in practice. It has an open API that can be used by any part of the French government, or by the private sector. It uses a unique identifying number for each address, so that any service making use of it will identify addresses in the same way. And data is published in a standardized way based on the GeoJSON format, so anyone consuming the data will know how to make sense of it.⁶ ⁷
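As a rough illustration of why a shared format and a shared identifier matter, the sketch below shows how a consuming service might read an address from a GeoJSON response. It assumes a GeoJSON FeatureCollection in which each feature carries an `id` and a `label` in its `properties`. The sample payload, the identifier value and the function name `parse_addresses` are all hypothetical, invented for this example rather than taken from the Etalab documentation.

```python
import json

# Hypothetical sample response: a GeoJSON FeatureCollection with one address.
# The identifier, label and coordinates are made up for illustration.
SAMPLE_RESPONSE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [2.29, 48.85]},
      "properties": {"id": "EXAMPLE-ID-0001", "label": "1 Example Street 75000 Paris"}
    }
  ]
}
"""


def parse_addresses(geojson_text):
    """Return a list of (identifier, label, longitude, latitude) tuples."""
    collection = json.loads(geojson_text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    addresses = []
    for feature in collection["features"]:
        # GeoJSON stores point coordinates as [longitude, latitude].
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        props = feature["properties"]
        addresses.append((props["id"], props["label"], longitude, latitude))
    return addresses


if __name__ == "__main__":
    for identifier, label, lon, lat in parse_addresses(SAMPLE_RESPONSE):
        print(identifier, label, lon, lat)
```

Because every consumer reads the same identifier from the same place in the same structure, two services that each store only the identifier can later agree that they are talking about the same address without exchanging copies of the address itself.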

Legacy government systems often display the opposite characteristics of the French address register — they are tied to a particular use-case, use different identifiers, and are stored in different formats by different parts of government. As such, Government as a Platform represents, in part, a standardization project. The risk is that the process of agreeing and adopting standards for countries with lots of legacy technology spread across multiple government agencies could become a nation-state sized exercise of “yak shaving”.⁸

However, it need not be onerous. From OpenStreetMap to the Fast Healthcare Interoperability Resources specification and the Internet Engineering Task Force, large-scale standardization efforts outside of government show what is possible. Each effort favours progress, working implementations, public evolution and concrete use cases over perfection and completeness.⁹ ¹⁰ ¹¹ In the UK, the Cabinet Office solicits suggestions for the adoption of existing open standards via GitHub: businesses, citizens or civil servants can suggest areas that may benefit from standard definitions.

This approach will require governments to trust civil servants to work across silos to agree and adopt standards in a way that is emergent rather than prescriptive.

Putting limits around the use (and misuse) of data

The data infrastructure necessary to enable better government services — data stored once, but used by multiple services, using common identifiers and formats — also opens up risks around the misuse of data and unintended consequences from reuse in different contexts.

Data leaks have become a regular occurrence in both the public and private sectors. If more data about people is centralized in registers, any leak risks having a larger impact (especially if the data can be more easily linked to other datasets because of common identifiers). Examples like the Singapore “HIV registry” breach in 2016 and the investigation into the sale of personal data from users of the Aadhaar identity system over WhatsApp show that access controls alone are not enough to protect sensitive data.¹² ¹³

Approaches to anonymizing data that strip personally identifiable attributes have increasingly been shown to fall short when the data is cross-referenced with other datasets.¹⁴ For example, data released by the New York Taxi & Limousine Commission on the utilization of taxis was combined with license plate data and magazine photos to determine the movement of celebrities.¹⁵
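The mechanics of this kind of re-identification are simple enough to sketch in a few lines. The example below is a toy illustration, not a reconstruction of the taxi analysis: every record, name and field is invented, and the function `link_records` is hypothetical. It shows how a release with names removed can still be joined to outside information on the attributes that remain (here, time and place).

```python
# Hypothetical "anonymized" release: names are removed, but time and place remain.
RELEASED_TRIPS = [
    {"trip_id": "t1", "pickup_time": "19:30", "pickup_place": "Hotel A", "dropoff_place": "Restaurant X"},
    {"trip_id": "t2", "pickup_time": "19:45", "pickup_place": "Station B", "dropoff_place": "Office Y"},
    {"trip_id": "t3", "pickup_time": "20:10", "pickup_place": "Hotel A", "dropoff_place": "Airport Z"},
]

# Hypothetical auxiliary data an outsider could collect, such as a dated photo.
PUBLIC_SIGHTINGS = [
    {"person": "Person P", "time": "19:30", "place": "Hotel A"},
]


def link_records(trips, sightings):
    """Join the two datasets on the quasi-identifiers (time, place)."""
    index = {}
    for trip in trips:
        key = (trip["pickup_time"], trip["pickup_place"])
        index.setdefault(key, []).append(trip)

    matches = []
    for sighting in sightings:
        candidates = index.get((sighting["time"], sighting["place"]), [])
        # A unique match re-identifies the trip even though no name was released.
        if len(candidates) == 1:
            matches.append((sighting["person"], candidates[0]))
    return matches


if __name__ == "__main__":
    for person, trip in link_records(RELEASED_TRIPS, PUBLIC_SIGHTINGS):
        print(person, "was dropped off at", trip["dropoff_place"])
```

Nothing in the released data names anyone, yet the combination of time and place is distinctive enough to single out one trip, and with it the destination.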

Guaranteeing how data will be used once it’s been accessed remains an unsolved problem. This is especially important in situations where organizations have reason to mistrust each other (legitimately or otherwise), or the risks of unintended consequences are high. The UK National Health Service sharing data with immigration officials, which in turn raised fears of disease outbreaks, serves as a clear example of the latter.¹⁶ ¹⁷

Finally, the integrity and accuracy of data also become more important in a world of canonical registers. That’s because, if data about someone is wrong in a register, it could impact many aspects of their life: many services relying on the same data means many services that can go wrong if the data is incorrect.

All of these issues get even more complex as governments engage in more automated decision making (including using machine learning, which risks making things even more opaque), and design services that respond in real-time.

So, what should the response to these risks be? In part it should be institutional, with the organizations tasked with maintaining registers for the wider public good given clear mandates and encouraged to develop a culture where they act as custodians of data and watchdogs against misuse and unsafe practices.

Many countries will already have organizations that can provide a template — be they census bureaux, land registries or statistics agencies. There may also be the need for novel types of institution (India’s GSTN has already been mentioned in Part 3). We can also add to that the recent trial in the UK by the Open Data Institute with the concept of “data trusts” — legal structures that provide independent stewardship of data.¹⁸

Some problems are about technology

While the majority of the “hard problems” that relate to government platforms are not about technology, questions of data, in part, are. As such, the question of how data is used, maintained, accessed and joined is one that requires an institutional, legal, cultural and technical response. Technology responses to some of these issues include:

Verifiable use of data through recording access in immutable databases such as Trillian or Amazon QLDB.¹⁹ ²⁰ ²¹ ²² Estonia’s X-Road platform uses this approach to create a tamper-proof log of data accessed across the system. (Contrary to reports, X-Road does not use blockchain.)²³
Differential privacy is a mathematical technique that provides well-defined levels of privacy protection, even when data is cross-referenced.²⁴ It has started being deployed by Apple and Google to analyse data about their users.²⁵ ²⁶
Verily (Alphabet’s life sciences division) has implemented an approach called “polymorphic encryption and pseudonymisation” as part of a large cohort Parkinson’s study. It attempts to solve a general problem: how can government data scientists make use of sensitive data about people to improve outcomes, while minimizing risks of leaks, repurposing and the over-linking of data?
Verifiable claims is an approach to creating digital credentials that are cryptographically secure, privacy-respecting, and machine-verifiable.²⁷
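The first item in the list above, recording access in a log that cannot be quietly rewritten, can be illustrated with a hash chain. This is a deliberately simplified sketch and not a description of how Trillian, Amazon QLDB or X-Road are implemented (Trillian, for instance, is built on Merkle trees). The class `AccessLog`, its fields and the example agencies are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(previous_hash, record):
    # Serialize deterministically so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("ascii") + payload).hexdigest()


class AccessLog:
    """Append-only log in which each entry's hash covers all earlier entries."""

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, who, what, purpose):
        record = {"who": who, "what": what, "purpose": purpose}
        previous = self.entries[-1][1] if self.entries else GENESIS_HASH
        self.entries.append((record, _entry_hash(previous, record)))

    def verify(self):
        """Recompute the chain; return False if any stored entry was altered."""
        previous = GENESIS_HASH
        for record, stored_hash in self.entries:
            if _entry_hash(previous, record) != stored_hash:
                return False
            previous = stored_hash
        return True


if __name__ == "__main__":
    log = AccessLog()
    log.append("agency-a", "address-register/record-1", "benefit claim")
    log.append("agency-b", "address-register/record-1", "tax assessment")
    print(log.verify())
    log.entries[0][0]["purpose"] = "something else"  # simulate tampering
    print(log.verify())
```

A chain like this only makes tampering detectable if the most recent hash is also held somewhere the log's operator cannot change, for example by publishing it regularly. Production systems add that kind of independent verification on top of the basic idea.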

Given many of the approaches are still emerging, there needs to be more space for technologists working in government to operationalize ways of limiting how data can be used in ways that can be trusted and verified by the public and their representatives. Given there are few examples to point to from the public or private sector, this is one area where government needs to lead the way.
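As one small example of what operationalizing these approaches might involve, the sketch below applies differential privacy, mentioned above, to a simple counting query using the Laplace mechanism. It is a minimal illustration under stated assumptions, not the method used by Apple or Google: the function names and the sample records are hypothetical, and a real deployment would also need to track the total privacy budget spent across queries.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1, so the
    # sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical records: one dict per person.
    people = [{"age": age} for age in (23, 35, 41, 67, 72, 80)]
    noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=0.5)
    print("noisy count of people aged 65 or over:", noisy)
```

A smaller `epsilon` adds more noise and gives stronger protection. The trade-off between accuracy and privacy becomes an explicit, adjustable parameter rather than a property that is hoped for after attributes have been stripped out.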

Read part 5 — Identity and trust

1. HM Government, “Search”, GOV.UK, https://www.gov.uk/search/all?keywords=data+sharing&content_purpose_supergroup%5B%5D=news_and_communications&content_purpose_supergroup%5B%5D=research_and_statistics&content_purpose_supergroup%5B%5D=policy_and_engagement&order=relevance. Retrieved 5th June 2019.
2. See this for a longer argument about why the idea of “data sharing” needs resetting: https://medium.com/digitalhks/data-sharing-in-government-why-democracies-must-change-direction-badfaa2463ec
3. Paul Downey, “The characteristics of a register”, 13th October 2015, https://gds.blog.gov.uk/2015/10/13/the-characteristics-of-a-register/
4. Rainer Kattel and Ines Mergel, “Estonia’s digital transformation: Mission mystique and the hiding hand”, UCL Institute for Innovation and Public Purpose Working Paper Series, (IIPP WP 2018–09) (2018), p10
5. Richard Pope, “Digital service standards and platforms”, digitalHKS, 26th November 2018, https://medium.com/digitalhks/digital-service-standards-and-platforms-c11e060cacd
6. Etalab, “API”, https://adresse.data.gouv.fr/api
7. Etalab, “Foire aux questions”, https://adresse.data.gouv.fr/faq
8. See this article for a definition of “Yak shaving”: “Yak Shaving”, Techopedia, https://www.techopedia.com/definition/15511/yak-shaving.
9. OpenStreetMap, “Map Features”, https://wiki.openstreetmap.org/wiki/Map_Features
10. HL7.org, “Http — FHIR v4.0.0”, http://hl7.org/fhir/http.html
11. “IETF | Internet Engineering Task Force”, https://www.ietf.org
12. “Singapore HIV registry data leaked online in health breach”, BBC News, 28th January 2019, https://www.bbc.co.uk/news/world-asia-47027867
13. “Aadhaar: ‘Leak’ in world’s biggest database worries Indians”, BBC News, 5th January 2018, https://www.bbc.co.uk/news/world-asia-india-42575443
14. Arvind Narayanan and Vitaly Shmatikov, “Robust de-anonymization of large sparse datasets: a decade later”, 21st May 2019, http://randomwalker.info/publications/de-anonymization-retrospective.pdf
15. Atockar, “Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset”, research.neustar.biz, 15th September 2014, https://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/
16. Denis Campbell, “NHS will no longer have to share immigrants’ data with Home Office”, Guardian, 9th May 2018, https://www.theguardian.com/society/2018/may/09/government-to-stop-forcing-nhs-to-share-patients-data-with-home-office
17. Alan Travis, “NHS chiefs urged to stop giving patient data to immigration officials”, Guardian, 31st January 2018, https://www.theguardian.com/society/2018/jan/31/nhs-chiefs-stop-patient-data-immigration-officials
18. Open Data Institute, “Data trusts: lessons from three pilots (report)”, 15th April 2019, https://theodi.org/article/odi-data-trusts-report/
19. Open Data Institute / Register Dynamics, “Putting the trust in data trusts”, 14th April 2019, https://www.register-dynamics.co.uk/data-trusts/index.html
20. Emily Mattiussi, “Monitoring cloud data with Trillian”, IF Journal, 3rd April 2019, https://www.projectsbyif.com/blog/monitoring-cloud-data-with-trillian/
21. “google/trillian”, GitHub, https://github.com/google/trillian. Retrieved 25th June 2019.
22. “Amazon Quantum Ledger Database (QLDB)”, https://aws.amazon.com/qldb/. Retrieved 25th June 2019.
23. Petteri Kivimäki, “There is no blockchain technology in the X-Road”, Nordic Institute for Interoperability Solutions blog, 26th April 2018, https://www.niis.org/blog/2018/4/26/there-is-no-blockchain-technology-in-the-x-road
24. Cynthia Dwork, “Differential Privacy”, 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), July 2006, https://www.microsoft.com/en-us/research/publication/differential-privacy/
25. “Differential Privacy Overview”, apple.com, https://www.apple.com/privacy/docs/DifferentialPrivacyOverview.pdf
26. Eric Miraglia, “Privacy that works for everyone”, The Keyword, 7th May 2019, https://www.blog.google/technology/safety-security/privacy-everyone-io/
27. W3C, “Verifiable Credentials Data Model 1.0”, https://www.w3.org/TR/vc-data-model/