Beyond the Hype-Cycle for AI and Open Data

Official data is still the backbone of much of AI. And we don’t have enough of it.

Open Government Partnership
OGP Horizons
5 min read · Jul 30, 2024

The peaks, valleys, and plateaus of excitement for open data. (Credit: Ganapathy Kumar via Unsplash)

By Open Data Charter (Natalia Carfi), Joseph Foti, and SilvanavF (Silvana Fumega)

Most of the people reading this blog are old enough to remember recent “hype cycles”: Web 3.0, blockchain, cryptocurrencies, the Metaverse. Most of these have not (yet) lived up to their promise.

Open government data was once the most glamorous of tech ideas, too. It promised innovation, a way to correct abuses of power, and economic value. However, some skeptics, including freedom of information, environmental justice, and privacy advocates, argued that it ignored harder accountability measures or specific applications.

More than a decade later, we can assess what has and has not worked.

Government data — especially open data — does have more of a record to show than other technologies:

  • Widespread adoption of the Open Contracting Data Standard has resulted in improved data around the world and much better value for taxpayers’ money.
  • Huge improvements over time in national statistical office capacity let us assess progress and challenges in development.
  • Good governance data has made a difference in detecting financial crimes. This includes data to end anonymous shell companies (known as “beneficial ownership data”).¹

We would argue that we are at exactly the moment when this data is becoming most useful. Long after the hype, a more sober business case emerges, especially around open data. And that is precisely why leaving open data out of the current global discussion on AI is a problem.

Peaks, valleys, and plateaus

The “Gartner Hype Cycle” is a way to think about excitement for different technologies. (See Figure 1 below.) Artificial intelligence is at the “peak of inflated expectations,” with hundreds of applications under development. Some may work. Others may not work at all, or may lack a business case or a public benefit.

In contrast, open data is nearing the “plateau of productivity.” This is to say, we know what it is good at and how to produce it. We also know the use cases. We have moved beyond the “release everything now” moment and entered the “publish with a purpose” phase. We are also past the “trough of disillusionment,” when critics said that it had all been a waste of time.

Figure 1. The Gartner Hype Cycle as it Applies to AI and Open Data

Source: Wikipedia, with elaboration by authors

Modern maturity

We are not AI pessimists. But for AI to work for people, we need better and more abundant data. We also need better regulations, with human rights at their core, to balance access to data with ethical use and privacy.

Many countries lack the necessary data or rely on estimates, painting an inaccurate picture of development. This means that we may not know which policies work, where the environment is being damaged, or where poverty is most acute.

This relates to a second problem: biased data results in biased AI. Even with model adjustments and weighted data, small and unrepresentative samples only amplify inaccuracy. This is most obvious when discussing the gender data gap. Yet, it is easy to imagine other blind spots around caste, class, or small and medium-sized enterprises, for example.

Closing these gaps will take years of hard work. But what if there were a hack that could improve the quality of the data and speed up the process? We argue that open data is the missing ingredient for both.

The case for open data

More data is not enough. That data should be open, where appropriate. Open data is not a “nice to have.” It is a must-have. It works better for many cases, especially in governance applications. Here is why:

  • Accessibility: Open data is more accessible. Everyone can find it and use it.
  • Verification: Open data helps triangulate other findings. For example, public beneficial ownership data is a shared resource. Imagine law enforcement looking for financial crimes and a banker noticing strange transactions. They each may use AI red-flagging software to catch suspicious activity. But having a public register they can both refer to means they are working from the same public set of facts.
  • Explainability and traceability: Users can find and explain open data. AI applications built off of this data will improve as the data itself improves. This is especially important as the world “runs out of data” and many applications may begin to run on synthesized data. Having new, ever-better data matters to keep AI applications improving.
  • Accountability: Government officials are responsible for the quality of the data. Requests to fix data will get a response more often when it is part of someone’s job to fix things.

Not all data can be open, but key open government data sets remain worth investing in. Investing in data means investing in people and organizations. At the peak of the open data hype, capacity was not a marquee topic. Yet it is now accepted as essential. That is why we cannot overlook it again.

Moving forward

A crucial moment is the UN Summit of the Future on September 20–23 this year. It aims to guide UN member states toward a more sustainable and humane high-tech future. The principal negotiations focus on the Pact for the Future and its Global Digital Compact. The Compact is an important “annex” to the Pact that will shape international discourse around AI, connectivity, data, and digital rights for the next decade.

Yet the current outcome documents do not reflect the importance of high-quality, representative, non-biased data. Open data is barely addressed, either. Good data is essential for human rights, for ending poverty, and for our clean energy future.

Without it, we risk repeating the past. As those who have experienced it can attest, that would be a mistake.

¹ How do we know about improvement? Over the last ten years, people have measured open government data around the world across sectors. These include:

  • The Open Data Inventory (ODIN) evaluates the openness and coverage of official government statistical data.
  • The OECD OURdata Index focuses on open data in wealthier OECD countries.
  • The Open Government Data Index is a pilot, begun in 2019, within the UN E-Government Development Index.
  • Begun in 2020, the Global Data Barometer (GDB) gives detailed, country-level insights on key sectors. It builds on the Open Data Barometer. GDB covers not only open data but privacy, human rights, AI, digital security, data governance, and inclusivity.

Open Government Partnership
OGP Horizons

75 national & 104 local governments, plus thousands of civil society groups, working to deliver the promise of democracy beyond the ballot box through #OpenGov.