How Machine Learning and Artificial Intelligence Will Change Freedom of Information Administration

Open Government Partnership
Published in OGP Horizons
Mar 29, 2024

Written by Joseph Foti

This piece is part of OGP Horizons, a series from OGP that explores topics at the cutting edge of open government. OGP Horizons aims to capture open government innovations from around the world to tackle today’s biggest challenges — from navigating new technologies to responding to global issues like climate change. Today’s and tomorrow’s problems cannot be solved by governments alone — they will require all of us to evolve, together.

Artificial intelligence is changing how we do everything. Nothing will be changed more quickly by these new information technologies than information management itself.

Governments can be slow to adopt new technologies relative to other sectors of society. But a cautious approach may leave open government advocates on the defensive. So, let us take a closer look at how new technologies are going to shape the administration of freedom of information — and its less flashy companion, government record-keeping.

The Good

There will likely be a few areas where new technologies can be applied in ways that improve the availability of official information. These may include:

  • Document Capture: Major companies have already rolled out AI applications, such as optical character recognition, that digitize paper-based resources in ways that were unimaginable 20 years ago. In practical terms, this means that official government documents can be made digital, searchable, and indexable more quickly. Ideally, this means that government agencies can devote more energy to physically capturing and preserving such documents, rather than to converting them to machine-readable formats (or leaving them unconverted altogether). (While many wealthy countries have rules on digital-only communications, this is a rarity worldwide, so capture remains a challenge.) For information captured in formats that are not machine-readable (such as images and PDF files), AI will be useful for extracting data and creating document metadata. While this will not obviate the need to move to open data formats and rules for access to original formats, it can accelerate cataloging. This will be particularly useful for historical records, which may not be digital.
  • Document Cataloging: Once “born-digital” documents or documents that have been digitized are made available, public officers typically spend time categorizing and cataloging them. This may include entering metadata (for example, which agency generated a document), flagging exemptions (such as privacy or security concerns), and summarizing information for better searchability. In the future, it is increasingly likely that some of this work will be done by applications built on generative AI models such as GPT-4, which already does a relatively commendable job of summarizing and shortening longer documents. Of course, machines will make mistakes — perhaps more, perhaps fewer than humans, but certainly of a different type. This change means that agencies and archivists will shift resources from hand-indexing and cataloging to (1) ensuring that there is good quality training data; (2) auditing the quality of automated processes; and (3) ensuring that such cataloging remains compatible with the law.
  • Retrieval for Information Requests: Existing information requests (especially very frequently requested information) can be used to train machine learning models. This can allow algorithms to recommend relevant records from the archives and suggest responses to the public on frequently requested information. This does not obviate the need for human oversight and responsibility, but it can accelerate elements of the hunt for information. Jason R. Baron of the University of Maryland has written about the urgent need to take advantage of machine learning methods in searching through increasingly voluminous government records, especially email messages.
  • Document Declassification: In some situations, the declassification of documents can be held up for considerable amounts of time, especially in cases where multiple agencies or institutions have equities in a particular document. In these cases, machine learning may be able to help identify and make recommendations as to which documents should be released. As with other improvements, it cannot change the need for human responsibility and oversight. It can, however, help assess whether a declassification action is consistent with precedent, and it may even help identify which records should be proactively released, even without a declassification request. Indeed, in the United States, the Public Interest Declassification Board has recommended an overhaul of the US classification system to take advantage of the inflection point created by these new technologies.
  • Data Protection: There are a number of reasons that requested records may be withheld, including security, personal information, and deliberative processes. While citing exemptions will always need some amount of human review, researchers have devised new means to accelerate the process. One example is a test case using a deliberative-language detection model to review requests under the US Freedom of Information Act.
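To make the retrieval idea above concrete, here is a minimal sketch in Python. It stands in for the machine learning models described with simple bag-of-words cosine similarity over a log of past requests; every request, record name, and function here is invented for illustration, not drawn from any real FOI system.

```python
import math
from collections import Counter

# Toy log of past information requests mapped to the records that answered
# them. All names here are invented for illustration.
PAST_REQUESTS = {
    "budget spending report ministry of health": "record-health-budget-2023",
    "environmental impact assessment mining permit": "record-eia-mining-0142",
    "travel expenses of senior officials": "record-travel-expenses-q4",
}

def tokenize(text):
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_record(query, past_requests):
    """Return (record, score) for the past request that best matches the query."""
    query_bag = Counter(tokenize(query))
    best_record, best_score = None, 0.0
    for request, record in past_requests.items():
        score = cosine(query_bag, Counter(tokenize(request)))
        if score > best_score:
            best_record, best_score = record, score
    return best_record, best_score

# A new request is matched against the log; a human officer still decides.
record, score = suggest_record(
    "spending report from the health ministry", PAST_REQUESTS
)
```

A production system would use far richer models, but the shape is the same: past requests become training data, and the output is a ranked suggestion for a human officer, not an automatic release.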
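The exemption-screening idea can be sketched the same way. The snippet below is a toy stand-in for the deliberative-language detection model mentioned above: instead of a trained classifier, it scores a passage against a few hand-picked cue phrases (all chosen by me for illustration) and routes anything above a threshold to a human reviewer.

```python
# Cue phrases suggestive of deliberative, predecisional material.
# A real system would learn these patterns from labeled agency documents.
DELIBERATIVE_CUES = [
    "draft",
    "predecisional",
    "preliminary",
    "for discussion only",
    "internal deliberation",
]

def flag_for_review(text, threshold=2):
    """Return (flagged, matched_cues) so reviewers can see why it fired."""
    lowered = text.lower()
    matched = [cue for cue in DELIBERATIVE_CUES if cue in lowered]
    return len(matched) >= threshold, matched

memo = ("This predecisional draft is circulated for discussion only "
        "ahead of the final decision.")
press_release = "The ministry today announced the final budget figures."

memo_flagged, memo_cues = flag_for_review(memo)
release_flagged, _ = flag_for_review(press_release)
```

Note that the tool only flags and explains; the decision to withhold under a deliberative-process exemption stays with a human officer, which is exactly the oversight point made above.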

The Bad

The introduction of machine learning and artificial intelligence also raises concerns, however, and governments will need to be proactive if they are to prevent the worst negative impacts of these technologies.

  • Authentication: The opportunities for counterfeit information — that is, information which appears official, but is not — will increase. This will likely lead to an increase in disinformation (including from officials themselves) and opportunities for fraud. The inverse problem is also true. As governments release information that may be incriminating or may compromise interests, powerful actors may simply dismiss such information as counterfeit or fake news. For that reason, additional efforts will need to be made to ensure that people, especially journalists, can verify the “chain of custody” of official information.
  • Hallucination Prevention: Large language models can also create information that does not exist in the “real world.” This may be particularly problematic in low-information contexts, where government press releases provide the bulk of training data or there is little data to begin with. For example, I recently used ChatGPT to check whether political finance data in a lower-middle-income country was publicly available. The app strongly, but erroneously, claimed that such data was available in open data format as the law required. The problem is likely worse in countries without an abundance of independently verified data or where publicly accessible government information is scarce or incomplete. Establishing additional controls and oversight, ironically, may be even more important in such circumstances.
  • Compliance Oversight: As agencies begin to employ algorithms to aid decision-making, there is the danger that such decision-making itself becomes automatic, and that past bad habits and practices become baked in. For example, if a frequently abused exemption to information requests becomes part of training data on retrieval processes or declassification decisions, that data point risks encouraging a negative pattern of practice. In these cases, there is an additional need for inspectors, auditors, and right to information oversight bodies to have the competence and resources to examine these practices and make strong recommendations on how to deal with the lack of disclosure. In addition, they may wish to proactively require the disclosure of training data for algorithmic decision-making models and tools before such problems emerge.

The Unchanged

Finally, a number of governance problems remain, regardless of the improvements to access to information regimes through the use of artificial intelligence:

  • Official Communications: In many countries, officials continue to use unofficial channels of communication (such as electronic texting and messaging apps, including those with disappearing-message features) to message one another and constituents and to store information. Non-compliance is rampant and erodes the soil from which freedom of information grows. Because these platforms will continue to emerge like mushrooms after a storm, access to information regulators will need to identify means of improving official compliance, and will need to ensure that rules are technology-neutral. (In this case, “technology-neutral” means that the principles remain the same even as technology continues to evolve. For right to information, this means that, even as officials adopt Slack instead of email, for example, the same rules of access should apply.)
  • Appeals Processes and Contestation: Regardless of the pace of technology adoption, appellate bodies — whether ombudsman’s offices, courts, or information commissions — will need to continue to provide venues for individuals to challenge denials of information. What will change is whether these bodies have the competence to investigate patterns of denial and their causes, especially where those patterns result from an automated process.

The artificial intelligence revolution is already underway. Indeed, in the US, momentum is beginning to build around using AI to streamline information requests. The question is whether government systems will be able to adopt AI to enhance and protect the right to information.

