Solving for Unknowns: AI for Government Accountability in Low-Data Environments
In many places, the data we need to know whether our officials are acting in our interest is missing. When that data is missing, AI tends to hallucinate something other than the truth. Fortunately, there are solutions.
In Liberia, many journalists make their money from reprinting government pronouncements.
In Indonesia, AI repeats official legal statements without verifying whether the law is actually implemented.
Ask ChatGPT about the state of diamond mining in Mongolia and you will get a very confident answer. (Problem: There are no diamond mines in Mongolia.)
What do these examples all have in common? They demonstrate the problem of using general-purpose large language models (LLMs) to support accountability and monitor policy implementation in low-data environments, where important data for decision-making is missing. The truth is, however, that government officers are going to adopt these tools in their everyday work. Journalists and civil society investigators will too. Yet without the correct inputs, we risk creating less accountability in public life, not more. How can we ensure that AI enhances accountability instead of eroding it?
In this piece we look at the characteristics of AI in low-data environments and some of the approaches we can take to improve its performance.
Why Does AI Hallucinate in Low-Data Environments?
AI, as traditionally used, requires incredible amounts of training data. It feeds on the massive body of the internet. But the internet is not an accurate map of humanity; it just reflects where people post the most. This poses a problem in low-data environments, where generic commercial products like ChatGPT run into real trouble.
- Missing and unreliable data: Governments do not have the capacity to produce large amounts of data or text for analysis. (Or do not do so in machine-readable formats.) This can be true not only in poor countries, but in many local jurisdictions in wealthy countries.
- Biased data: Governments may produce the vast majority of available data, or the data may largely describe the law rather than its implementation. Data may also be targeted primarily toward transnational commercial interests, and thus favor English-language sources. This can happen because the independent business and civil society sectors that generate competing data in larger economies are weak. It might also be because a government disallows the creation of independent data, as in Tanzania.
- “Source confusion” and “semantic leakage”: LLMs may pull from data sources that are frequently associated with one another but are not the same. For smaller countries, like those in West Africa, available data may not be specific to a single country. (For example, reports on Togo may also regularly describe the situation in Benin or Ghana.) This can result in hallucinations and false information.
The Elephant in the Room: Bad Data
The fundamental issue in low-data environments is the lack of varied data and the inability to act on what data there is. This observation, on its face, seems like circular reasoning. But it is nonetheless true. Low-data environments may have:
- Poor quality or inaccessible data that may be partial, unreliable, or propagandistic,
- Limited technical capacity in civil society, media, or independent accountability organizations, and/or
- Missing or superficial processes for monitoring and accountability.
The ultimate solution to problems of weak data is to generate more, higher-quality data. Ideally, this data is produced not only by the credible parts of the state but also by independent actors. The solution is also to empower people to hold governments accountable for those decisions. This means having trained people in the loop to pose the right queries, assess the sources, and double-check the answers, which requires good processes for soliciting and processing input.
But those ultimate solutions are not immediately available in many or most places. Instead, we must rely on workarounds — second-best solutions that help us deal with low-data environments.
Problems and Workarounds
There are times when we have to accept that low-data environments are what they are. The challenge, then, is how to work around that while still taking advantage of what AI technologies have to offer. There are three key problems to consider.
Problem 1: Source Confusion and Misaligned Contexts
In many low-data environments, the problem isn’t just missing data — it is confused context and misaligned relevance. AI models often prioritize content from large, English-language sources or dominant narratives. For governments, businesses, or civil society groups using AI for accountability in a low-data environment like Burkina Faso, this creates a risk that AI tools will return irrelevant results or rehash regional or global cliches that do not match the local reality.
For example, a system trained on global news and development industry reports might repeat generic corruption indicators without understanding concrete governance issues on the ground. Fortunately, there are approaches that can help.
- Build a data corpus based on a limited set of local priorities: Instead of trying to cover everything, an agency or organization can focus its AI on something specific: gold mining permits, rural education, or decentralization budgets. It can build out a dataset with local laws, policy plans, budget lines, and local-language media. This may be done at the training stage or the retrieval stage. For the purposes of public accountability, it may make sense to fine-tune open-source LLMs (like LLaMA, Mistral, or Falcon) on a narrowed-down set of training data (e.g., legal codes, regulatory texts, classified archives). Alternatively, one could maintain a custom document library and have the model respond only using that content through retrieval-augmented generation (RAG); a sketch of this approach appears at the end of this section. Importantly, any such customization requires time, knowledge, and resources.
- Use custom metadata and embeddings for local relevance: Local actors can tag sources by geography, sector, language, and date to ensure the retrieval layer knows what’s recent, local, and useful.
- Invest in local data partnerships: The options above are not free. But costs can be spread out if governments partner with universities, community radio stations, and civil society groups that already collect hyper-local knowledge. Because accountability data is often produced or managed by independent authorities (like independent controllers’ offices or ethics bodies), those authorities will need to be able to enter into such partnerships.
- Digitize internal records: People building accountability tools can encourage ministries to digitize and structure internal records; even Excel or PDF reports can be useful with good metadata. They may scan, translate, and index publicly available court documents, parliamentary proceedings, or procurement gazettes where available, taking advantage of rapidly evolving optical character recognition (a second sketch at the end of this section shows this step). Tools like AymurAI, for example, can help improve privacy protection along the way.
Ideally, these approaches can help build models that can respond to queries like “Has funding reached rural health centers in Centre-Nord?” by retrieving the right documents from a curated, context-sensitive corpus — not noisy global reports about universal health coverage.
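To make the first two bullets above concrete, here is a minimal sketch, in Python, of a curated corpus whose documents carry geography, sector, language, and date tags, plus a retrieval step that filters on those tags before ranking by text similarity. The documents, tags, and scikit-learn-based ranking are illustrative assumptions rather than a prescribed stack; a real deployment would swap in a proper vector index and an open-source LLM prompted to answer only from the retrieved passages.

```python
# Minimal RAG-style sketch: a curated, metadata-tagged corpus and a retrieval
# step that filters for local, recent, on-topic sources before any model is
# asked to answer. All documents and tags here are illustrative.
from dataclasses import dataclass
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

@dataclass
class Doc:
    text: str
    geography: str   # e.g., region or province
    sector: str      # e.g., "health", "mining", "education"
    language: str
    year: int
    source: str      # provenance, kept for traceability

corpus = [
    Doc("Budget execution report: transfers to rural health centers in Centre-Nord...",
        "Centre-Nord", "health", "fr", 2023, "Ministry of Health, scanned PDF"),
    Doc("Decentralization budget lines for communes, fiscal year 2023...",
        "national", "finance", "fr", 2023, "Official gazette"),
    Doc("Global progress report on universal health coverage...",
        "global", "health", "en", 2022, "Development agency report"),
]

def retrieve(query: str, geography: str, sector: str, min_year: int, k: int = 3):
    # 1. Metadata filter: keep only documents that are local, on-topic, and recent.
    candidates = [d for d in corpus
                  if d.geography in (geography, "national")
                  and d.sector == sector
                  and d.year >= min_year]
    if not candidates:
        return []
    # 2. Rank the remaining documents by textual similarity to the query.
    vectorizer = TfidfVectorizer().fit([d.text for d in candidates] + [query])
    doc_vecs = vectorizer.transform([d.text for d in candidates])
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

query = "Has funding reached rural health centers in Centre-Nord?"
for score, doc in retrieve(query, geography="Centre-Nord", sector="health", min_year=2022):
    # These retrieved passages (with their sources) are what gets handed to the
    # model and shown to the human reviewer, rather than the open internet.
    print(f"{score:.2f}  [{doc.source}]  {doc.text[:60]}...")
```

The important design choice is that the metadata filter runs before any generative step, so a query about Centre-Nord health funding cannot be answered from a global report that merely uses similar words.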
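The digitization bullet can start equally small. The following sketch assumes the open-source pdf2image and pytesseract libraries and a hypothetical scanned gazette; it turns a scanned PDF into text carrying the same metadata fields used above.

```python
# Minimal digitization sketch: OCR a scanned PDF and store the text with the
# metadata fields used by the retrieval step above. The file path, language
# code, and tags are placeholders.
import json
from pdf2image import convert_from_path   # renders PDF pages as images
import pytesseract                        # interface to the Tesseract OCR engine

def digitize(pdf_path: str, metadata: dict, lang: str = "fra") -> dict:
    pages = convert_from_path(pdf_path)
    text = "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)
    return {"text": text, **metadata}

record = digitize(
    "procurement_gazette_2023_q4.pdf",    # hypothetical scanned gazette
    metadata={"geography": "Centre-Nord", "sector": "health",
              "language": "fr", "year": 2023,
              "source": "Procurement gazette, scanned"},
)

# Append to a simple JSON-lines corpus file that the retrieval layer can index.
with open("corpus.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```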
Problem 2: Government-Dominated or Skewed Information Ecosystems
When the only available data sources are official or propagandistic, a model will reinforce the government’s own narrative even if it is built only on local data. This is not good for holding officials to account.
- Triangulate information: People building accountability tools can combine what little local information exists with cross-border or diaspora reporting, international datasets (e.g., leaks from the International Consortium of Investigative Journalists, international finance institutions’ procurement portals), and global financial disclosures. In particular, builders may wish to reach outside of government. Non-governmental organizations, law firms, investigative journalists, and university researchers often have structured or semi-structured databases (e.g., lists of beneficial owners, contract award anomalies) that aren’t online but can be uploaded to a private corpus and cross-referenced against local records (a simple sketch follows this list). Sharing such data is less a technical exercise, however, than a matter of ensuring trust between model developers and data holders.
- Create a model on data from similar contexts: In some places data may be unavailable, but a model could be trained on similar datasets from countries with comparable institutional setups. For example, a model for public procurement red flags might be trained in a relatively high-data setting like Kenya, which has some usable data, and then applied to flag potentially problematic records in Uganda or Tanzania, which are likely to show similar patterns of troubling anomalies (a second sketch below illustrates this kind of transfer).
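One concrete form of triangulation is cross-referencing entity names across datasets that were never meant to be read together, such as a local contract-award list against an externally obtained beneficial-ownership table. The sketch below uses fuzzy string matching from Python’s standard library; the records, field names, and threshold are invented for illustration, and every match is a lead for human verification rather than a finding.

```python
# Triangulation sketch: fuzzy-match supplier names in local contract awards
# against an externally obtained beneficial-ownership list. All records here
# are invented; real data would come from the sources described above.
from difflib import SequenceMatcher

contract_awards = [
    {"supplier": "Sahel Construction SARL", "contract": "Rural clinic rehabilitation", "value": 240_000},
    {"supplier": "Bureau d'Etudes Horizon", "contract": "Road feasibility study", "value": 85_000},
]

beneficial_owners = [
    {"company": "SAHEL CONSTRUCTION S.A.R.L.", "owner": "Listed politically exposed person"},
]

def similar(a: str, b: str) -> float:
    # Normalize lightly, then compare; a real pipeline would use a dedicated
    # entity-resolution tool and human review of every candidate match.
    return SequenceMatcher(None, a.lower().replace(".", ""), b.lower().replace(".", "")).ratio()

THRESHOLD = 0.85  # tune against known matches; err on the side of review, not accusation

for award in contract_awards:
    for entry in beneficial_owners:
        score = similar(award["supplier"], entry["company"])
        if score >= THRESHOLD:
            # The output is a lead for a human investigator, not a finding.
            print(f"Possible match ({score:.2f}): {award['supplier']} <-> {entry['company']}"
                  f" | contract: {award['contract']} | ownership note: {entry['owner']}")
```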
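And as a sketch of the “similar contexts” idea: a simple classifier can be trained on labeled procurement records from a higher-data setting and then used only to rank unlabeled records from a comparable country for human review. The features, numbers, and choice of logistic regression here are illustrative assumptions, not a recommended model.

```python
# Transfer sketch: train a simple red-flag classifier on labeled procurement
# records from a higher-data country, then score unlabeled records from a
# similar context to prioritize human review. All numbers are invented.
from sklearn.linear_model import LogisticRegression

# Features per contract: [single_bidder (0/1), price_vs_estimate_ratio,
#                         days_from_tender_to_award, amendments_count]
X_train = [
    [1, 1.45,  7, 3],
    [0, 1.02, 40, 0],
    [1, 1.60,  5, 2],
    [0, 0.98, 55, 1],
    [0, 1.10, 35, 0],
    [1, 1.30, 10, 4],
]
y_train = [1, 0, 1, 0, 0, 1]  # 1 = flagged as problematic in past audits

model = LogisticRegression().fit(X_train, y_train)

# Unlabeled contracts from the lower-data country, same feature layout.
X_new = [
    [1, 1.55,  6, 2],
    [0, 1.05, 45, 0],
]
for features, prob in zip(X_new, model.predict_proba(X_new)[:, 1]):
    # A probability, not a verdict: high scores go to auditors or journalists.
    print(f"features={features} -> review priority {prob:.2f}")
```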
Problem 3: Legal and Ethical Risk of False Positives
A false positive is when you think you see something that actually isn’t there. Accountability AI cannot operate with “acceptable error rates” when reputations, freedoms, and political consequences are at stake. Take the example of Brian Hood, a whistleblower in a corruption scandal involving a company owned by the Reserve Bank of Australia. After exposing wrongdoing, Mr. Hood ran for mayor in Hepburn Shire, Victoria. Shortly thereafter, ChatGPT generated text claiming that Mr. Hood was one of the perpetrators of the corruption. The tool not only failed to help ensure accountability; it directly undermined it. Being 80 percent right is simply not good enough when trying to incentivize good, ethical behavior.
Victoria, Australia, is hardly a low-data environment. Imagine, then, how the probability of false positives multiplies in areas with significantly less training data. Further, what happens when users confuse the superficial fluency and confidence of AI for truth? When transferred over to official accountability practices, whether carried out by journalists, civil society, or government, false positives become a serious legal and ethical concern. Some solutions are below.
- Human-in-the-Loop Design: Don’t design the system to generate conclusions. Have it retrieve and cluster red flags and present them to human analysts and journalists for verification.
- Confidence Scoring: LLMs (and many other forms of AI) do not give a confidence score by default. Requiring a model to state a concrete level of confidence (quantified where possible) can help flag where further human investigation will be useful.
- Explainability: Accountability systems are more effective when the AI is required to show its outputs next to the source text or data they rest on. This improves traceability and the link to the underlying data.
- Rule-Based Guardrails: Accountability actors can set up strict rules about which entities the model can generate outputs about (e.g., no individuals named unless publicly listed in parliamentary or regulatory disclosures); a minimal sketch combining guardrails, confidence scoring, and human review follows this list. Something similar is already being tried at scale in the case of megaleaks, where journalists must sort through terabytes of leaked data to weigh legitimate public interest against the right to privacy, commercial secrets, national security, and other concerns. Accountability actors within governments can learn from journalists how to distinguish between and weigh these values.
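Here is a minimal sketch of how these safeguards can fit together in code: before any generated claim is surfaced, check the named entity against an allow-list of publicly disclosed entities, require an attached source excerpt, and hold anything below a confidence threshold for human review. The entity list, threshold, and data structures are illustrative assumptions, not a standard.

```python
# Guardrail sketch: a generated claim is surfaced only if the named entity is
# on an allow-list of publicly disclosed entities, a source excerpt is attached,
# and the confidence clears a threshold; everything else goes to human review.
from dataclasses import dataclass

PUBLICLY_DISCLOSED = {           # e.g., entities named in parliamentary or
    "Ministry of Public Works",  # regulatory disclosures; illustrative only
    "Sahel Construction SARL",
}
CONFIDENCE_THRESHOLD = 0.8       # below this, a human must verify first

@dataclass
class Claim:
    entity: str
    statement: str
    confidence: float    # model- or pipeline-supplied estimate, 0 to 1
    source_excerpt: str  # verbatim text the claim is grounded in

def review(claim: Claim) -> str:
    if claim.entity not in PUBLICLY_DISCLOSED:
        return "BLOCKED: entity not in public disclosures; do not publish."
    if not claim.source_excerpt.strip():
        return "HOLD: no source excerpt attached; route to human analyst."
    if claim.confidence < CONFIDENCE_THRESHOLD:
        return "HOLD: low confidence; route to human analyst with sources."
    # Even "released" claims are shown next to their source for verification.
    return f"RELEASE for human sign-off: {claim.statement}\n  Source: {claim.source_excerpt}"

print(review(Claim(
    entity="Ministry of Public Works",
    statement="Q3 road maintenance funds were disbursed two months late.",
    confidence=0.72,
    source_excerpt="Treasury release note, 14 Nov: disbursement executed...",
)))
```

In this example the claim is held for human review because its confidence falls below the threshold, which is exactly the behavior an accountability workflow should default to.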
Conclusion
AI tools in low-data environments risk amplifying gaps and distortions unless they are built with care. That means prioritizing local data, strengthening human oversight, and building in a transparent way. And it means bringing in local expertise as well. Having independent experts and civil society is absolutely essential — big companies alone won’t do. Responsible AI won’t replace accountability actors, especially in low-data contexts, but done right, it can help them see further, move faster, and be more accurate.