The Data Revolution Has a Software Problem

Oct 14, 2015

Measuring the UN Sustainable Development Goals Requires Modern, Flexible Web and Mobile Applications

In Dodoma, Tanzania’s administrative capital, a group of visibly frustrated economists and statisticians discussed their work inside a sweltering room. They were from the nearby Singida Region, and they were on the front lines of what UN Secretary-General Ban Ki-moon has called the Data Revolution, an effort to find innovative data collection methods that can measure progress towards the Sustainable Development Goals.

Their day-to-day work is helping to answer an important question. Given primary school enrollment rates, immunization rates, agricultural productivity, and other important quantitative indicators, how well are the people of Singida actually faring?

Getting the answer right is difficult, but incredibly important. Accurate statistics show whether a country is on the right track. For example, official statistics suggest that Tanzania is on the verge of achieving universal primary education, but household surveys show that 15% of children between 7 and 13 are not enrolled in school. Organizations also allocate funding based on statistical indicators, which is a problem when those indicators are unreliable. The Global Alliance for Vaccines and Immunisation (GAVI) gives cash incentives to countries based on the number of children properly immunized, but studies have found that immunization rates are largely over-reported.

These gaps, of course, are partly a political problem. Regardless of the political environment, economic incentives should align with producing high-quality data. However, I believe the political problem is impossible to solve without first solving a software problem.

The Problem With the Current Software

I was in the room in Tanzania at the invitation of the World Bank. Since I have experience in the region, and since the Department of Better Technology builds data collection and management tools for governments, I was asked to help evaluate the challenges this group was grappling with.

The group explained their biggest problem: the process of data collection, aggregation and analysis is opaque, a black box. Errors occur throughout the process, but it is hard to determine exactly where they are introduced. A recent report from the Center for Global Development echoes their sentiment.

In the Singida Region, surveyors armed with paper forms visit schools, health clinics and other public infrastructure in every village and town. These forms are collected by a staffer at the Local Government Authority (LGA), who enters them into an Excel spreadsheet. The data is aggregated upward: the LGA passes it on to the district, which then passes it to the region. Finally, it reaches the national level.

During each of these steps, there are opportunities for officials to approve, reject, or manipulate the data, but these actions are never recorded. This makes it difficult for the group to identify inconsistencies, or understand how a data set might be skewed.

Legacy software is largely at fault for this lack of transparency. For example, the Local Government Monitoring Database (LGMD), a desktop application that is supposed to help this group do their jobs, was rendered largely unusable because it couldn’t handle changes to questions, administrative boundaries or approval processes. Any update had to be made manually by a software developer, and then installed on each machine individually.

Tanzania is a leading member of the Open Government Partnership, and has recently launched an open data portal to display high-level statistics about the country’s performance. Yet without software that makes the process of data collection transparent, policy makers won’t have the tools they need to allocate resources, and politicians won’t have the tools to tell a meaningful story about their country’s performance.

Software Isn’t Eating the Data Revolution

In 2011, Silicon Valley mogul Marc Andreessen wrote that software is eating the world. Industries as varied as health, agriculture and banking are being reimagined on top of modern, flexible web and mobile applications. The best of these products don’t simply replace inefficient legacy systems: they identify unmet needs, and meet them in new ways.

In East Africa, for example, the problem of unemployment is exacerbated by an information problem. Employers have job openings but have difficulty finding qualified candidates. Duma, a company that recently won an award at the pitching contest PIVOT East, is solving this problem with an SMS-based platform for screening candidates and then connecting them to companies. (Disclosure: I’m an investor in Duma.)

The infrastructure supporting these products often relies heavily on cloud computing. The companies behind them develop software in an agile manner, instead of working from a functional spec in an RFP. Governments are understandably risk-averse, and can hesitate to adopt these principles. However, this hesitation has a cost. In the case of the Data Revolution, the cost is inaccurate statistics and misallocation of resources for the most important public services.

Getting to Better Data

There are two steps that governments and development partners interested in better systems for measuring outcomes should take.

First, they should change the procurement process, the rules that determine the types of companies that can help governments implement systems. From Silicon Savannah to Silicon Valley, I repeatedly hear from entrepreneurs that they are keen to help government solve problems in new ways, but they believe the deck is stacked against them in favor of legacy vendors and industry giants, like the group that built LGMD.

Second, government and development partners should encourage more people from the technology industry to join government. In the US, because of programs like US Digital Service and 18F, knowledge about modern software practices is slowly permeating the public sector.

For the statisticians and economists I met in Dodoma, it was clear that software alone would not solve all their problems. Even with a deep understanding of incentive structures, and with a careful engagement with political power, developing a more transparent and efficient data pipeline is a tremendous task. Yet it’s clear that the goals of the Data Revolution cannot be achieved without a sincere effort, built upon modern software, to reimagine how governments measure the indicators that matter most to citizens.

Written by Joshua Goldstein