The Holy & Unfortunate Entanglement of Excel and the SDGs

In the back of a Toyota Hilux, an exhausted field enumerator with a mental catalog of rural water point functionality that rivals some of the best databases I’ve seen in the field pulls out a beat-up and dirty laptop. Its paint job has sun-faded spots. An asset sticker from the ministry that paid for his equipment is only half legible, with the last two identification characters well worn away. He has just returned from the Northern region of the country; my team from the East. It’s taken the better part of ten days and 16 snowballing interviews for us to connect in his truck on the outskirts of the city. What started out as a (naively) simple task of checking the data quality for rural water coverage in a district where we wanted to work has turned into an adventure in fact-finding that is reminiscent of investigative journalists uncovering a tax evasion scheme.

The Meetup: Exploring Excel data in the back of a pick-up in Rwanda

It takes ten minutes for the laptop to stir and whir alive. On the desktop are hundreds, if not a thousand, files with a green logo that is the norm for data storage in the humanitarian development field. With almost identical file labels and no clear organization or naming convention, it is difficult for the untrained eye to know where to begin. But our new friend navigates the storm quickly, opening up the latest set of numbers from his recent surveys. With a weary look in his eyes, he turns the screen around to show us his findings. With broken English (and translation aid from our local fixer), he explains the sample size estimates and his survey methodology for calculating the number of households who have access to basic water services. His highlighted cells are disappointing, as the numbers are wildly lower than the national statistics reported to the UN but still higher than the estimate we were given by the district government official we interviewed the day before. If we were to throw out the max and min values among the set of reported coverage numbers, the data points on screen seem to be the most valid we’ve seen to date.

We ask if we can copy some of his findings onto a thumb drive to take back to our offices, but we’re met with suspicious eyes. The contents of his laptop are a prized possession, like a dragon and his hoard under the mountain. He promises to send us a digital copy with summary figures, if his supervisor will allow it. No one in the truck is holding their breath.

If you have spent any time working in international development or humanitarian assistance, you likely have your own story of data hunting through the swamp of poorly organized analog and digital record keeping. Like weedy vines that plague a field of crops, the typically unstructured chaos of data management haunts every element of development work, from health to agriculture to infrastructure. Systems and technology vary greatly from the national level down to sub-national and local, often littered with gaping holes of data loss, validation errors, and masking of crucial details by repeated aggregation of already aggregated numbers. Scattered along the data pipeline from field to reporting agencies lie many an intervention in updated information management systems that were ultimately unsuccessful in bringing order to the stormy sea of field data.
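To make the point about aggregating already aggregated numbers concrete, here is a minimal Python sketch. The district names and household counts are invented purely for illustration and are not data from any survey described here. It compares a simple average of district-level coverage rates with the coverage computed from the underlying household counts.

```python
# Hypothetical figures for illustration only; not real survey data.
districts = [
    {"name": "District A", "households": 10_000, "with_access": 9_000},
    {"name": "District B", "households": 1_000, "with_access": 300},
    {"name": "District C", "households": 500, "with_access": 100},
]


def coverage(row):
    """Share of households in one district with access to basic water services."""
    return row["with_access"] / row["households"]


def mean_of_rates(rows):
    """Average the district percentages, ignoring how many households each one covers."""
    return sum(coverage(r) for r in rows) / len(rows)


def pooled_rate(rows):
    """Go back to the household counts and compute coverage across all districts."""
    total_access = sum(r["with_access"] for r in rows)
    total_households = sum(r["households"] for r in rows)
    return total_access / total_households


print(f"Average of district rates: {mean_of_rates(districts):.1%}")
print(f"Pooled household coverage: {pooled_rate(districts):.1%}")
```

The two figures diverge sharply because the small, poorly served districts count as much as the large one in the simple average. Once only the averaged number is passed up the chain, the household counts needed to correct it are gone.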

And no matter what part of the world or which technical area you explore, no matter what part of the data chain you are interested in — Microsoft Excel is king.

In my travels doing routine monitoring and program evaluation work across 30+ countries over the last decade, Excel has been the one common thread that holds everything together. There is not a more widely used tool for collecting, storing, analyzing, and reporting data. I can say (with almost 100% certainty) that Excel plays a role within all governments (at various levels). It’s a critical tool for implementation partners as they work tirelessly to measure the progress towards the Sustainable Development Goals. It is a holy marriage of convenience; a pairing of comfort with necessity. It is known everywhere and has the lowest barrier to entry of any data platform in the market. If data programs and systems were a staple food, Excel would be rice: it is robust, sustaining, and gets the job done.

The problem is that our data deserves better than Excel. New and advanced features like Power Query and live data connections continue to make it a more powerful tool, but it is the equivalent of driving across the United States on the back of a Vespa: it can get you to your destination, but there are much better alternatives. Digital data collection through mobile surveying, IoT devices, or other multi-modal ICT is faster and less prone to error than entering data into Excel cells. Modern data platforms and databases like MongoDB or Azure SQL are much better alternatives for storing data in the cloud than sharing XLSX files. And when it comes to analysis and visualization... let’s just leave it that no professional analyst or data scientist in the business world would consider Excel to be their best option.

Excel is cheap and/or free on many PC devices, but price is no longer the great equalizer for justifying Excel as your all-in-one data tool. The market for free and open-source alternatives that let you graduate from doing everything in Excel to a modern data workflow is growing daily. Analysis in R or Python, graph data stores in Neo4j, #BigData and machine learning with TensorFlow: the list is nearly endless (though here are some good places to start).
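As one small illustration of what that graduation can look like, here is a hedged sketch using only Python's standard library. It assumes a survey sheet has been exported to CSV. The column names, the sample rows, and the validation rules are all hypothetical, and SQLite stands in for whichever database a team would actually choose. The idea is simply that rows are validated once on the way in, and the database then refuses malformed values instead of silently accepting whatever is typed into a cell.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of a survey sheet; columns and values are invented.
SHEET_EXPORT = """water_point_id,district,functional,households_served
WP-001,North,yes,120
WP-002,North,no,85
WP-003,East,yes,
WP-004,East,maybe,60
"""


def load_rows(text):
    """Split exported rows into (valid, rejected) using simple validation rules."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        functional = row["functional"].strip().lower()
        households = row["households_served"].strip()
        # Reject anything that is not a clear yes/no or a whole number.
        if functional not in {"yes", "no"} or not households.isdigit():
            rejected.append(row)
            continue
        valid.append(
            (row["water_point_id"], row["district"], functional == "yes", int(households))
        )
    return valid, rejected


def store(rows, conn):
    """Write validated rows to a table whose constraints guard against bad values."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS water_points (
               water_point_id TEXT PRIMARY KEY,
               district TEXT NOT NULL,
               functional INTEGER NOT NULL CHECK (functional IN (0, 1)),
               households_served INTEGER NOT NULL CHECK (households_served >= 0)
           )"""
    )
    conn.executemany("INSERT INTO water_points VALUES (?, ?, ?, ?)", rows)
    conn.commit()


valid, rejected = load_rows(SHEET_EXPORT)
conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
store(valid, conn)

query = (
    "SELECT district, COUNT(*), SUM(functional) "
    "FROM water_points GROUP BY district ORDER BY district"
)
for district, total, working in conn.execute(query):
    print(f"{district}: {working} of {total} water points functional")
print(f"{len(rejected)} rows set aside for follow-up")
```

The rejected rows are kept rather than dropped, so someone can go back to the enumerator and ask what "maybe" meant.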

In some ways, at the highest level, the data+IntDev community has already moved well beyond Excel. We have IATI APIs and global data sets hosted by the World Bank. Now we just need to push these best practices down to that pesky regional office in rural Kenya and the engineer doing data collection in the informal settlements outside Kathmandu. It will take some training and learning new tools. It will take planning to figure out the right architecture. It won’t be easy. But neither are the SDGs, so why limit ourselves to safe and easy when we can be powerful and ambitious?