The importance of having agricultural data platforms

David Anabalón
Datawheel Blog
6 min read · Jun 9, 2022


A historical context

Agriculture is nothing new; since its revolution 12,000 years ago, it has allowed us to control our food supply during droughts or floods, develop labor division, and build the urban settlements most of us live in. It would be no exaggeration to say that it has figuratively and literally shaped the world we know today.

With the development of writing, data collection began as a way to control and keep track of economic activity in ancient civilizations. The earliest records, baked into clay tablets that we could call the first spreadsheets in history, accounted for wheat, property, and cattle.

As civilizations like Egypt started to record the events in nature — like flows of water from the Nile — the development of technology in agriculture began to intensify. Thanks to the now available surplus of crops, products like wheat started to mean more than food.

Records show that the ancient Egyptians used wheat to pay taxes, even though they had the “Deben” as their monetary unit, because it was a more universally consumed product and it was easier to agree on its value. This made authorities keep an even closer eye on these products to organize and plan the development of cities, establishing the importance of agriculture beyond its nutritional value.

Ancient Egypt Gallery, Louvre Museum, Paris, France. Complete indexed photo collection at WorldHistoryPics.com. (Gary Todd)

Today, discussion around agriculture is growing, especially given the recent problems in global supply chains. Fears of a world food shortage are spreading, and we urgently need smart ways of using our resources to feed a global community of more than 7 billion people and keep flourishing in these challenging times.

What do we produce? Where do we produce it? Whom do we sell it to? How many resources are we spending to produce it? These are some of the fundamental questions we have to ask ourselves, our governments, and our institutions before thinking about possible solutions.

Agricultural datasets are usually messy, incomplete, and hard to understand. Although there are initiatives to integrate the data and help countries with fewer resources to understand their croplands better, we still have a long way to go.

Challenges and initiatives

If you explore agricultural datasets, you will probably face five main challenges:

  • Availability
  • Large size and volume of datasets
  • Understandability
  • Untidy format
  • Accessibility

There are also several initiatives to tackle these:

Collaboration is fundamental to improving data availability in countries with fewer resources. One example is the partnership between the Global Land Analysis & Discovery Lab (GLAD) and NASA, which used satellite images to measure how much croplands expanded from 2003 to 2019. It helped us discover, for example, that Africa is the continent with the largest expansion of croplands, with 530,000 new square kilometers mainly dedicated to widely consumed products like soybean, rice, wheat, and maize.

Global Croplands Expand Map (2003 to 2019)

The goal of this project was to integrate the knowledge on cropland performance, availability, and climate effects. Africa is an example of a region where agricultural data availability hasn’t grown as fast as its croplands. The continent has been historically affected by major issues related to colonialism and political instability, which can help explain the current lack of resources, social inequality, and poverty, among other difficulties.

Greater data availability brings new issues to light: as datasets become more extensive, they also become harder to manage, and new technologies are necessary to store the information. This usually raises costs and creates new barriers for countries with fewer resources that lack the technological infrastructure to store large amounts of data.

Although programming languages can make processing large amounts of data more efficient and workable, they also present a new challenge: once the information is only reachable through code, most people won’t have the knowledge to understand what these datasets are saying, and the key insights that could be turned into better policies or better soil usage will remain hidden and unused.

Data Africa is an example of a Datawheel project addressing these challenges. The platform, developed in 2017, provides information on critical topics for the development of 13 countries south of the Sahara, combining agricultural, climate, health, and poverty data.

Data Africa’s Homepage

Making critical information on the development of these croplands understandable and visually attractive is a powerful way not only to further data democratization and build trust in the public sector but also to create an opportunity for strategic coordination between nations, so they can take full advantage of the production, performance, and potential of crops while keeping in mind the environmental impact on the soil.

“One of the challenges we had to face working in the platform was the geospatial aspect of the data,” says Jonathan Speiser, one of the lead developers of Data Africa. Matching and validating each country’s different levels of data aggregation and availability is tricky, on top of the already complex dynamics of each region, where borders, governments, and politics can constantly change.
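To make the aggregation problem concrete, here is a minimal Python sketch of one way to bring records reported at different administrative levels onto a common level. It is only an illustration under stated assumptions, not how Data Africa does it: the region codes, the `PARENT` lookup, and the `roll_up` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup: each finer-grained region code maps to its parent region.
PARENT = {
    "XX-01-A": "XX-01",
    "XX-01-B": "XX-01",
    "XX-02-A": "XX-02",
}


def roll_up(records, parent=PARENT):
    """Sum (code, value) records at the parent level.

    Returns the totals per parent region and the list of codes that could
    not be matched, so they can be validated by hand.
    """
    known_parents = set(parent.values())
    totals = defaultdict(float)
    unmatched = []
    for code, value in records:
        if code in parent:
            # Finer-level record: add it to its parent region.
            totals[parent[code]] += value
        elif code in known_parents:
            # Record already reported at the common level.
            totals[code] += value
        else:
            # Unknown code: keep it aside instead of silently dropping it.
            unmatched.append(code)
    return dict(totals), unmatched
```

Returning the unmatched codes separately reflects the validation step mentioned above: when borders or codes change, mismatches surface in one place for review.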

This also connects with the challenges around untidy formats. The capacity to combine all of the available sources of information is fundamental, but this becomes time-consuming and almost impossible when there isn’t a fully standardized format that allows data integration. Many of the datasets record their elements with different time frames or definitions, making it harder to understand the information as a whole.
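As a rough illustration of what standardizing a format can involve, the Python sketch below maps rows from two sources onto one shared schema. Everything in it is assumed for the example: the source names, the field names, and the `standardize` helper are hypothetical, and real datasets would need many more rules (for instance, reconciling different time frames).

```python
# Hypothetical per-source rules: how each source names its fields and
# which factor converts its area unit to square kilometers.
SOURCES = {
    "source_a": {
        "fields": {"region": "region", "yr": "year", "area_ha": "area"},
        "area_to_km2": 0.01,  # 1 hectare = 0.01 square kilometers
    },
    "source_b": {
        "fields": {"Region": "region", "Year": "year", "area_km2": "area"},
        "area_to_km2": 1.0,
    },
}


def standardize(row, source):
    """Rename a row's fields to the shared schema and convert area to km2."""
    spec = SOURCES[source]
    out = {target: row[original] for original, target in spec["fields"].items()}
    out["year"] = int(out["year"])
    out["area_km2"] = float(out.pop("area")) * spec["area_to_km2"]
    return out
```

For example, `standardize({"region": "North", "yr": "2019", "area_ha": "250"}, "source_a")` and a row from `source_b` would both come back with the same three keys (`region`, `year`, `area_km2`), which is what makes them comparable.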

Similar issues were faced by the CIREN Institutional Observatory, a public platform designed and engineered by Datawheel in South America that allows navigating and analyzing data from the National Information Center for Natural Resources in Chile across four main categories: Cadastre, Soil, Climate, and Geography. According to GLAD, South America had the second-largest expansion of croplands, with 370,000 new square kilometers between 2003 and 2019.

CIREN Institutional Observatory’s Homepage

Although the CIREN platform was developed for a single country, some technical challenges were still present in aggregating the different geopolitical, climate, and soil data layers. “We couldn’t access one of the datasets that would allow us to combine all of the information available for an even more integrated visualization,” says the lead engineer for this project, Nicolás Pérez.

Accessibility is key to creating accurate snapshots of the state of development of soils worldwide; collected data is only useful if you can access it later. Existing datasets are often difficult to access because of privacy and security issues. Unlocking their potential could lead to better public policies for more sustainable and respectful growth, and could be key to safer food supply chains.

Final Thoughts

Strategic coordination will clearly be essential in the coming years to address many of the challenges in the current state of agricultural data. Although improving data availability is necessary, it’s also important to remember that data collection is not the only step required to have a clear view of the state of croplands around the world. Having structured, coordinated, and easy-to-work-with datasets is fundamental to keep developing platforms like these, which help coordinate the international community’s efforts, change the decision-making process on the investment and distribution of croplands, and secure for our global community the resources we need to thrive.
