The future of data portals.

Let’s start by not calling them data portals.

Camellia Williams
Published in Vizzuality Blog
9 min read · Apr 5, 2018


Open data portals. Three words. Three fairly uninspiring words. Maybe I’m jaded, but if I read one more press release that claims a data portal is going to change the world, I will crawl under my desk and gently rock as I mutter ‘no, no, no’ repeatedly under my breath. Data portals don’t change the world. People change the world. People acting upon and using the knowledge they extract from all of the resources at their disposal change the world.

Thankfully, people everywhere have woken up to this fact and are developing portals, platforms and applications that are much more than cavernous repositories of datasets. As we approach the launch of Resource Watch — a new approach to connecting humans with knowledge — I’ve been taking a look at the lessons the open data community learnt from the early data portals and how they are shaping the data sharing networks of the future.

When open data portals first became popular, there was an attitude that all you had to do to change the world was put the data out there. ‘Build it and they will come’ seemed to be the mantra for a while, but all too often these data portals were stuffed with irrelevant, unusable data published in obscure or unreadable formats. Governments and organisations under time and financial pressure took the easy route and selected quantity over quality, foregoing context and tools that would help people understand and make use of the data. In the rush to meet short-term targets, the long-term sustainability of the portals was neglected and they became stale and irrelevant.

Today, we understand that data portals are about improving the timeliness and precision of action. They are designed to save time — the time it takes to find, process, and understand data and turn it into action. It’s all about reducing the time it takes to spark a flame that turns into a fire that changes the landscape. To be the match, a data portal/platform/app/whatever jazzy name you want to give it, must be laser-focused on the problem it is addressing, the people it will help, the partners who’ll bring it to life, and long-term sustainability. Let’s take a closer look at each element and understand how they shape the success and change-making potential of a data portal.

Define the problem.

“Sometimes it seems like there’s a data problem but once you start talking to people about what they need, you’ll see there’s another underlying issue that has nothing to do with the data.”

Nathaniel Heller, Executive Vice President for Integrated Strategies at Results for Development.

Before investing time and money into the development of a data portal, it’s a wise move to dig into the root cause of the problem being addressed. GovLab proposes four questions that will help reveal what the crux of the issue is:

  • Why does the problem exist in its current form?
  • What contributing factors could be at play?
  • What are the potential knock-on effects of solving this problem?
  • Why hasn’t the problem already been solved by someone else?

In a recent interview, Nathaniel Heller at Results for Development told me about a project where an approach to opening up data changed after reviewing the potential opportunities. The initial idea was to create a dashboard of agricultural data that would motivate national political leaders to embrace a push for agricultural transformation. But then R4D spotted an opportunity to help smallholder farmers in Kenya improve their access to fertiliser, so they decided to shift strategies.

Instead of a dashboard, R4D and their partners at the Local Development Research Institute created an SMS-based service that would help farmers in Kenya locate the right fertiliser for their crops, at the right prices, from a location within easy travelling distance. More than 10,000 farmers have already subscribed to the service, but we have to wait until harvest time to see if the app has helped improve yields through improved access to fertiliser.

In this example, an analysis of the problems the data dashboard was meant to address revealed a more impactful approach to enabling a change with data. The analysis of the problem also identified a specific user group, which leads us neatly onto the second element of a successful data-sharing approach — know your users.

Know your users.

“Some of the most effective open data projects … are laser-focused on a specific user group.”

The Periodic Table of Open Data’s Impact Factors.

By swapping the one-size-fits-all approach for user-centric design, designers and developers have been able to create portals, platforms and applications that are tailored to their target audience. They take into consideration that “citizens have differential access to the hardware and software required to download and process open data sets, as well as varying levels of skills required to analyse, contextualise and interpret the data.”*

In Kenya, most farmers don’t have reliable or regular access to the internet. Therefore, there would be little point in putting fertiliser data into an online application because no one would ever see it. However, almost everyone has access to a mobile phone, so an SMS-based information delivery system would maximise the chance of farmers accessing and using the available data.

Today’s data sharing platforms are also keenly aware that users need context and analysis to help them make sense of the data and decide what to do with it. The revamped countries pages on Global Forest Watch give users an opportunity to see and understand if the deforestation events they are seeing are normal for that region, at that time of year. By providing context, Global Forest Watch is helping people develop an informed opinion on the state of forests around the world and decide if they need to take action.

Here, the grey lines add context by showing the trend of GLAD alerts in previous years.

Think sustainably.

“Releasing data requires good publishing platforms, and coordinated infrastructure to house the data and make data repositories and libraries interoperable.”

The Center for Open Data Enterprise.

Successful data portals and applications are ones that people keep returning to, and using to make data-driven decisions. As Abhi Nemani, founder of EthosLabs, has observed, we should be thinking about creating strings of related, yet distinct interactions, rather than solo engagements. However, maintaining interest in a source of data requires long term planning and investment — both in terms of finance and technology. To remain relevant, data portals need to be maintained and updated regularly. All the potential hurdles people will have to overcome to maintain the data and make use of it have to be considered.

Longevity is one of the reasons we choose to develop platforms and applications using open source software and publish our code publicly. The tech community is constantly working on improvements to the software it uses, and publishing our code openly means anyone can use it, add to it, and improve it. We also maintain a focus on interoperability — so solutions can be applied to multiple uses.

Form partnerships.

“Data science is a team sport.”

DJ Patil, former US Chief Data Scientist.

The boundaries and silos between organisations, and sometimes within them, are often cited as a cause of data portal failures. Partnerships lead to better products as they tap into the variety of skills and resources the government, public sector, private sector and non-profit sectors can each offer. “For open data programs to succeed, they need to be supported by a community of stakeholders from business, academic, and civil society,” said DJ Patil, former US Chief Data Scientist.

In today’s world, where simply putting the data out there isn’t enough, it takes a team of data scientists, designers, developers, subject matter experts, storytellers, project managers and everything in between to make data accessible, usable and useful. “It’s not just about accessing data, but about the ability to link that to strategy, to share that with our communities. It’s about putting a narrative around charts and graphs,” says Julia Richman, chief innovation and analytics officer (and interim CIO) for Boulder, Colorado.

Forming partnerships is also a way to “create a sustainable, market-driven ecosystem that lowers the cost barrier to data publication”.** In their review of the challenges in sharing research data, the Center for Open Data Enterprise highlighted the example of NOAA working with Amazon Web Services, Google, IBM, Microsoft and the Open Commons Consortium to publish their data. “These private sector organisations have the infrastructure and technical capacity to deal with the volume and complexity of NOAA’s data,” they state.

Hosting data in the cloud provides users with remote access to the data as well as the computing power to perform an analysis of the data. Indeed, the PREPdata platform makes use of NOAA data, using an API to call and draw upon NOAA’s extensive collection of datasets, including sea level trends.
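Serving data through an API means a client application never has to download or host the full dataset — it simply composes a request for the slice it needs. As a rough illustration of the pattern, here is a minimal sketch of how a platform might build such a request; the endpoint and parameter names are hypothetical assumptions for illustration, not NOAA’s or PREPdata’s actual API.

```python
# Sketch of an API-driven data request, as a platform like PREPdata might
# issue one. The base URL and parameter names below are illustrative
# assumptions, not a real NOAA endpoint.
from urllib.parse import urlencode

BASE_URL = "https://api.example.noaa.gov/sea-level-trends"  # hypothetical

def build_query(station_id: str, start_year: int, end_year: int) -> str:
    """Compose a request URL for one station's sea level trend data."""
    params = {
        "station": station_id,
        "begin": start_year,
        "end": end_year,
        "format": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"

# A client would fetch this URL and render the returned records on a map
# or chart, rather than storing the underlying dataset itself.
print(build_query("8518750", 1990, 2018))
```

The key design point is that the heavy lifting — storage, indexing, and computation — stays on the data provider’s infrastructure, while the client only handles a small, targeted response.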

The future of data portals.

It seems to me that we need to find a new way of describing the portals, platforms and applications we use to openly share data. As our definition of success boils down to impact and user delight, we are creating spaces where people can quickly expand their knowledge and understanding of an issue in more meaningful ways. They can look at a graph of deforestation and know for certain if the decline has hit an unacceptable level, and they can quickly share that information with their local journalists and government representatives.

These new data sharing approaches are placing power into the hands of ordinary citizens and elected officials alike by understanding their needs, their skills, and the barriers that currently stop them accessing and using data. Armed with this knowledge, the open data community can support and encourage those people who want to change the world — and help them do it faster and with greater precision.

The next big challenge.

Our next challenge will be to increase the speed at which we combine multiple datasets and run fast analyses across all of them to extract new discoveries. Right now, large global datasets are sitting on Amazon’s and Google’s servers, devouring computing power that would make our laptops take off into the sky like rockets. Even though plugins and applications allow a researcher’s laptop to tap into those datasets, a cross-layer analysis is nearly impossible for them. Together with our partners, we’ll be researching and testing new ways to improve people’s access to the insights generated by the analysis of these big, combined datasets. It’s an exciting moment in the evolution of our ability to discover new knowledge.

Stay tuned for the release of Resource Watch on 11 April 2018.


Former Lead Writer at Vizzuality, for whom I wrote many of my blogs. You can now find me on LinkedIn.