Data is the key ingredient for evidence-based policy making. A growing family of artificial intelligence techniques is transforming how we use data for development. But for these and more traditional techniques to be successful, they need a foundation in good data. We need high-quality data that is well managed and that is appropriately stored, accessed, shared and reused.
The World Bank’s new data catalog transforms the way we manage data. It provides access to over 3,000 datasets and 14,000 indicators and includes microdata, time series statistics, and geospatial data.
Open data is at the heart of our strategy
Since its launch in 2010, the World Bank’s Open Data Initiative has provided free, open access to the Bank’s development data. We’ve continuously updated our data dissemination and visualization tools, and we’ve supported countries to launch their own open data initiatives.
We’re strong advocates for open data, but we also recognize that some data, often by virtue of how it has been acquired or the subjects it covers, may have limitations on how it can be used. In the new data catalog, rather than leaving such data unpublished, we’re making many of these datasets available and documenting any restrictions on how they can be used. This new catalog is an extension of the open data catalog and relies heavily on the work previously done by the microdata library.
Five reasons to use the new data catalog
The catalog provides a single entry point to all Bank datasets, each tagged with a consistent license, essential metadata and other features that help you find data easily. While we have introduced many features, here are my five favorites:
For the first time, you can search the Bank’s survey, time series and geospatial data across all regions and topics from one place. What’s even better is you can search inside datasets, down to the names of indicators and variables. This kind of “deep search” is great for discovering data you may not even know existed.
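To make the idea of “deep search” concrete, here is a minimal sketch in Python. It is an illustration only, not the catalog’s actual implementation or API: the `Dataset` class, its fields, the `deep_search` function and the sample entries are all hypothetical. It simply shows a query matching both dataset titles and the names of the indicators or variables inside each dataset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical catalog entry; the field names are illustrative only."""
    title: str
    kind: str  # e.g. "microdata", "time series", "geospatial"
    variables: list[str] = field(default_factory=list)


def deep_search(catalog, query):
    """Return (dataset title, matched text) pairs for a case-insensitive query.

    A hit is recorded when the query appears in a dataset title or in the
    name of any indicator or variable inside that dataset.
    """
    needle = query.lower()
    hits = []
    for ds in catalog:
        if needle in ds.title.lower():
            hits.append((ds.title, ds.title))
        for name in ds.variables:
            if needle in name.lower():
                hits.append((ds.title, name))
    return hits


# Made-up sample entries, one per data type mentioned in the post.
catalog = [
    Dataset("Household Survey (example)", "microdata",
            ["household size", "access to electricity"]),
    Dataset("Energy Indicators (example)", "time series",
            ["Access to electricity (% of population)"]),
    Dataset("Road Network (example)", "geospatial", ["road class"]),
]

results = deep_search(catalog, "electricity")
for title, match in results:
    print(f"{title}: {match}")
```

Searching for “electricity” here surfaces both the survey and the time series dataset, even though neither has the word in its title. That is the kind of discovery a variable-level search makes possible.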
2. Geospatial catalog
For the first time (I know I said it again), we are releasing geospatial datasets covering topics such as land cover, roads, and energy. This wouldn’t have been possible without some serious heavy lifting done by our Geospatial Operations and Support Team (GOST).
The only thing more important than data is metadata. Who made this dataset? How was it produced or acquired? When was it last updated? Who’s allowed to use it? What have people already done with it? Are algorithms shared that would allow others to reproduce the construction of a dataset or indicator? While the catalog tags each dataset with some basic metadata that is consistent across all datasets, the amount and nature of the metadata will vary depending on the type of dataset.
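As a rough illustration, the questions above can be read as fields in a basic metadata record. The Python sketch below is an assumption made for the sake of example and not the catalog’s real schema: the `DatasetMetadata` class, its field names, the `missing_fields` helper and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical basic metadata record; not the catalog's real schema."""
    producer: Optional[str] = None          # Who made this dataset?
    method: Optional[str] = None            # How was it produced or acquired?
    last_updated: Optional[str] = None      # When was it last updated?
    license: Optional[str] = None           # Who's allowed to use it?
    known_uses: Optional[str] = None        # What have people already done with it?
    code_available: Optional[bool] = None   # Are the algorithms shared?


def missing_fields(record):
    """List the names of the metadata fields that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Placeholder values for illustration only.
record = DatasetMetadata(
    producer="Example statistics office",
    last_updated="2018-01-15",
    license="Example open license",
)
print(missing_fields(record))
```

A check like this makes gaps visible: the example record says who produced the data and under what license, but not how it was produced or whether the underlying code is shared.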
4. Data licenses — essential metadata that is often ignored
5. Tracking data use — citations & visualizations
Data producers often don’t get enough credit for their data work, partly because it’s hard to track where data have been cited, used and re-used. Understanding where and how the data are used helps us understand the impact of a dataset and direct investments to priority data areas. It also gives data providers an incentive to release their data more proactively, much as researchers publish papers, and gives them more accountability. We are also tracking visualizations published by our teams using our data, to better understand how our data are being used in articles and blogs.
How are you using the new catalog? Are there any features you particularly like or would like to see? You can get in touch on Twitter (@worldbankdata) or by email (email@example.com), and you can take our data catalog survey.