Open dataset sources

Data is the main element in every project of the Startup Weekend in Artificial Intelligence. This article presents a top-down view of some of the most important dataset sources available on the web.

Illustration of what organized datasets should look like

Some months ago, Google launched a dataset portal for finding datasets hosted on three kinds of websites: a publisher's site, a digital library or a personal web page. The link is the following:

Google dataset search: https://toolbox.google.com/datasetsearch

This search engine provides organized access to free datasets that were previously difficult to find or whose existence was unknown. Some of the topics are environmental and social sciences, government data and data provided by news organizations. The tool makes it easy to explore information from NASA or browse Harvard University archives.

For all those who want to share their data through this search engine, Google recommends adopting the schema.org open standard to describe the information. As more open access repositories use the schema.org standard, the variety and amount of information available will grow.
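As a rough illustration of what such a description can look like, here is a minimal sketch in Python that builds a schema.org "Dataset" record as JSON-LD, the format usually embedded in a web page for this purpose. The helper name build_dataset_jsonld, the dataset itself and the example.org URL are all hypothetical, chosen only for this example; the property names (name, description, url, keywords, license) come from the schema.org Dataset type.

```python
import json


def build_dataset_jsonld(name, description, url, keywords, license_url=None):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; only a handful of the
    properties defined by schema.org for Dataset are included.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }
    # The license is optional here, so it is only added when provided.
    if license_url:
        record["license"] = license_url
    return json.dumps(record, indent=2)


# Made-up dataset, used only to show the shape of the markup.
markup = build_dataset_jsonld(
    name="Example city air quality measurements",
    description="Hourly air quality readings from a hypothetical sensor network.",
    url="https://example.org/datasets/air-quality",
    keywords=["air quality", "environment"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# JSON-LD is typically placed inside a script tag in the dataset's web page.
print('<script type="application/ld+json">')
print(markup)
print("</script>")
```

The printed snippet would go into the HTML of the page that hosts the dataset. Which properties are actually required or recommended is best checked against Google's own guidelines for dataset markup.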

Global datasets

From a global point of view, the datasets indexed by Google cover economic indicators (inflation rate, interest rate, unemployment rate, fiscal balance), political subjects (government spending, election results, national laws) and geographical subjects (water quality, air quality).

Some of the main open data websites indexed by Google Dataset Search are the following:

Apart from the Google Dataset Search tool, one of the biggest indexers of open datasets is the CEIC data analysis platform, which contains more than 4 million data series covering more than 195 countries, 20 industries and 18 macroeconomic sectors. Topics include national accounts, production, government and public finance. Details of the datasets available for each country can be found at the following link:

CEIC: https://www.ceicdata.com/en/countries

Another global hub of datasets is the ArcGIS website, which contains data about security, education, health, culture, housing conditions, demographics and much more. The following link lets you search by content, location or category:

ArcGIS: http://hub.arcgis.com/pages/open-data

European datasets

One of the main European dataset websites is the European Data Portal, which harvests the metadata of Public Sector Information available on public data portals across European countries. The main categories of available data are the following:

  • Agriculture, Fisheries, Forestry & Foods
  • Energy
  • Regions & Cities
  • Transport
  • Economy & Finance
  • International Issues
  • Government & Public Sector
  • Justice, Legal System & Public Safety
  • Environment
  • Education, Culture & Sport
  • Health
  • Population & Society
  • Science & Technology

European data portal: https://www.europeandataportal.eu/data/en/organization/european-union-open-data-portal

Cities / Countries datasets

Not all #GSWAI cities have a dedicated open data portal, but here are some of the ones that do:

Paris — France

Bucaramanga — Colombia

Montreal — Canada

Brussels — Belgium

Hong Kong — China

Sydney — Australia

Conclusion

These tools provide useful information for use cases such as targeting people who use a particular transportation system, finding government spending figures for a market study or showing the economic evolution of a city.

Of course, the sky is the limit and these datasets can be used in many more cases. If you know of other interesting datasets or tools, please tell me in the comments below and I will update the information.