What is Dateno?

Ivan Begtin
Published in Dateno
Sep 3, 2024

A few months ago we launched Dateno, a new search engine with many unique features that we are proud of.

Dateno homepage

Firstly, Dateno is a focused search engine, similar to many academic search engines or Google Dataset Search.

At its core is the Common Data Index, now renamed the Dateno Registry: a registry of more than 10,000 data catalogues from all over the world, with almost every catalogue linked to a country, specific topics and so on. For about a year we have been collecting these data catalogues using various discovery methods.
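To make the idea concrete, here is a minimal sketch of what a single registry entry might hold. The field names, identifier format and endpoint URL are illustrative assumptions, not the actual Dateno Registry schema.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    """One data catalogue in a registry (illustrative fields, not the real Dateno schema)."""
    uid: str                  # stable identifier for the catalogue
    name: str                 # human-readable title
    url: str                  # catalogue homepage
    api_endpoint: str         # machine-readable endpoint a crawler would use
    software: str             # e.g. "ckan", "geonetwork", "dataverse"
    country: str              # ISO 3166-1 alpha-2 code
    topics: list[str] = field(default_factory=list)

# A hypothetical entry; the endpoint URL is just an example of a CKAN-style API.
example = CatalogueEntry(
    uid="cdi-gb-0001",
    name="data.gov.uk",
    url="https://www.data.gov.uk",
    api_endpoint="https://data.gov.uk/api/3/action/package_search",
    software="ckan",
    country="GB",
    topics=["government", "open data"],
)
```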

At some point this registry became large enough that it was possible to build metadata crawlers that collect details about all the datasets those catalogues contain.

What kind of data does Dateno collect?

Dateno indexes the following types of data catalogues:

  • Open data portals: portals such as Data.gov, Data.gov.uk, Data.gouv.fr and many others. Most run open source software such as CKAN, DKAN or JKAN, or proprietary platforms such as Socrata, OpenDataSoft and Junar.
  • Geospatial data catalogues and portals: very common and similar to open data portals, but focused on geospatial datasets. These catalogues run GeoNetwork, GeoNode, ArcGIS Hub and some other software.
  • Indicator catalogues: a lot of publicly available data is published as statistical indicators. These catalogues may run PxWeb, popular in Europe, or the .Stat Suite, or be global and national catalogues exposing SDMX data.
  • Scientific data repositories: data published by researchers. Notable examples are Zenodo and other Invenio-based portals, as well as Dataverse, the Harvard project for sharing scientific data.
  • Machine learning data catalogues: these public data catalogues are relatively new; Kaggle, Hugging Face, OpenML and many others are listed here.
  • Microdata catalogues: survey microdata, usually not fully open; most of these catalogues run the NADA open source software.

And many other types of lesser-known data catalogues.

How does Dateno collect metadata?

The first serious difference between Dateno and other data search engines is that it uses multiple parsers to index data from almost all available data catalogues.

Instead of crawling the whole internet for Schema.org Dataset types, and instead of indexing only OAI-PMH endpoints as OpenAIRE and BASE do, Dateno supports about 32 types of API endpoints, including DCAT, the CKAN API, GeoNetwork APIs, EPrints, DSpace, WMS, WFS, WCS, WMTS, OGC Collections, Dataverse, the Invenio API and so on.
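To show what one of those parsers deals with in the simplest case, here is a small sketch that queries the standard CKAN Action API (package_search) of a public portal and extracts a few metadata fields. The portal URL and query are just examples; Dateno's own crawlers handle many more endpoint types and edge cases.

```python
import json
import urllib.parse
import urllib.request

# Any CKAN-based portal exposes the same Action API; demo.ckan.org is only an example here.
BASE_URL = "https://demo.ckan.org/api/3/action/package_search"

def ckan_search(query: str, rows: int = 5) -> list[dict]:
    """Return basic metadata for datasets matching `query` on a CKAN portal."""
    url = f"{BASE_URL}?{urllib.parse.urlencode({'q': query, 'rows': rows})}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return [
        {
            "title": ds.get("title"),
            "publisher": (ds.get("organization") or {}).get("title"),
            "formats": sorted({r.get("format", "") for r in ds.get("resources", [])}),
        }
        for ds in payload["result"]["results"]
    ]

if __name__ == "__main__":
    for ds in ckan_search("water"):
        print(f"{ds['title']} ({', '.join(ds['formats'])})")
```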

We will support almost all software types and major data catalogue APIs, so the Dateno index will grow quickly and data catalogue owners won’t have to do anything special to get their records added to the search engine.

Realtime search

You type and results appear immediately. Just try it: it's very quick.

Baltic Hydrography search

Facets everywhere

The metadata we collect allows us to provide multiple facets and make search results as granular as possible. Simply use the facets to focus your search on what you want. For example, many datasets carry sub-regional territorial metadata, so you can find the data you want at that sub-regional level.
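Under the hood, a facet is essentially a count of matching datasets per metadata value that you can then filter on. The toy sketch below illustrates the idea on an in-memory list of records; the field names are assumptions for illustration, not how the Dateno backend is actually implemented.

```python
from collections import Counter

# Toy dataset records with illustrative metadata fields (not Dateno's actual schema).
records = [
    {"title": "Baltic Sea depth soundings", "country": "EE", "format": "GeoJSON"},
    {"title": "River gauge readings 2023", "country": "DE", "format": "CSV"},
    {"title": "Coastal erosion survey", "country": "EE", "format": "SHP"},
]

def facet_counts(items: list[dict], facet_field: str) -> Counter:
    """Count how many records carry each value of a metadata field."""
    return Counter(item[facet_field] for item in items if facet_field in item)

def apply_facets(items: list[dict], **selected: str) -> list[dict]:
    """Keep only records whose metadata matches every selected facet value."""
    return [i for i in items if all(i.get(k) == v for k, v in selected.items())]

print(facet_counts(records, "country"))                    # Counter({'EE': 2, 'DE': 1})
print(apply_facets(records, country="EE", format="SHP"))   # only the erosion survey
```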

What next?

Many more features to come:

  • API: very soon we will announce an API, a way to work directly with the Dateno index.
  • Many more datasets: our goal is to reach 20 million datasets by the end of 2024. It's very achievable and will happen soon.
  • Index improvements: de-duplication, better data enrichment, more facets and other changes that improve the search experience.

I am the founder of APICrafter and I write about data engineering, open data, the modern data stack and open government. Join my Telegram channel: https://t.me/begtin