Data catalogs. Part 2. Data and metadata standards
Jun 6, 2022
There are a lot of data standards related to data science, open data, scientific data, digital assets as data, and so on. It’s nearly impossible to cover all of them at once, so I’ve collected a list of the most important ones.
Data catalogs and metadata publishing standards
- GovEx & Geothink data standards catalog https://datastandards.directory/ — a large catalog of standards for practical use, with an assessment of their openness and suitability for use in other jurisdictions
- data standards on Data.gov https://resources.data.gov/standards/ — a list of standards, communities and tools for working with data standards
- metadata standards catalog from the DCC (Digital Curation Centre) in the UK https://www.dcc.ac.uk/guidance/standards/metadata/list — focuses on metadata standards for archived data
- Egeria — Open Metadata and Governance https://github.com/odpi/egeria — an attempt to create a single metadata standard for corporate data catalogs
- OpenLineage https://github.com/OpenLineage/OpenLineage — another approach to a unified metadata structure in corporate data catalogs, focused on data lineage
- OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) https://www.openarchives.org/pmh/ is a standard used in digital archives for the metadata of any digital objects. Actively used by scientific data catalogs
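To give a feel for how harvesting over OAI-PMH works: a client sends an HTTP request with a `verb` parameter to a repository's base URL and gets XML back. Below is a minimal sketch in Python that only builds such a request URL and does not send it. The base URL `https://example.org/oai` and the helper `build_oai_request` are hypothetical, made up for illustration; the six verbs and the `oai_dc` metadata prefix come from the protocol specification.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **params):
    """Return the URL of an OAI-PMH GET request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


# https://example.org/oai is a placeholder, not a real repository.
url = build_oai_request(
    "https://example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
print(url)
```

A real harvester would also have to parse the XML response and follow resumption tokens for long result lists, which this sketch leaves out.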
Data Cataloging Standards
- Dublin Core http://dublincore.org/ — the basic standard for describing the metadata of almost any object
- Data Catalog Vocabulary (DCAT) — Version 2 https://www.w3.org/TR/vocab-dcat-2/ — a standard for maintaining and publishing data catalogs, used by a significant share of data catalog platforms
- DCAT-US https://resources.data.gov/resources/dcat-us/ — extension of the DCAT standard with data structures relevant for data catalogs in the USA
- DataCite Metadata Schema https://schema.datacite.org/ — a research data publishing standard used by DataCite, which issues persistent DOIs for data publications
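As an illustration of how DCAT and Dublin Core fit together, here is a rough sketch of a single dataset description serialized as JSON-LD from Python. The class and property names (`dcat:Dataset`, `dcat:Distribution`, `dct:title` and so on) come from the two vocabularies; the dataset itself and all `example.org` URLs are invented, and the record is not validated against any national DCAT profile.

```python
import json

# One dataset with one CSV distribution, described with DCAT classes
# and Dublin Core terms. All example.org identifiers are placeholders.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/dataset/air-quality",
    "@type": "dcat:Dataset",
    "dct:title": "Air quality measurements",
    "dct:description": "Hourly air quality readings from city sensors.",
    "dcat:keyword": ["air", "environment"],
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {
                "@id": "https://example.org/files/air-quality.csv"
            },
            "dcat:mediaType": {
                "@id": "https://www.iana.org/assignments/media-types/text/csv"
            },
        }
    ],
}

dcat_json = json.dumps(dataset, indent=2)
print(dcat_json)
```

The split is the typical one: generic descriptive fields such as title and description come from Dublin Core, while the catalog-specific structure (dataset, distribution, download URL) comes from DCAT.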
Geodata publishing standards
- CSW (Catalogue Service for the Web) https://www.ogc.org/standards/cat — a standard for publishing geodata metadata from the Open Geospatial Consortium
- INSPIRE https://inspire.ec.europa.eu/quick-overview-implementers/57528 is a set of metadata and data description standards from the INSPIRE open geodata infrastructure program in the EU
- GeoJSON https://geojson.org/ — geodata encoding standard, RFC 7946
- Shapefile https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf is a popular geodata publishing format developed and maintained by Esri
- ISO 19115:2003 https://www.iso.org/standard/26020.html — ISO standard for describing geodata metadata
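Of the standards above, GeoJSON is the easiest to show in a few lines. The sketch below builds a point feature and wraps it in a feature collection. The helper name `make_point_feature` and the sample place are my own; the object structure and the longitude-first coordinate order follow RFC 7946.

```python
import json


def make_point_feature(lon, lat, properties):
    """Build a GeoJSON Feature with a Point geometry (hypothetical helper).

    RFC 7946 puts longitude first, then latitude.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


feature = make_point_feature(-0.1276, 51.5072, {"name": "London"})
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```

Swapping latitude and longitude is probably the most common mistake with this format, which is why the helper takes them as separate named arguments.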
API Publishing Standards
- The OpenAPI Specification https://www.openapis.org/ — a consortium and an API description standard that evolved from the Swagger specification
- RAML https://raml.org/ — an API modeling specification and tools
- API Blueprint https://apiblueprint.org/ — another API modeling specification
- HAL — Hypertext Application Language http://stateless.co/hal_specification.html — a specification for describing APIs as hypermedia
- Hydra Core Vocabulary https://www.hydra-cg.com/spec/latest/core/ — still at the draft level, it also describes APIs as hypermedia
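For a sense of what an API description looks like in practice, here is a small sketch of an OpenAPI 3.0 document built as a Python dictionary. The API itself (a `/datasets` endpoint) is invented for illustration; the `openapi`, `info` and `paths` fields are the required top-level fields in OpenAPI 3.0. The check at the end is only a presence test, not a real validator.

```python
import json

# A deliberately tiny OpenAPI 3.0 description of a hypothetical API.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Dataset API", "version": "1.0.0"},
    "paths": {
        "/datasets": {
            "get": {
                "summary": "List datasets",
                "responses": {
                    "200": {"description": "A JSON array of datasets"}
                },
            }
        }
    },
}

# Naive sanity check: are the required top-level fields present?
REQUIRED_TOP_LEVEL = ("openapi", "info", "paths")
missing = [key for key in REQUIRED_TOP_LEVEL if key not in spec]

print(json.dumps(spec, indent=2))
print("missing top-level fields:", missing)
```

In real projects such documents are usually written in YAML or JSON by hand or generated from code, and checked with a dedicated validator.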
Statistical publication standards
- SDMX https://sdmx.org — international standard for publishing official statistics
- Data Documentation Initiative https://ddialliance.org/ — international standard for publishing surveys and statistics
Object Description Standards
- Schema.org https://schema.org/ — a set of standards for describing objects on web pages for indexing by search engines
- ontology registry http://vocab.linkeddata.es/ — numerous ontological descriptions of subject areas
- semantic data types registry https://registry.apicrafter.io — a registry of semantic data types used to detect PII, other kinds of identifiers, and dictionary-based types
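Schema.org markup is usually embedded in a web page as JSON-LD, and dataset pages can use the `Dataset` type for this. A minimal sketch, assuming an invented dataset and placeholder `example.org` URLs; `Dataset`, `DataDownload` and the property names are Schema.org terms.

```python
import json

# Schema.org description of a hypothetical dataset landing page.
dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Air quality measurements",
    "description": "Hourly air quality readings from city sensors.",
    "url": "https://example.org/dataset/air-quality",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/files/air-quality.csv",
        }
    ],
}

# The JSON-LD goes into a script tag inside the page's HTML.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(dataset_markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```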
Universal Standards
- Data Package (Frictionless Data) https://frictionlessdata.io/ is an actively developed and implemented standard for describing and publishing data in the form of standardized data packages. Includes a large number of data preparation tools
- Network Common Data Form (NetCDF) https://www.unidata.ucar.edu/software/netcdf/ — scientific data publishing standard used since the late 1980s
- BagIt https://tools.ietf.org/html/rfc8493 — RFC 8493 standard for packaging digital objects. Actively used by government and academic archives, such as DataONE and the US Library of Congress
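BagIt is simple enough to sketch with the standard library alone. Per RFC 8493, a bag is a directory with a `bagit.txt` declaration, a `data/` directory holding the payload, and at least one payload manifest listing a checksum for every payload file. The function name `make_minimal_bag` and the sample file are my own; this is a simplified sketch that skips optional tag files such as `bag-info.txt` and assumes flat file names without subdirectories.

```python
import hashlib
import tempfile
from pathlib import Path


def make_minimal_bag(bag_dir, payload):
    """Write a minimal BagIt bag (hypothetical helper).

    payload maps flat file names to their content as bytes.
    """
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name in sorted(payload):
        content = payload[name]
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest line: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration required by RFC 8493.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


with tempfile.TemporaryDirectory() as tmp:
    bag = make_minimal_bag(
        Path(tmp) / "example-bag", {"cities.csv": b"id,name\n1,London\n"}
    )
    bag_files = sorted(
        p.relative_to(bag).as_posix() for p in bag.rglob("*") if p.is_file()
    )
    print(bag_files)
```

The manifest is what makes the format useful for archives: the receiving side can recompute the checksums and confirm that nothing was lost or altered in transfer.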
Industry Data Standards*
- Fiscal Data Package https://specs.frictionlessdata.io/fiscal-data-package/ — standard for publishing financial data on expenses and income
- The General Transit Feed Specification https://developers.google.com/transit/gtfs — transport infrastructure description format originally developed by Google
- Open Contracting https://standard.open-contracting.org/ — standard for publishing contract data
- IATI Standard https://iatistandard.org — standard for publishing information on international assistance to developing countries
*There are quite a few industry standards, so only the most notable are listed here.
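Many industry standards are just agreed file layouts. A GTFS feed, for example, is a ZIP archive of CSV text files with fixed names and column headers, such as `stops.txt`. The sketch below reads a tiny in-memory `stops.txt`; the two stops are invented and `read_stops` is a hypothetical helper, while the column names `stop_id`, `stop_name`, `stop_lat` and `stop_lon` come from the GTFS specification.

```python
import csv
import io

# Made-up content of a stops.txt file from a GTFS feed.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Central Station,51.5000,-0.1200
S2,Harbour,51.5100,-0.1000
"""


def read_stops(text):
    """Parse stops.txt content into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    ]


stops = read_stops(STOPS_TXT)
for stop in stops:
    print(stop["id"], stop["name"], stop["lat"], stop["lon"])
```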
Data Standards Groups in Government
- Data Standards Authority https://www.gov.uk/government/groups/data-standards-authority — a working group in the UK Government on data standardization
- Web of Data https://www.w3.org/2013/data/ — a working group in the W3C to develop standards for working with data