Image: Flickr, @Sweetie187, CC BY 2.0

Dataspaces have arrived

It took less than 10 years to deliver on Alon Halevy’s vision.

Michael Hausenblas
Large-scale Data Processing
May 27, 2013


Michael Franklin, Alon Halevy and David Maier introduced the idea of dataspaces based on the observation that data management solutions can be understood along two dimensions, administrative proximity and semantic integration:

A space of data management solutions, ACM SIGMOD Record December 2005.

In their seminal 2005 ACM SIGMOD Record paper, they therefore proposed the term dataspace as an extension of the traditional notion of a database. One example dataspace is shown in the paper:

An example dataspace and the components of a dataspace system, ACM SIGMOD Record December 2005.

While some of the components (looking at you, XML and WSDL) are arguably outdated and may cause eye cancer, you get the idea, right?

Guess what? In 2013, less than 10 years after their paper, dataspaces have arrived in the form of the Hadoop ecosystem. We are now in a position to design and deploy dataspaces that address a variety of data sources and data formats with a range of ‘schema-awareness’, from strongly typed RDBMS via JSON to plain text. We can now query, manipulate, and manage these data sources and integrate them in a true pay-as-you-go approach.
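
To make the ‘schema-awareness’ range a bit more tangible, here is a minimal Python sketch. It is not Hadoop code: it uses only the standard library as a stand-in, and every name in it (`rdbms_records`, `json_records`, `text_records`, `keyword_search`, the `customers` table and the sample data) is a hypothetical chosen for illustration. The idea is that one best-effort keyword query runs over a strongly typed table, JSON documents and plain text alike, without integrating their schemas first.

```python
import json
import sqlite3


def rdbms_records(conn):
    """Strongly typed source: the schema is declared up front."""
    for name, city in conn.execute("SELECT name, city FROM customers"):
        yield {"name": name, "city": city, "source": "rdbms"}


def json_records(lines):
    """Semi-structured source: fields are looked up per document."""
    for line in lines:
        doc = json.loads(line)
        yield {"name": doc.get("name"), "city": doc.get("city"), "source": "json"}


def text_records(lines):
    """Plain text: no schema at all, so each line is kept as raw content."""
    for line in lines:
        yield {"raw": line, "source": "text"}


def keyword_search(records, keyword):
    """Best-effort query that works on every record, whatever its schema."""
    needle = keyword.lower()
    return [
        record
        for record in records
        if any(
            needle in str(value).lower()
            for key, value in record.items()
            if key != "source"  # do not match on the bookkeeping field
        )
    ]


def build_dataspace():
    """Collect records from three hypothetical sources into one list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT NOT NULL, city TEXT NOT NULL)")
    conn.execute("INSERT INTO customers VALUES ('Ada', 'Galway')")
    json_lines = ['{"name": "Bob", "city": "Galway"}', '{"name": "Cy"}']
    text_lines = ["Call from Galway office about invoice", "Unrelated note"]
    return (
        list(rdbms_records(conn))
        + list(json_records(json_lines))
        + list(text_records(text_lines))
    )


hits = keyword_search(build_dataspace(), "galway")
for hit in hits:
    print(hit["source"], hit)
```

The pay-as-you-go part would come later: only if the plain-text source turns out to matter would you invest in, say, a parser that extracts a `city` field from it, while the keyword query keeps working in the meantime.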

#polyglotpersistence #lambdaarchitecture
