Linked Data Event Streams explained in 8 minutes

Samuel Van Ackere
8 min read · Feb 1, 2023

What is a Linked data event stream, and how can it help you make your dynamic data accessible?

1. Linked (open) data

In 2009, Tim Berners-Lee, the inventor of the World Wide Web, gave a TED talk about linked data as the next phase of the internet, sometimes referred to as the Semantic Web. The vision is that the web of documents evolves into a semantic web, where linked data adds meaning so that, in addition to the web's current human users, machines can understand and interpret the data.

The Semantic Web enables more sophisticated search and allows machines to understand and process data much as humans do.

Linked open data is a community project overseen by the W3C organisation and aims to enrich the Web by making open datasets even more accessible through the linked data method.

“Linked Open Data is Linked Data which is released under an open license, which does not impede its reuse for free.” Tim Berners-Lee

Tim Berners-Lee has proposed a 5-star system for rating the quality of open data on the Web, with Linked Open Data receiving the highest ranking:

⭐: data is openly available in some format (e.g. .pdf);
⭐⭐: data is available in a structured format (e.g. .xls);
⭐⭐⭐: data is available in a non-proprietary structured format (e.g. .csv);
⭐⭐⭐⭐: data follows W3C standards, such as RDF and URIs;
⭐⭐⭐⭐⭐: all of the above, plus links to other Linked Open Data sources to provide context.

Linked data is structured data that is linked to other data through relationships, which makes it more valuable for semantic searches. It builds on established Web technologies such as HTTP and URIs, but uses them to communicate information in a way that computers can read automatically, rather than only serving web pages to human readers. The idea behind linked data is to turn the Internet into a global, decentralized, machine-readable database.

Figure: Visualization of an RDF graph (source: RDF 1.1 Primer, w3.org)

The figure above visualizes Resource Description Framework (RDF). RDF is a standard data model for representing and sharing information on the Web, based on the idea of using triples to represent data. A triple consists of three parts: a subject, a predicate, and an object.

In the context of linked data, subjects and objects refer to individual pieces of data, such as people, places, or things. Predicates refer to the connections or associations between different nodes, such as the fact that a person is the painter of a painting.

In the figure above, the subject http://example.com/person/alice represents Alice, the predicate http://example.com/vocab/isAFriendOf represents the relationship “is a friend of,” and the object http://example.com/person/bob represents Bob.
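As a minimal sketch of the triple structure described above, the Alice and Bob statement can be modelled in plain Python and written out as one N-Triples line. The `Triple` class and the `to_ntriples` helper are illustrative names, not part of any RDF library, and the helper assumes all three terms are IRIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all IRIs in this sketch)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple whose three terms are all IRIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# The statement from the figure: Alice is a friend of Bob.
alice_knows_bob = Triple(
    subject="http://example.com/person/alice",
    predicate="http://example.com/vocab/isAFriendOf",
    obj="http://example.com/person/bob",
)

print(to_ntriples(alice_knows_bob))
```

In practice an RDF library would handle literals, blank nodes, and escaping; the sketch only illustrates that every statement has exactly three parts.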

2. Linked Data Event Stream

Linked Data Event Streams (LDES) apply, as the term implies, the linked data concept to data event streams. A data stream is typically a constant flow of distinct data points, each containing information about an event or change of state that originates from a system that continuously creates data. Examples of data streams include sensor and other IoT data, financial data, and so forth.

LDES has several technical advantages:

  • LDES is a technical standard that allows data to be exchanged sustainably and cost-effectively across silos using domain-specific ontology;
  • LDES is one standard for both fast and slow-changing data;
  • LDES offers a solution for managing historical records, versions, and retention policies efficiently.

By following linked data principles, datasets can easily be linked in a standardized way, which opens up enormous potential.

A Linked Data Event Stream is a constant flow of immutable objects (such as version objects, sensor observations, or archived representations), each containing information about an event or change of state that originates from a system that continuously creates data.
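The idea of immutable version objects can be sketched as follows. This is an illustration under assumptions, not how any LDES implementation works: the `VersionObject` class, its field names, and the sample values are all made up. Each object is frozen, points to the entity it is a version of, and carries a timestamp, so a consumer can derive the latest state per entity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: once published, a member is never modified
class VersionObject:
    member_id: str          # identifier of this specific version
    is_version_of: str      # identifier of the entity this version describes
    generated_at: datetime  # when this version was generated
    temperature: float      # hypothetical payload


def latest_versions(members):
    """Return the most recent version object for each entity."""
    latest = {}
    for member in members:
        current = latest.get(member.is_version_of)
        if current is None or member.generated_at > current.generated_at:
            latest[member.is_version_of] = member
    return latest


# Made-up sample stream: two versions of one entity, one version of another.
stream = [
    VersionObject("sensor-1/v1", "sensor-1", datetime(2021, 5, 14, 11, 46, tzinfo=timezone.utc), 18.7),
    VersionObject("sensor-1/v2", "sensor-1", datetime(2021, 5, 14, 12, 46, tzinfo=timezone.utc), 19.1),
    VersionObject("sensor-2/v1", "sensor-2", datetime(2021, 5, 14, 11, 50, tzinfo=timezone.utc), 17.3),
]

state = latest_versions(stream)
print(state["sensor-1"].member_id)
```

Because older versions are never overwritten, the full history stays available while the latest state can still be computed on demand.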

3. Example of a published Linked data event stream

What does a Linked data event stream look like? We take an example of IoW (Internet of Water) sensor data published as a Linked data event stream to show an LDES in action.

This LDES stream describes the observations of water temperature and conductivity by multiple sensors spread across Flanders (Belgium). Like all LDES, it uses the TREE specification for its collection and fragmentation features. For the specific compatibility rules, read the TREE specification.

When reading the LDES endpoint, we start at the ‘main page’ or ‘starting point’ of the Linked Data Event Stream (ldes:EventStream). Its tree:view points to a root node from which all members (fragments of data) can be reached. From the name of this view we can already deduce that we are dealing with a time-based fragmentation.

@prefix ldes: <https://w3id.org/ldes#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix terms: <http://purl.org/dc/terms/> .
@prefix tree: <https://w3id.org/tree#> .

<https://iow.smartdataspace.beta-vlaanderen.be/water-quality-observations>
    a ldes:EventStream ;
    ldes:timestampPath prov:generatedAtTime ;
    ldes:versionOfPath terms:isVersionOf ;
    tree:view <https://iow.smartdataspace.beta-vlaanderen.be/water-quality-observations-timebased> .

Descending further into the tree, we find a tree:relation that points to the node holding the first time-based fragment.

@prefix tree: <https://w3id.org/tree#> .

<https://iow.smartdataspace.beta-vlaanderen.be/water-quality-observations-timebased>
    a tree:Node ;
    tree:relation [
        a tree:Relation ;
        tree:node <https://iow.smartdataspace.beta-vlaanderen.be/water-quality-observations-timebased?generatedAtTime=2022-11-08T17:28:27.680Z>
    ] .

If we descend further, we find the first time-based fragment (generatedAtTime=2022-11-08 17:28). Important to note: although this fragment was generated on 2022-11-08, the sensor observations were measured at a different time (e.g., 2021-05-14). The time-based fragmentation is used to place the measured values of the observations in chronological order.

Here we can find all the relevant information of the measured data of the dataset:

@prefix ns0: <http://def.isotc211.org/iso19156/2011/SamplingFeature#SF_SamplingFeatureCollection.> .
@prefix ns1: <http://def.isotc211.org/iso19156/2011/Observation#OM_Observation.> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ns2: <http://def.isotc211.org/iso19103/2005/UnitsOfMeasure#Measure.> .
@prefix ns3: <https://schema.org/> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

<urn:ngsi-v2:cot-imec-be:WaterQualityObserved:imec-iow-ZibrvBsM4DQpYyDNSTXg3V/2021-05-14T11:46:08.000Z>
    a <https://www.w3.org/TR/vocab-ssn-ext/#sosa:ObservationCollection> ;
    ns0:member [
        a <http://def.isotc211.org/iso19156/2011/Measurement#OM_Measurement> ;
        ns1:observedProperty <https://data.vmm.be/concept/sensor/batterijniveau> ;
        ns1:phenomenonTime "2021-05-14T11:46:08.000Z"^^xsd:dateTime ;
        ns1:result [ ns2:value [ ns3:value 3.250000e-1 ] ] ;
        sosa:madeBySensor <urn:ngsi-v2:cot-imec-be:Device:imec-iow-kfoRZfEsBmK9Kuo4EpUBJm>
    ], [
        a <http://def.isotc211.org/iso19156/2011/Measurement#OM_Measurement> ;
        ns1:observedProperty <https://data.vmm.be/concept/waterkwaliteitparameter/conductiviteit> ;
        ns1:phenomenonTime "2021-05-14T11:46:08.000Z"^^xsd:dateTime ;
        ns1:result [ ns2:value [ ns3:value 9.470978e+2 ] ] ;
        sosa:madeBySensor <urn:ngsi-v2:cot-imec-be:Device:imec-iow-kfoRZfEsBmK9Kuo4EpUBJm>
    ], [
        a <http://def.isotc211.org/iso19156/2011/Measurement#OM_Measurement> ;
        ns1:observedProperty <https://data.vmm.be/concept/waterkwaliteitparameter/temperatuur> ;
        ns1:phenomenonTime "2021-05-14T11:46:08.000Z"^^xsd:dateTime ;
        ns1:result [ ns2:value [ ns3:value 1.870000e+1 ] ] ;
        sosa:madeBySensor <urn:ngsi-v2:cot-imec-be:Device:imec-iow-kfoRZfEsBmK9Kuo4EpUBJm>
    ] ;
    dc:isVersionOf <urn:ngsi-v2:cot-imec-be:WaterQualityObserved:imec-iow-ZibrvBsM4DQpYyDNSTXg3V> ;
    prov:generatedAtTime "2021-05-14T11:46:08.000Z"^^xsd:dateTime .

...

The above Turtle fragment describes the measured parameters (e.g., battery level, conductivity, and temperature), their values, and the definitions of those parameters (which can be found by following their URIs).
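The values in the fragment are written in scientific notation, which is easy to misread. As a small sketch, the three literals can be turned into ordinary floats keyed by a readable label. The dictionary below is copied by hand from the fragment rather than parsed from it, and the `short_name` helper is an illustrative assumption.

```python
# Observed-property URIs and literal values copied from the Turtle fragment.
readings = {
    "https://data.vmm.be/concept/sensor/batterijniveau": "3.250000e-1",
    "https://data.vmm.be/concept/waterkwaliteitparameter/conductiviteit": "9.470978e+2",
    "https://data.vmm.be/concept/waterkwaliteitparameter/temperatuur": "1.870000e+1",
}


def short_name(uri: str) -> str:
    """Use the last path segment of the concept URI as a readable label."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


# Convert each literal to a float; units are defined behind the concept URIs.
values = {short_name(uri): float(text) for uri, text in readings.items()}

for name, value in values.items():
    print(f"{name}: {value}")
```

A real consumer would parse the Turtle with an RDF library and look up units and definitions through the concept URIs instead of hard-coding the values.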

The tree:GreaterThanOrEqualToRelation points to the next time-based fragment in the tree. By following these links one after another, a client can read all of the stream's data.

<https://iow.smartdataspace.beta-vlaanderen.be/water-quality-observations-timebased?generatedAtTime=2022-11-08T17:28:27.680Z>
    a tree:Node ;
    tree:relation [
        a tree:GreaterThanOrEqualToRelation ;
        tree:node <https://iow.smartdataspace.beta-vlaanderen.be/water-quality-observations-timebased?generatedAtTime=2022-11-08T17:28:37.086Z> ;
        tree:path prov:generatedAtTime ;
        tree:value "2022-11-08T17:28:37.086Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>
    ] .
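In the TREE specification, a tree:GreaterThanOrEqualToRelation states that every member reachable through the linked node has a value at tree:path that is greater than or equal to tree:value. A consumer that only needs data up to a certain moment can therefore skip such a link. The sketch below illustrates that check, assuming the relation follows those semantics; `parse_utc` and `worth_following` are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_utc(text: str) -> datetime:
    """Parse an xsd:dateTime string ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def worth_following(relation_value: str, wanted_until: datetime) -> bool:
    """Decide whether a GreaterThanOrEqualToRelation can lead to wanted members.

    Every member behind the relation has a timestamp >= relation_value, so the
    linked node is only useful if relation_value is not later than wanted_until.
    """
    return parse_utc(relation_value) <= wanted_until


value = "2022-11-08T17:28:37.086Z"  # tree:value from the relation above

# A consumer interested in data up to 17:00 can skip this link ...
print(worth_following(value, datetime(2022, 11, 8, 17, 0, tzinfo=timezone.utc)))
# ... while a consumer interested in data up to the next day must follow it.
print(worth_following(value, datetime(2022, 11, 9, tzinfo=timezone.utc)))
```

A consumer that wants the complete stream simply follows every relation and never needs this check.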

4. LDES Client

To consume this LDES, a software application or component called an LDES client is used. LDES clients can process and analyze events published in an LDES stream, allowing you to extract valuable insights and perform actions based on those events.
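To make the idea of a client concrete, here is a deliberately simplified sketch of the traversal such a client performs: start at the view, collect the members of each node, and follow every relation until no unvisited node remains. It is not the actual LDES client. The `PAGES` dictionary is a hypothetical in-memory stand-in for HTTP responses, and the URLs and member names are made up.

```python
from collections import deque

# Hypothetical stand-in for fetched and parsed pages: node URL -> page content.
PAGES = {
    "https://example.org/ldes/view": {
        "members": [],
        "relations": ["https://example.org/ldes/view?page=1"],
    },
    "https://example.org/ldes/view?page=1": {
        "members": ["obs-1", "obs-2"],
        "relations": ["https://example.org/ldes/view?page=2"],
    },
    "https://example.org/ldes/view?page=2": {
        "members": ["obs-2", "obs-3"],
        "relations": [],
    },
}


def fetch(url):
    """Return the parsed page for a node (a real client would do an HTTP GET)."""
    return PAGES[url]


def replicate(start_url, fetch_page=fetch):
    """Yield every member reachable from start_url exactly once."""
    queue = deque([start_url])
    visited_nodes = set()
    seen_members = set()
    while queue:
        url = queue.popleft()
        if url in visited_nodes:
            continue  # never fetch the same node twice
        visited_nodes.add(url)
        page = fetch_page(url)
        for member in page["members"]:
            if member not in seen_members:
                seen_members.add(member)
                yield member
        queue.extend(page["relations"])


print(list(replicate("https://example.org/ldes/view")))
```

A production client would additionally parse RDF, respect HTTP caching headers, and poll nodes that can still change in order to stay in sync with the publisher.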

A subsequent article on this topic will be published soon.

5. The advantages of linked data event streams

When working with data, one of the main challenges is harvesting data that comes with little or no metadata. Acquiring data from a wide range of sources and formats (from data dumps in zip files to various databases, services, APIs, etc.) makes it a laborious task to transform the data into a consistent structure. Without accompanying information about the data’s origin, context, or significance, the data is easily misinterpreted. In addition, working with outdated or incomplete information can lead to incorrect or misleading conclusions. As a result, there is a need for a specification in which data is always up to date, metadata is included, and data can easily be linked with other datasets across the Web.

The Linked Data Event Stream is intended as the core API that addresses all these needs.

LDES increases the usability and findability of data, as it comes in a uniform linked data standard published on an online URI endpoint. LDES holds historical data and ensures that the dataset is always up to date.

As a result, the user is always in sync with the publisher’s dataset and can fetch all the historical data with its metadata.

In a nutshell, there are several reasons why there was a need to add Linked Data Event Streams as a specification:

  1. Linked Data is a powerful paradigm for representing and sharing data on the Web, but it has traditionally focused on representing static data rather than events or changes to that data.
  2. The use of event streams is becoming increasingly prevalent on the Web, as it enables applications to efficiently exchange information about changes to data in real time.
  3. There was a need for a standard format for representing Linked Data events and changes so that different systems could easily interoperate and exchange this information.
  4. Linked Data Event Streams allow applications to subscribe to event streams and receive updates in real time, which is useful for applications such as real-time data analysis and event-driven architectures.

6. Wrapping Up

Using linked data event streams has many advantages for data scientists and other data users. Some of the main benefits include:

  1. Real-time analysis: With linked data event streams, data can be analyzed as it arrives, which supports timely decision-making.
  2. Interoperability: Linked data event streams can be easily integrated with other data sources and systems, allowing for a more comprehensive and holistic view of the data.
  3. Flexibility: Linked data event streams can be easily customized and adapted to different contexts and applications.
  4. Improved clarity: Linked data event streams can help reduce the errors and inconsistencies caused by misinterpreting the data, resulting in more reliable analysis.
  5. Increased data accessibility: Linked data event streams provide access to data in a standardized and machine-readable format, making it easier to access and analyze data from diverse sources.
  6. Improved data governance: Linked data event streams enable the tracking and auditing of data flows, facilitating better data management and governance.
  7. Scalability: Linked Data Event Streams can handle large volumes of data, making them suitable for big data applications.

You should now know the basics of Linked Data Event Streams and their advantages. You can find more information on the websites underneath.

Contributors to this article are ddvlanck (Dwight Van Lancker) (github.com) and sandervd (Sander Van Dooren) (github.com) at Smart Data Space (Digital Flanders, Belgium). In a rapidly changing society, governments need to be more agile and resilient than ever. Digital Flanders realizes and supervises digital transformation projects for Flemish and local governments.

Samuel Van Ackere

PhD - Data Scientist with a passion for geoinformatics and machine learning techniques. Python and open source enthusiast.