Q&A with Nils about “Secure Data Spaces with senseering”

Everything you want to know about the data economy of the future

Nils van Ouwerkerk
senseering
10 min read · Feb 15, 2021


Header image: © senseering | Semjon Becker

Co-Authors: Daniel Trauth, Kristof Herrmann, Felix Mönckemeyer

This article is also available in German.

Abstract

In the context of resilience, it is particularly relevant for manufacturing companies to collect data about their complete process chains. However, this often means that data or even process knowledge has to be shared with other stakeholders. This is only possible if secure data spaces exist for this purpose.

With the MyDataEconomy, senseering has created the prototype of such a secure data space and addresses the pressing questions of what needs to be considered in a secure data space and how senseering has implemented it.

Here you can find more information about the MyDataEconomy, or you can experience it yourself.

Data Connection

What does the metadata model look like in the MyDataEconomy?

In the MyDataEconomy, metadata is described by JSON objects (JavaScript Object Notation). Arbitrary metadata structures are allowed, as desired by the user. Each IoT device (worker) has a metadata object that can be customized as needed. In addition to this user-defined meta-information, system-specific metadata is added, such as a timestamp defining the creation time of the data point or the system ID of the data originator. To ensure data integrity, a signature of the data including its meta-information is also written to a distributed ledger.
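
A simplified sketch of such a metadata object (the field names here are only illustrative placeholders, not the exact internal format) could look like this:

const dataPointMetadata = {
    // user-defined meta-information, freely structured per IoT device (worker)
    user: {
        machine: "press-01",
        location: "hall 3",
        sensorType: "environment"
    },
    // system-specific metadata added automatically
    system: {
        createdAt: "2021-02-15T10:23:41.000Z", // creation time of the data point
        originatorId: "node-7f3a"              // system ID of the data originator
    },
    // signature over data and meta-information, anchored in the distributed ledger
    signature: "<signature hash>"
};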

Can arbitrary payloads be exchanged and how are the substructures documented for the data receiver?

The payload in a data transfer is arbitrary as long as it adheres to the JSON format. Any payload in JSON structure can be processed by the MyDataEconomy. However, our system is optimized for IoT data (for example, time series data that is continuously written in the same format). The structure of the data is defined using a JSON schema (data sources can only write data to the MyDataEconomy that is permitted by this schema). The marketplace is the component that handles data transactions between two parties. When a data source is viewed there, a sample dataset is randomly generated from the JSON schema; it serves as a preview of the data as a product, so that potential buyers can get an overview of the data structure without actually seeing the data. When a data purchase is made, the schema is first loaded from the marketplace. Then a peer-to-peer connection is established with the data producer, and the schema serves directly as a validation schema for the incoming data during transfer.
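
For the environment sensor data point shown later in this article, a simplified JSON schema could look as follows (this is only an illustration of the principle, not the exact schema stored in the system); every incoming data point could then be validated against it:

const environmentSensorSchema = {
    type: "object",
    properties: {
        temperature: { type: "number" },
        humidity: { type: "number" },
        pressure: { type: "number" },
        magnet: {
            type: "object",
            properties: {
                x: { type: "number" },
                y: { type: "number" },
                z: { type: "number" }
            },
            required: ["x", "y", "z"]
        }
    },
    required: ["temperature", "humidity", "pressure", "magnet"]
};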

Do you have experience with OPC-UA and what other data standards are there in the IIoT environment?

A few weeks ago, we connected a customer’s production machine with an integrated OPC-UA server to our system. We are familiar with the standard and can integrate it into our system, just like other common standards such as MQTT or HTTP.
In the future, we also want to realize a native integration of, for example, OPC-UA or MQTT into our system; at the moment this still runs via the detour of our connector (worker), which is based on WebSockets, a bidirectional network protocol on top of TCP.
Generally, we are interface/protocol agnostic as long as JSON files can be transferred over it.
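
A minimal sketch of such a connector, bridging an MQTT topic to the worker’s websocket endpoint (broker address, topic and endpoint are placeholders, and this is not the actual connector code), could look like this:

import mqtt from "mqtt";
import WebSocket from "ws";

const broker = mqtt.connect("mqtt://localhost:1883");        // machine-side MQTT broker
const worker = new WebSocket("ws://localhost:8080/ingest");  // websocket connector (worker)

worker.on("open", () => {
    broker.subscribe("machines/press-01/environment");
    broker.on("message", (_topic, payload) => {
        // forward the JSON payload from the machine to the network node
        worker.send(payload.toString());
    });
});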

What are the technical requirements for a network node?

A network node is NodeJS-based open-source software that only requires a Docker-capable runtime environment (for example, Ubuntu 20.04 LTS) with Internet access, 20 GB of hard disk space and at least 4 GB of RAM. Otherwise, there are no further restrictions on where the network node software can run: it can run on one’s own server as well as on a virtual machine at any cloud provider. It is also possible to use small single-board computers with an ARM CPU.

Data Sharing

Are you also taking the peer-to-peer approach, or does the marketplace act as a peering/clearing house?

When exchanging data, the marketplace can be used to search for appropriate data producers. There you get an overview of the data channels offered and descriptive information about them (without ever seeing the actual data). When a transaction is completed, a peer-to-peer connection is established between the data producer and the recipient, via which the data is then transported.
The marketplace itself is only used to find and connect data producers; no data is ever routed through it.
In addition, all transaction receipts are stored centrally in the marketplace and anchored in the distributed ledger, so that reconciliation is possible if needed.
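
Sketched from the consumer side (the endpoints and field names below are purely illustrative, not the actual MyDataEconomy API), the flow looks roughly like this:

import WebSocket from "ws";

async function purchaseAndReceive(marketplaceUrl: string, dataSourceId: string) {
    // 1. Find the data source and complete the transaction via the marketplace.
    const res = await fetch(`${marketplaceUrl}/transactions`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ dataSourceId })
    });
    const { producerEndpoint, receiptId } = await res.json();

    // 2. The marketplace only brokers the deal; the data itself travels over a
    //    direct peer-to-peer connection to the producer's network node.
    const p2p = new WebSocket(producerEndpoint);
    p2p.on("message", (msg) => {
        console.log(`receipt ${receiptId}:`, JSON.parse(msg.toString()));
    });
}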

Is there a functional overview of the data marketplace concept?

The architecture diagram of the MyDataEconomy shows that the system essentially consists of three components. The data sources are connected to a network node, which manages the incoming data and also provides further functionality for working with the data, such as visualization or a data analysis environment.

Architecture image of the MyDataEconomy

The marketplace itself is the orchestration tool for one’s own network nodes, as well as a search engine for other users’ IoT devices and data sources.

In larger organizations with multiple users, the data marketplace also performs the function of user, role and policy management within the organization.

Do you have an example of a typical data structure and its semantics?

An example of an environment sensor data point might look like this:

{
    temperature: -73557567.70839567,
    humidity: 22750917.696623445,
    pressure: -4404144.863966808,
    magnet: {
        x: -82743673,
        y: -59384045,
        z: 57014035
    }
}

This data point is taken from the marketplace overview of the data source. It is randomly generated and only provides information about the data structure behind this data source.

What are common data aggregation models that are compatible with your system?

There are basically two models available for a data purchase. First, the data consumer can buy existing data in a single transaction; this data is then transferred as a batch from the data producer to the data consumer. The other option is to subscribe to future data: as soon as new data that meets the purchase conditions is fed into the system by the data producer, it is streamed to the data consumer.

When feeding data into our system, the basic rule is that all data in JSON format is compatible with our system. Thus, data streams, one-time data extracts and periodic dumps of data batches can all be processed. However, since each data source is equipped with a JSON schema, the system is optimized for uniformly structured data, such as IoT streams. Depending on the use case, data can thus be arbitrarily modeled, aggregated and even pre-evaluated.
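
Expressed as illustrative purchase requests (the field names are placeholders, not the actual API), the two models could look like this:

const batchPurchase = {
    dataSourceId: "env-sensor-42",
    mode: "batch",                    // transfer already existing data once
    range: { from: "2021-01-01", to: "2021-02-01" }
};

const subscription = {
    dataSourceId: "env-sensor-42",
    mode: "subscription",             // stream future data as soon as it is ingested
    condition: { channel: "environment" }
};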

Data Policy

How do you create policy enforcement to preserve the data sovereignty of the data originator?

It is already possible for the data originator to write terms of use in a designated document for each data source, which is presented to the data consumer before the sale. Due to legal considerations, we have also decided to provide policies for the use of purchased data, which can technically be applied on the data consumer’s side in an automated manner. Moreover, since our network node software is open source, a kind of auditing body for network nodes could also ensure that the software of the receiving node fulfills the technical requirements for enforcing data usage policies.

In addition, all data transactions are logged and stored in the distributed ledger, so the configuration of each data purchase (and thus also its policies) remains traceable.
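
Purely as an illustration (the structure and field names are placeholders, not the format used in the system), such a usage policy document attached to a data source could look like this:

const usagePolicy = {
    dataSourceId: "env-sensor-42",
    termsOfUse: "Data may be used for internal analytics only.",
    redistributionAllowed: false,   // the consumer may not resell the data
    retentionDays: 90               // the data must be deleted after 90 days
};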

Which entities can track policy enforcement?

Policy enforcement is not yet implemented in the system in this way, but we are planning the technical implementation of an auditing instance that can check network nodes for policy enforcement mechanisms if required. Any user of the system should be able to request this auditing instance. In the future, it should also be possible to process the transaction information from the distributed ledger for such purposes.

What mechanisms will be used to issue and implement sanctions?

Policy enforcement should initially be purely technical, so that the data consumer’s software checks and enforces compliance with the policies. The aforementioned auditing instance also checks whether the network nodes comply with the latest security standards and software version. In addition to automated auditing of the network’s participants, there will be a manual reporting system through which nodes in the network can report erroneous transactions or malicious behavior.

Data Marketplace

Are there classification attributes for data in the marketplace?

So far, there are two levels of classification in our system. Data can either be private, so that it can only be found in the data marketplace by certain participants (e.g., companies in a data alliance or internal users of one’s own organization), or public and thus visible to every user and available for purchase. In the future, it is also planned to offer anonymized data on the data marketplace.

Which semantic standards are used to integrate external data structures?

So far, an external connection to the system is not possible, but this will be technically implemented in the near future, provided the connected data can supply a JSON schema.

How can the business model of a marketplace go hand in hand with data sovereignty claims?

The business model of a marketplace lives from its use, i.e., the exchange of goods. Regardless of data sovereignty, for many applications in today’s industry, as well as in the smart home or smart city, it is imperative that IoT devices (in other words, participants in the network) exchange data.

Data sovereignty simply means that the transactions processed via the marketplace are provided with certain rules that allow the data originator to continue to control his data in a certain way, even though it has left his own server. In addition, our decentralized marketplace system provides data sovereignty precisely because the data does not have to leave the company’s own IT infrastructure until it is sold, unlike centralized architectures in which all data of all users is collected centrally. The data marketplace is therefore not really a marketplace where all assets are collected, but rather a kind of intermediary instance between two stakeholders, with mainly organizational functionality.

Thus, the concepts of marketplace and data sovereignty are not only compatible, but they support each other if the marketplace represents a secure data space. In such a secure data space, the focus can return to the benefits of data sharing and asset sharing to create better data-driven services.

Data Processing

Is it possible for the data consumer to use third-party data analysis environments (e.g. hyperscalers)?

Yes, it is currently possible to download the data from the database locally and thus transfer it to a data analysis environment of one’s own choice. In the future, it will also be possible to regulate this through policies.

Is it possible to execute specific ETL functions on the data producer side before receiving the data?

No, this is not possible out of the box. If the data is only available in a transformed state, the assistance of the data producer is required. However, if the data producer also keeps the raw data on the network node and performs the transformations on the node itself, it is possible to make the raw data available to the entire network.

So far, the transfer of data is limited to one data source per transaction, but we will remove this limit in the future. A data purchase will then be able to include several data packages from different data sources. The transformation of the data can afterwards be performed on one’s own network node.

What are common deletion requirements for data consumers?

At the moment, we are aiming for policies for deleting data on the consumer side, for example time-limited ones. This means that it can be specified that the data is deleted from the data receiver’s network node after a certain period of time.
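
A minimal sketch of how a receiving node could enforce such a time-limited deletion policy (the data store shown here is purely hypothetical):

interface PurchasedRecord {
    receivedAt: number;   // timestamp (ms) when the data arrived
    delete(): void;       // removes the record from the node's storage
}

const RETENTION_MS = 90 * 24 * 60 * 60 * 1000; // e.g. 90 days, as set in the policy
const purchasedRecords: PurchasedRecord[] = [];

// check periodically (here once per hour) and delete expired data
setInterval(() => {
    const now = Date.now();
    for (const record of purchasedRecords) {
        if (now - record.receivedAt > RETENTION_MS) record.delete();
    }
}, 60 * 60 * 1000);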

Outlook

We are currently working on the development of such a secure data space in the context of the SPAICER research project. It will be used in particular to share complex production data securely with the partners in the project and thus to bring together data producers and data consumers. In addition, the benefit of such a secure data space for a manufacturing company’s resilience will be explored.

senseering Logo | © senseering

About senseering

senseering GmbH, founded in September 2018, was awarded the RWTH Aachen University Spin-Off Award. The core competence of senseering GmbH is the development and implementation of systems for the digitalization and networking of industrial and production facilities. senseering GmbH also advises on strategic corporate issues, in particular digital transformation, distributed ledger technologies, edge vs. cloud computing architectures for AI-based real-time control of industrial processes, digital business model innovation and the introduction of digital business processes such as home office, Azure or Microsoft 365. senseering is one of the winners of the first and largest AI innovation competition of the BMWi with the project www.spaicer.de.

Daniel Trauth (CEO) | www.senseering.de | E-Mail: mail@senseering.de
