Ensuring data authenticity and integrity on the Streamr Network

Melchior Thambipillai
Published in Streamr
Jun 3, 2019

Data points published to a stream are sent to the Streamr Network, which transmits them to stream subscribers. We want to ensure that the data points being subscribed to are the same as the data points which have been published, meaning that they haven’t been tampered with (integrity) and that they were truly created by a valid publisher (authenticity).

These properties can be ensured in transit between two parties by TLS, but that’s not sufficient. Any party able to read the data in clear text could modify it, adding a trust requirement to the system. In a decentralized network, intermediate nodes can never be trusted. From the perspective of building such a network, we wanted to completely eliminate this trust requirement by adding the ability to cryptographically sign individual data points.

Moreover, every message holds a reference to the previous one. That reference is signed as well, effectively giving us chained sequences of messages that are tamper-proof.

Adding a signature to every data point

In my previous post, I introduced how Ethereum users can log in by signing a challenge with their Ethereum private key. To publish a signed data point, Ethereum users can similarly produce a signature for each data point and send this signature along with the actual message content.
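Conceptually, signing a data point looks something like the sketch below, here using the ethers.js library purely for illustration (an assumption on my part; the Streamr client libraries handle this internally):

// A sketch of signing a data point with an Ethereum key, using
// ethers.js for illustration (the Streamr clients do this internally)
const { Wallet } = require('ethers')

async function signDataPoint(privateKey, content) {
  const wallet = new Wallet(privateKey)
  // signMessage prefixes and keccak256-hashes the payload, then
  // signs the digest with the secp256k1 private key
  const signature = await wallet.signMessage(JSON.stringify(content))
  return { content, signature }
}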

Publishing signed data points is easy with either the JavaScript library or the Java library, which are the first libraries to support this new feature. In the default publishWithSignature: 'auto' mode, the client publishes signed data points whenever an Ethereum private key is provided.

const client = new StreamrClient({
  auth: {
    privateKey: 'YOUR-PRIVATE-KEY',
  },
  publishWithSignature: 'auto' // or 'always' or 'never'
})
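Publishing itself is then unchanged; with a private key and the 'auto' mode, each data point is signed transparently (the stream ID below is illustrative):

client.publish('my-stream-id', { temperature: 21.5 })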

Each data point has several fields that need to be signed:

  • The stream ID on which the data is published
  • The timestamp/sequence number for the data point ordering
  • The publisher ID that published the data
  • The reference to the previous data point
  • The data itself

If any of these fields were left out of the signed payload, the scheme would not resist replay attacks. For example, assume the stream ID were not part of the signed payload. An attacker could then eavesdrop on data points published on stream 1 and re-publish them on stream 2: the signature would be unchanged and still considered valid. So for every signed data point, all of these fields are used to compute the signature. This rules out replay attacks, because the combination of these fields uniquely identifies the data point over time, across all streams and all publishers.
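As a sketch, the byte string that gets signed can be thought of as the concatenation of all these fields (the encoding below is illustrative, not the actual wire format):

// Illustrative only: the signed payload covers every field listed
// above, so replaying a message in another context changes the
// payload and invalidates the signature
function payloadToSign(msg) {
  return [
    msg.streamId,
    msg.timestamp,        // plus a sequence number for ordering
    msg.publisherId,
    msg.prevMsgRef,       // reference to the previous data point
    msg.serializedContent,
  ].join('')
}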

Verifying a data point’s signature

On the consumer side, subscribers can now receive signed or unsigned data points and need to decide whether to accept or reject them. The JavaScript library and the Java library implement this feature as well.

In the default 'auto' mode, signed data points are accepted only after their signature has been verified. To decide whether an unsigned data point was supposed to be signed, the subscriber requests the stream’s metadata from the Streamr API, which indicates whether data points on that stream must be signed. If they must, the subscriber expects a signature to be present and rejects any unsigned data points.

const client = new StreamrClient({
  verifySignatures: 'auto' // or 'always' or 'never'
})
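Subscribing is likewise unchanged; in 'auto' mode, only data points that pass the signature check are delivered to the handler (the stream ID is illustrative):

client.subscribe({ stream: 'my-stream-id' }, (message) => {
  console.log('verified message:', message)
})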

To verify a signature, the subscriber recovers the Ethereum address corresponding to the private key that produced it (thanks to ECDSA public key recovery) and requests the set of trusted publishers for that stream from the Streamr API. If the recovered address is part of the set, the signature is considered valid.
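A sketch of this check, again using ethers.js for illustration, reusing the payloadToSign sketch from above and assuming the trusted publisher set has already been fetched from the Streamr API:

const { utils } = require('ethers')

function isAcceptable(msg, signature, trustedPublishers) {
  // verifyMessage recovers the signer's address from the signature
  // alone, using ECDSA public key recovery
  const recovered = utils.verifyMessage(payloadToSign(msg), signature)
  // trustedPublishers: a Set of lower-cased publisher addresses
  // returned by the Streamr API for this stream
  return trustedPublishers.has(recovered.toLowerCase())
}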

Does signing impact performance?

As long as the time to compute/verify a signature doesn’t exceed the interval between two published/consumed data points, data points can still be published/consumed at the same rate. But if an IoT sensor publishes thousands of data points per second, signing each one can become an issue: the sensor might not finish computing one signature before the next data point needs to be published. Data points waiting to be published would then pile up in a buffer that grows without bound until it overflows.

We ran tests showing that a 2018 MacBook Pro can compute about 1200 signatures per second on one core. While this should be enough for almost every stream, we designed another scheme for very high production rates: the data points could be grouped into batches, with the publisher computing only one signature per batch. The downside is that subscribers must wait to receive the whole batch before they can verify the signature. We might implement this scheme in the future if there is demand for it.
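A sketch of what such batch signing could look like (hypothetical; this scheme is not implemented, and the function below is mine, not part of any library):

// Hypothetical batch signing: one signature covers a whole batch
async function signBatch(wallet, dataPoints) {
  const payload = dataPoints.map((p) => JSON.stringify(p)).join('')
  const signature = await wallet.signMessage(payload)
  return { dataPoints, signature }
}
// A subscriber must buffer the entire batch before it can verify
// this single signature, which adds latency on the consuming side.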

A step towards decentralization

With data signing, subscribers don’t need to trust the network anymore to be convinced of the authenticity and integrity of the data points they consume. As intermediate nodes cannot be trusted, this signing scheme will be especially important in the Streamr P2P Network as it starts to decentralize. Now that we have authenticity and integrity, in the next blog post, we will tackle data confidentiality with multicast encryption.
