Using streams to bring data consistency to blockchain

Chase Smith
Oct 5, 2021 · 6 min read


The issue of data consistency goes beyond the scope of blockchain. Consistent data is required for a deterministic global state in any software application, and it remains one of blockchain's struggles. In blockchain we want to represent data as a real source of truth, but the manner in which data is represented and accessed by the developer prevents this from happening.

While an Ethereum client allows individuals to subscribe to information from the blockchain, these subscriptions are not consistent. If you want to process blocks in Ethereum, you will run into this issue often and deal with inconsistencies that can appear within minutes.

import Web3 from "web3";
import { BlockHeader } from "web3-eth";

async function main() {
  const web3 = new Web3(new Web3.providers.WebsocketProvider("wss://<project-id>", {}));
  let prevBlock: BlockHeader;
  web3.eth.subscribe("newBlockHeaders", (err, block) => {
    console.log(block.number, block.hash, block.parentHash);
    // A break in the chain of parent hashes means the node has switched forks.
    if (prevBlock && prevBlock.hash !== block.parentHash)
      console.log("!!!! Inconsistency");
    prevBlock = block;
  });
}

main();

The code snippet above will result in inconsistencies within 10 minutes of running.

For instance, you can get a confirmation that a transaction has been processed in block 1000, but when you check block 1000 later the transaction is not included because of a fork. The only reliable way to identify a block is by its block hash, not its block number, which means that the same call to an Ethereum RPC node can return different blocks and data. The developer has to address this, which makes the source code more complicated and bug-prone.
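To make the hash-versus-number distinction concrete, here is a minimal sketch of the check an application would need. The `Receipt`/`Block` shapes and the `getBlockByNumber` lookup are illustrative assumptions (in practice the lookup would be an RPC call), mocked with in-memory data:

```typescript
// Sketch: verify that a transaction receipt is still on the canonical chain.
interface Receipt { blockNumber: number; blockHash: string; }
interface Block { number: number; hash: string; }

function isStillCanonical(
  receipt: Receipt,
  getBlockByNumber: (n: number) => Block,
): boolean {
  // A block number is ambiguous across forks; only the hash pins the block.
  return getBlockByNumber(receipt.blockNumber).hash === receipt.blockHash;
}

// Example: block 1000 was re-orged, so the old receipt no longer matches.
const canonical: Record<number, Block> = {
  1000: { number: 1000, hash: "0xbbb" }, // hash after the fork
};
const receipt: Receipt = { blockNumber: 1000, blockHash: "0xaaa" }; // pre-fork
console.log(isStillCanonical(receipt, (n) => canonical[n])); // false
```

Any consumer that keys its state by block number alone silently skips this check, which is exactly where the inconsistencies above come from.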

One way to address this issue is by preserving data in immutable streams. Immutable streams can create a consistent data layer that can be replicated across a large number of devices. With streams we can rebuild the state at any point in time (e.g. at block hash xxxxxxx, block number 1000), and we get built-in functionality to audit the state and the messages. Since the data is consistent, we can connect the blockchain to well-established tools like Postgres and time-series databases, as well as to machine learning tools and analytics frameworks. Connecting these different data sources enables the transformation and replication of data into other protocols and plug-ins. This design follows the core tenets of the Reactive Manifesto.
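Rebuilding state "at any point in time" from an immutable stream amounts to replaying events up to the chosen block. A minimal sketch, with an assumed key/value event shape rather than Proxima's actual schema:

```typescript
// Sketch: rebuild state at an arbitrary block by replaying an immutable stream.
interface StreamEvent { blockNumber: number; key: string; value: number; }
type State = Map<string, number>;

function stateAt(stream: StreamEvent[], blockNumber: number): State {
  const state: State = new Map();
  // Replay every event up to and including the requested block.
  for (const ev of stream) {
    if (ev.blockNumber > blockNumber) break;
    state.set(ev.key, ev.value);
  }
  return state;
}

const events: StreamEvent[] = [
  { blockNumber: 998, key: "balance:alice", value: 10 },
  { blockNumber: 999, key: "balance:alice", value: 7 },
  { blockNumber: 1000, key: "balance:bob", value: 3 },
];
console.log(stateAt(events, 999).get("balance:alice")); // 7
```

Because the stream is append-only, the replay is deterministic: every replica that folds the same prefix arrives at the same state, which is what makes the stream auditable.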

“Systems built as Reactive Systems are more flexible, loosely-coupled and scalable.”

— Reactive Manifesto [1]

The future of blockchain evolution is reactive: blockchain needs to adopt a message-driven approach to processing and transforming its data. For this to happen, we need a performant and scalable read layer.

We want to access data in different ways, performing joins, aggregations, and transformations, and these require a general layer built for such scalable services. The read layer is the solution for highly scalable applications on the blockchain [2]. A performant read layer is a general challenge for all blockchains because nodes lack the resources to manage data requests. Even if Ethereum full nodes could execute arbitrary queries, node size and request scalability would still be issues. Blockchains are huge, and application-specific data needs to be filtered, transformed, replicated, and accessed in an arbitrary and scalable manner. We need to be able to filter out and replicate application-specific data in order to have useful and performant systems.

Proxima builds a scalable read layer by extracting and moving blockchain data, then indexing it [3]. We take blocks from the Ethereum blockchain, process them by block number, and push them to an external data layer. To maintain consistency, we check that the block hash and parent hash coincide. The main idea is to create a consistent, immutable, append-only stream. Our streams provide the framework and tools to achieve eventual consistency for a wide range of applications. A useful analogy is data replication: changes are applied to replicas, and the replicas stay consistent with the master as long as they apply the same operations in the same order.

This is explained well in a great article, The Duality of Streams and Tables, which discusses how streams represent updates and tables represent state. Using streams we can generate tables, and from tables we can generate a stream of changes. With this approach we can even perform operations on streams and then represent the results as tables. In the case of a generalized DEX, streams with the relevant data can be built by transforming, filtering, and merging other streams [4]. If dYdX wants guarantees on the consistency of the data for their frontend, they can get them by having a consistent stream of data replicated from the blockchain.
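The duality can be sketched in a few lines: a table is the fold (latest value per key) of an update stream, and the diff of two table versions is itself a stream. The names below are illustrative, not taken from any particular streaming library:

```typescript
// Stream/table duality: stream -> table by folding, table -> stream by diffing.
type Update = { key: string; value: number };

// Stream -> table: keep the latest value per key.
function tableOf(stream: Update[]): Map<string, number> {
  const table = new Map<string, number>();
  for (const u of stream) table.set(u.key, u.value);
  return table;
}

// Table -> stream: emit the updates that turn `before` into `after`.
function changelog(before: Map<string, number>, after: Map<string, number>): Update[] {
  const changes: Update[] = [];
  for (const [key, value] of after) {
    if (before.get(key) !== value) changes.push({ key, value });
  }
  return changes;
}

const prices: Update[] = [
  { key: "ETH/USD", value: 3400 },
  { key: "BTC/USD", value: 48000 },
  { key: "ETH/USD", value: 3450 },
];
console.log(tableOf(prices).get("ETH/USD")); // 3450
console.log(changelog(tableOf(prices.slice(0, 2)), tableOf(prices)));
// [ { key: 'ETH/USD', value: 3450 } ]
```

A DEX frontend is just one more table materialized from such a stream; because the stream is replicated consistently, every materialization converges to the same state.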

These streams of data remain eventually consistent, even though blockchains are not. We treat forks as rollbacks and represent them as a series of undo blocks. In the case of a fork and rollback, the main problem is deciding whether to undo a block, and which blocks to undo.

We handle these rollbacks during the commit process when we receive new blocks from a full node and order them into streams.

  • Check whether the incoming block is the next block (its parent hash matches the previous block in the stream). If it matches, add it to the block stream. Otherwise, create undo block(s).
  • Find the closest common parent of the latest committed block and the incoming block; the common parent is the latest block shared with the current fork of the blockchain.
  • Produce undo blocks, in reverse order, for every committed block between the common parent and the current committed block.
  • Commit the blocks from the common parent up to the current fork block.
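The steps above can be sketched as follows. The `Block` shape, the stream representation, and the assumption that the fork branch is handed to us as a list are simplifications for illustration, not Proxima's actual implementation:

```typescript
// Sketch of the rollback-on-fork commit logic: undo back to the common
// parent (newest first), then commit the fork branch.
interface Block { number: number; hash: string; parentHash: string; }
type Entry = { kind: "commit" | "undo"; hash: string };

function applyFork(committed: Block[], fork: Block[], stream: Entry[]): Block[] {
  // Closest common parent: the committed block the fork branches from.
  const parentIdx = committed.findIndex((b) => b.hash === fork[0].parentHash);
  // Produce undo blocks for everything after the common parent, in reverse order.
  for (let i = committed.length - 1; i > parentIdx; i--) {
    stream.push({ kind: "undo", hash: committed[i].hash });
  }
  // Commit the fork branch on top of the common parent.
  for (const b of fork) stream.push({ kind: "commit", hash: b.hash });
  return [...committed.slice(0, parentIdx + 1), ...fork];
}

// Example: chain A <- B <- C, with a fork B' <- C' branching from A.
const committed: Block[] = [
  { number: 1, hash: "A", parentHash: "0" },
  { number: 2, hash: "B", parentHash: "A" },
  { number: 3, hash: "C", parentHash: "B" },
];
const fork: Block[] = [
  { number: 2, hash: "B'", parentHash: "A" },
  { number: 3, hash: "C'", parentHash: "B'" },
];
const stream: Entry[] = [];
applyFork(committed, fork, stream);
console.log(stream.map((e) => `${e.kind}:${e.hash}`));
// [ 'undo:C', 'undo:B', "commit:B'", "commit:C'" ]
```

Note that the stream itself stays append-only: a fork never rewrites history, it appends undo entries followed by the new commits, so every consumer can replay the same sequence and converge.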

As a result of our data streams, it is possible to create a data layer for any blockchain. The lack of such a data layer is a problem in every ecosystem. Our partner DEIP is a project that has dealt with these problems on the Substrate framework while building their chain [5]. When DEIP started developing on Substrate, there was no infrastructure for multi-account transactions, nor was there any method for streaming events from the node.

DEIP is a multi-chain protocol for tokenisation and governance of creator economy assets. It enables on-chain discovery, evaluation, exchange, and licensing of intangible assets.

So the DEIP team had to implement it themselves and created Event-Proxy for the Substrate development of domain-specific chains. Together with the DEIP team, we aim to build on top of this Event-Proxy implementation and deliver a fully functional integration of the Proxima data-layer solution for the Substrate framework, so that any Substrate-based chain will be able to plug it in as a module. This will pave the way for scalable replication of the read layer and enable 1 billion connected applications, with costs similar to the traditional model.