Applying Blockchain to Natural Capital Markets
Event Sourcing for Public Ethereum Applications
Implementing an Event Sourcing architecture allows more highly-optimized systems and even third-party applications to participate as a distributed application ecosystem.
Ethereum is the dominant blockchain platform for smart contract applications, routinely called Dapps. Many of these applications are simple, handling only minimal transactions, with little need for storage or extensive data retrieval. However, more robust and feature-rich applications inevitably implement increasingly complex features and run into the same obstacle: the EVM's limited data storage and retrieval.
Ethereum operation in general
The EVM stores contract data in simple key-value structures, with no indexes and no built-in query capability. This means that even seemingly trivial functionality such as sorting requires large amounts of custom code. That code may have unpredictable or unexpectedly high gas costs, especially as more data is stored and needs to be iterated over time.
These types of actions are commonly called traversal, operations that loop and act on each entry in a list. They are exceptionally difficult and expensive within the EVM, yet they are fundamental types of data usage in more typical applications. Any form of sorting, filtering, or aggregating data requires extensive traversal.
These operations and functions are required to create the kind of rich interface modern web users expect. Developers can’t compromise on delivering fully-featured software due to limitations in the storage mechanisms that underpin it.
It’s not good enough to just say it’s too hard. We need to find solutions, and that can mean innovating or thinking about the problem in a very different way.
A prime example is graphs. These often require complex or custom aggregation as well as results filtering to show data between arbitrary or moving periods.
While smart contracts on the EVM are an excellent mechanism to authorise and validate data, they are actually a very poor mechanism to retrieve that same data.
A potential solution to this problem
The solution lies in splitting reading and writing into two loosely coupled applications.
By splitting the reading and writing processes, developers are able to optimise the software for each aspect of the platform. These two systems can be connected via an Event Sourcing pattern, where the smart contracts broadcast events that are received and actioned by a standard web API.
This allows the system to retain the data integrity, rule-enforcement, and transparency provided by the smart contracts while facilitating rich data access from an API.
Distributed applications vs “Dapps”
This conflicts with the definition used in the wider IT community, which refers to distributed architecture as systems patterns designed around loosely connected components or systems that interact minimally and at specific boundaries. Microservice architectures are a good example of a distributed application.
So to be more explicit, when I refer to “distributed,” I’m referring to this sort of IT architecture rather than the more niche understanding of a Dapp.
This is not a new or unique idea
Some interesting prior work on this already exists. Just as we were preparing to implement this process, another blockchain company published a very similar pattern.
This was of particular interest because Geora are well known to us. At least one member of our core dev team has worked on both platforms, and our CEOs are friends.
While the solution is similar, the implementation is wildly different. It’s worth noting the simple facts though: they clearly came to similar conclusions because they were dealing with similar problems.
Ethereum and Solidity lack expressiveness when querying complex data
Where Geora’s solution differs is primarily in the technology used. Geora is a private chain running on Hyperledger Besu and the event source is a much more advanced integration running as a plugin for the IBFT2.0 consensus algorithm. What we’re advocating here is a simple structural pattern that can be implemented for a public Ethereum-connected application.
The Event Sourcing Pattern
Event Sourcing is an architectural design pattern largely derived from the world of microservices. In a microservice architecture, each “service” or area of responsibility has its own independent infrastructure and features. The services are “decoupled”, with no direct knowledge of one another, so to share data they emit events from a source. All of the relevant services subscribe to the event emitter and action any relevant event within their own area of responsibility.
In a true microservice system, the events are stored as atomic changes which can be replayed to any point. These changes form a temporal store of the application's state at any point in its history, as well as at the current time. The cumulative effect of all of these changes is the current data state. Think along the lines of Git, which can be read at any point along the commit path.
To facilitate reading of the current state, these effects can be run and cached, which is called a projection. Again, in Git terminology you can think of that as a checkout.
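The replay-and-project mechanic described above can be sketched in a few lines. This is an illustrative stand-in only; the event shapes and names are invented for the example, not taken from any real system.

```python
# Minimal event store sketch: an append-only log of atomic changes,
# replayed to build a projection (like checking out a Git commit).

class EventStore:
    def __init__(self):
        self.events = []  # append-only, like a Git commit history

    def append(self, event):
        self.events.append(event)

    def project(self, upto=None):
        """Replay events (optionally only the first `upto`) into a
        cached current-state view, i.e. a projection."""
        state = {}
        for kind, entity_id, data in self.events[:upto]:
            if kind == "Created":
                state[entity_id] = dict(data)
            elif kind == "Updated":
                state[entity_id].update(data)
            elif kind == "Deleted":
                state.pop(entity_id, None)
        return state

store = EventStore()
store.append(("Created", "a1", {"price": 100}))
store.append(("Updated", "a1", {"price": 120}))
store.append(("Created", "b2", {"price": 90}))
store.append(("Deleted", "b2", {}))

print(store.project())        # {'a1': {'price': 120}} — current state
print(store.project(upto=1))  # {'a1': {'price': 100}} — state after event one
```

Note that the full history is never lost; any past state can be recovered by replaying fewer events.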
This is the core principle of the Event Sourcing approach proposed here. The smart contract events are received by external storage, and a projection is maintained and updated so that more data-intensive applications can access it.
Note that this is not intended to be an exhaustive or comprehensive analysis of the Event Sourcing pattern. It is merely a context for the origins and intent of the implementation recommended here, to describe its applicability to the problems Dapps and EVM-based solutions face in general.
The proposed and implemented approach differs from traditional Event Sourcing in several key aspects. Event Sourcing is typically used in an internal infrastructure, where the messages are not publicly visible. This approach is completely public. Additionally, there is no attempt here to capture the temporal storage mechanism event sourcing uses, though that could, of course, be implemented.
Lastly, the event protocol the Ethereum Virtual Machine uses is a simplistic fire-and-forget broadcast. By comparison, enterprise Event Sourcing solutions are built on robust platforms such as Kafka and support features like message receipt confirmation and send retries. The proposed approach is more optimistic and could require additional mechanisms to ensure data consistency, such as daily reconciliation processes.
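A reconciliation pass of the kind just mentioned could look something like the following sketch. The event names and data shapes are hypothetical; the point is only the comparison of event-derived IDs against the projection.

```python
# Hedged sketch of a daily reconciliation process: derive the set of
# live entity IDs from the on-chain event log, compare it with the IDs
# the projection database actually holds, and report any drift.

def reconcile(event_log, projection_ids):
    seen = set()
    for name, entity_id in event_log:
        if name == "OrderDeleted":
            seen.discard(entity_id)
        else:
            seen.add(entity_id)
    missing = seen - projection_ids    # events the API never actioned
    orphaned = projection_ids - seen   # rows with no on-chain source
    return missing, orphaned

events = [("OrderAdded", "a1"), ("OrderAdded", "b2"), ("OrderDeleted", "b2")]
print(reconcile(events, {"a1", "c3"}))  # (set(), {'c3'})
```

In practice the event log side would be populated from the chain's historical logs, which (as discussed later) can be requested at any time.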
Command Query Responsibility Segregation Pattern
Event Sourcing of any kind requires a distinction to be made between the systems writing the data and the systems reading it. The emergent Command Query Responsibility Segregation (CQRS) is a primary practical benefit of an event-sourced system.
The rationale is that for many problems, particularly in more complicated domains, having the same conceptual model for commands and queries leads to a more complex model that does neither well. — Martin Fowler, CQRS
To clarify, despite the imposing name, CQRS is simply the principle of separating read processes from write processes, ensuring that methods (or entire applications) only do one or the other. They never attempt to do both.
A Command writes data but may not return it. A Query reads data but may not write it.
The systems reading data can be highly optimised around that responsibility, while the systems submitting or confirming data can be optimised around theirs. This especially applies to smart contracts, greatly reducing “getter” data access code, particularly code that would otherwise need to traverse, filter, or sort.
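The command/query split can be made concrete with a small sketch. All class and method names here are invented for illustration; the shape is what matters: the write side mutates and returns nothing, the read side retrieves and never mutates.

```python
# Illustrative CQRS split (hypothetical names, not a real API).

class OrderCommands:
    """Write side: validates and records changes, returns no data."""
    def __init__(self, store):
        self._store = store

    def place_order(self, order_id, price):
        if price <= 0:                       # rule enforcement lives here
            raise ValueError("price must be positive")
        self._store[order_id] = {"price": price}  # write only, no return

class OrderQueries:
    """Read side: optimised for retrieval, never writes."""
    def __init__(self, store):
        self._store = store

    def orders_over(self, threshold):
        return [oid for oid, o in self._store.items()
                if o["price"] > threshold]

store = {}
OrderCommands(store).place_order("o1", 150)
print(OrderQueries(store).orders_over(100))  # ['o1']
```

In the architecture proposed here, the command side is the smart contract and the query side is the web API's projection; the shared store between them is the event stream rather than a shared database.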
Confirming the Need
Event Sourcing is actually a relatively complicated architectural pattern. Though intuitive enough, it creates a number of “moving parts”, which introduce design complexity and potentially infrastructure costs.
The proposed implementation only provides benefits when some or all of the following assumptions hold.
- There is an explicit requirement to use a smart contract to enforce rules or permissions.
This might seem obvious, but the pattern requires smart contracts as the event source. Some of the benefits could be obtained more easily with a standard REST (or similar) API, or by implementing a bespoke or off-the-shelf event emitter, which would eliminate gas costs, deployment management and other considerations.
- At least one “client” application needs richer data than the smart contract is capable of handling
A self-contained or simple smart contract would not benefit from this architecture. It is fundamentally about orchestrating data between distributed systems.
- There is a requirement to provide public visibility and transparency on data actions
Most Event Sourcing implementations are about providing a data solution within a closed system. This approach, instead, provides a data source that is entirely public. This means any interested party can make projections of the data or subscribe to specific events.
- There is a requirement to facilitate interoperation between multiple client applications
The key to the Event Sourcing approach is communication between distributed applications. EVM-based Event Sourcing broadcasts each event publicly, meaning it can be used to communicate with third-party or external suppliers or partners. This may provide a simple but powerful integration mechanism, as it does not require a third party to change any internal systems to support the data, merely the ability to “translate” or understand the event details.
Systems that meet these criteria may benefit from an Event Sourced approach, which will allow them to separate read and write requirements, and implement a robust system of projections and events.
Identifying the entities
The first step in progressing to an event-sourced model is to ensure that the entities created within the blockchain can be uniquely identified. This runs counter to common practice in smart contracts, which store data in arrays and address entries by index. That approach is error-prone and difficult to maintain across multiple systems; a more deterministic method is preferred.
While it would be possible to create an ID within the database (as is best practice in typical systems), that would not provide an independent and platform-agnostic way of identifying a given entity. It also creates an undesirable circularity: the event would need to be emitted, then have an ID generated, which then updates the entity. But which entity?
The optimal solution is for the entity to generate its own unique identifier, which can then be shared by all systems to concretely and confidently address all functionality. This ID can be used to reconcile changes made to versions in their own projection.
An example implementation in Solidity is to apply keccak256 to the abi.encode of the details of the entity to be created, along with the block time. This produces a bytes32 value, which may be cast to a bytes16 to facilitate struct packing techniques. Note that the details used must provide sufficient uniqueness to prevent duplicate IDs if two entities are mined in the same block.
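The shape of the technique can be sketched off-chain. Note a loud caveat: Solidity's keccak256 is the original pre-standard Keccak, not Python's FIPS sha3_256, and the byte packing below is a crude stand-in for abi.encode, so this will not reproduce on-chain hashes; it only demonstrates hash-then-truncate ID generation.

```python
# Off-chain analogue of the entity ID scheme (illustrative only):
# hash the entity details plus the block time, truncate to 16 bytes.

import hashlib

def make_entity_id(details: bytes, block_time: int) -> bytes:
    packed = details + block_time.to_bytes(8, "big")  # crude abi.encode stand-in
    digest = hashlib.sha3_256(packed).digest()        # 32 bytes, like bytes32
    return digest[:16]                                # truncate, like bytes16

a = make_entity_id(b"buy:100:50", 1_700_000_000)
b = make_entity_id(b"buy:100:50", 1_700_000_001)
print(len(a), a != b)  # 16 True
```

The same inputs always produce the same ID, which is what lets every system in the ecosystem address the entity consistently; differing details or block times produce differing IDs.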
It is critical that an event is emitted for every action that could possibly change the state in a client system. Each event must also contain all of the information necessary to identify the relevant entities and the changes that need to be reflected.
This can have a significant downside. In order to provide comprehensive information that might be needed, the events need to emit a large number of arguments. The gas costs of additional arguments in an event are very high.
Every system will require a different set of events to ensure data consistency. However, there are two key options.
One is a generic event, such as DataUpdated, that then provides further details about the changes. This has the benefit of being handled generically, but the parameters required may not be predictable. Additionally, multiple changes might require multiple events, and the creation of new entities is difficult to support.
The alternative is to use unique custom events. These will map more closely with the existing code and domain but each potentially-relevant action needs to be explicitly handled with its own event. This helps to minimise the gas used, as the event name conveys significant information.
event OrderDeleted(bytes16 id);
No further information is required of this event. It contains everything any client system needs in order to determine the steps to take, whether that means adding a deletedAt flag, removing the record from the database, or whatever other action is relevant.
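On the client side, actioning such an event amounts to dispatching on the event name. The handler names and storage below are invented; this client happens to choose a soft delete, recording a deletedAt timestamp rather than removing the row.

```python
# Sketch of a client actioning OrderDeleted (hypothetical names).

import time

db = {"0xabc1": {"price": 100, "deletedAt": None}}  # projection table

def on_order_deleted(order_id):
    if order_id in db:
        db[order_id]["deletedAt"] = int(time.time())  # soft delete

handlers = {"OrderDeleted": on_order_deleted}

def dispatch(event_name, payload):
    handler = handlers.get(event_name)
    if handler:                        # ignore events this client doesn't care about
        handler(**payload)

dispatch("OrderDeleted", {"order_id": "0xabc1"})
print(db["0xabc1"]["deletedAt"] is not None)  # True
```

A different client subscribing to the same public event is free to take an entirely different action, which is precisely the interoperability benefit described earlier.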
As most of the events in many systems are status changes, this pattern is repeated. The exception is the creation of a new entity, which needs to provide clients with all of the information required to create their own internal data structures. A creation event therefore carries many more arguments, such as an indexed user address (address indexed user) alongside the new entity's ID and details.
It is crucial that all state-changing functions emit at least one (and potentially multiple) events. It must not be possible for the client databases to get out of sync with the smart contract state.
Listening to events
Once the event is emitted, it is up to the clients to determine their own level of interest or interoperation with the transmitted data. As the events arrive over a simple websocket connection, a listener shouldn't be difficult to establish in any client that supports the JSON-RPC standards used by blockchain nodes.
Tools such as Web3 or EthersJS trivialise this; using them requires only the ABI and the address of the deployed contract.
Not all clients will need to keep a comprehensive real-time tally of all application states. A government regulator might merely want to keep a list of all trades or output a report.
As well as real-time event logging, the chain retains a record of all previous events, which can be requested at any time. This allows regulators and similar parties to simply request the events they are interested in at the end of a reporting period.
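Both access modes map onto standard JSON-RPC requests. The sketch below builds the two payloads: eth_subscribe streams new logs over a websocket, while eth_getLogs requests historical logs for a block range. The contract address and topic hash here are placeholders (the real topic would be the keccak256 hash of the event signature).

```python
# Illustrative JSON-RPC payloads for real-time and historical event access.

import json

ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder address
TOPIC = "0x" + "00" * 32  # placeholder for keccak256("OrderDeleted(bytes16)")

# Real-time: subscribe to matching logs over a websocket connection.
subscribe = {
    "jsonrpc": "2.0", "id": 1,
    "method": "eth_subscribe",
    "params": ["logs", {"address": ADDRESS, "topics": [TOPIC]}],
}

# Historical: request all matching logs for a block range, at any time.
historical = {
    "jsonrpc": "2.0", "id": 2,
    "method": "eth_getLogs",
    "params": [{"address": ADDRESS, "topics": [TOPIC],
                "fromBlock": "0x1", "toBlock": "latest"}],
}

print(json.dumps(subscribe)[:60])
```

Libraries such as EthersJS wrap exactly these calls, which is why only the ABI and contract address are needed to use them.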
Implications and compromises
The appropriateness of this approach will vary by project tolerances. It is largely untested at scale and the ID generation mechanism, in particular, might not suit high-volume transactions as sufficient entropy will be harder to generate.
The use of independent read and write data sources may create current state ambiguity in the same way any eventually-consistent approach will.
There may be an increase in overall gas costs for some (or all) transactions, as events are gas-costly to emit and this pattern emits them extensively.
Enterprise Event Sourcing solutions provide mechanisms for ensuring data consistency: confirming that all events were actioned, were actioned only once, and were actioned in order. In this model those mechanisms require explicit implementation.
There is an overall increase in system complexity, depending on the systems being compared. In particular, the reduction in connection between these components may affect testing, as there is no longer a “round trip” to test, only a boundary at the EVM layer.
A concrete case study
Water Ledger trading platform
The Water Ledger trading platform was designed and built by Civic Ledger as a system for trading temporary water allocations on an open market.
The original design was solely built using smart contracts. This meant all functionality was managed by contract, using a collection of smart contracts to store various entities such as the outstanding order book, completed trades, licence details, balances and more.
Unfortunately, the ongoing addition of ostensibly simple functionality required surprisingly challenging levels of code and complexity. An example was the need to retrieve the orders and trades for a given user. While there was already code to get all of the orders and trades, there was no specific filter.
Implementing this feature alone added 135 lines of near-duplicate Solidity code, increasing the line count of the two primary storage contracts by more than 30%.
The addition of a design requirement for graphing historical trades prompted deeper study. Graphing data is a more complex need than it may at first appear.
Populating a candlestick-style graph of the kind used in trading platforms requires a rolling 12-month mean, maximum and minimum: an aggregate and a filter combined. This was clearly not feasible using the existing contracts, and we needed a better solution for our increasingly complex data requirements.
The logical solution was to expand the simple API, primarily used to serve ABI and address details, to support a full projection of the trading platform entities, splitting the system along established CQRS principles.
Water Ledger met the requirements listed above. The use of smart contracts for data insertion allowed the enforcement of rules for water trading markets, and the existence of the data in the smart contracts enabled the transparent “trustless” experience necessary in the market. Additionally, there was a desire to foster interoperability of both systems and data with multiple external entities, such as regulators and other water markets.
All data was stored in smart contracts, making for extremely complex and code-heavy changes to add even trivial features. The requirement to pull aggregate data for graphs was almost impossible to achieve. As a trading platform, this sort of data was clearly necessary.
However, it was also not appropriate to simply move the data to a standard database. The water industry in Australia is highly controversial, and any effective solution requires total transparency. That “trustless” requirement is exactly what smart contracts provide. Additionally, the ability to encode business rules in a publicly visible and auditable mechanism provides a benefit in terms of public confidence and regulatory compliance.
The critical changes described above were needed for Water Ledger to implement an Event Sourcing pattern. First, each entity created, primarily trades and orders, needed to be identified with an ID.
This significant change to the approach led to major changes in the Water Ledger order book. Instead of two large arrays of BuyOrders and SellOrders requiring traversal, it became possible to consolidate them to one single array of orders and to then traverse two simpler ID arrays, one for the buy orders and one for the sell orders. These changes, alone, eliminated more than 110 lines of duplicate smart contract traversal code in the order book contract. This was, in part, because the “available sell orders” array (for example) was always of known length and did not need to be repeatedly recounted for the iterator.
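The restructured order book's shape can be sketched as follows. The field names are invented; the point is one map of orders keyed by ID plus two small ID arrays, so that "all sell orders" is a direct lookup of known length rather than a filtered traversal of a mixed array.

```python
# Illustrative restructured order book: one order map, two ID lists.

orders = {}     # id -> order details (single consolidated store)
buy_ids = []    # IDs only, in insertion order
sell_ids = []

def add_order(order_id, side, price):
    orders[order_id] = {"side": side, "price": price}
    (buy_ids if side == "buy" else sell_ids).append(order_id)

add_order("a1", "buy", 100)
add_order("b2", "sell", 110)
add_order("c3", "sell", 105)

# The sell list's length is known up front: no recounting, no filtering.
sells = [orders[i] for i in sell_ids]
print(len(sells))  # 2
```

In Solidity the same idea replaces two repeatedly-traversed struct arrays with one mapping and two bytes16[] ID arrays, which is where the duplicate traversal code disappears.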
This further enabled the removal of 312 lines in two smart contracts, approximately halving the line count for these two business-critical files.
Frontend application changes
Water Ledger’s data storage was restricted solely to Ethereum smart contracts. The trading platform also had an interface written in React as a Single Page App (SPA) that served as the mechanism for users to interact with the contracts, adding trades and displaying the existing order book details.
This meant that data was added to the order book by the interface, and then the state was re-requested and updated in the application. This was already done by watching the events emitted by the order book smart contract and triggering a re-request.
This change allows the dashboard to request from a standard REST API instead, facilitating access to a wider range of data such as the aggregates needed for graphing and for fast and efficient filtering and sorting. This also allows for more standard data access patterns to be used to implement more mature and feature-rich caching strategies.
The pattern for data access changes significantly because while data is being read from the API, it is still being written to the smart contracts as defined in the CQRS pattern. As a result, the data access process is now circular.
An order is sent to the order book smart contract. The contract emits an event containing the details. The API receives the event and stores the details in its database. The dashboard updates from the API. This flow can be seen in a more detailed diagram than the one above.
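The circular flow above can be simulated end-to-end in a few lines. Every component here is a stand-in (a real system would cross process and network boundaries at each arrow), but the sequence is the same: dashboard to contract, contract to event, event to API database, database back to dashboard.

```python
# Stand-in simulation of the circular CQRS flow.

api_db = {}  # the API's projection database

def api_on_event(name, payload):           # 3. API actions the event
    if name == "OrderAdded":
        api_db[payload["id"]] = payload

def contract_add_order(order_id, price):   # 1-2. contract validates, emits
    assert price > 0, "rule enforced on-chain"
    api_on_event("OrderAdded", {"id": order_id, "price": price})

def dashboard_get_orders():                # 4. dashboard reads from the API
    return list(api_db.values())

contract_add_order("o1", 100)
print(dashboard_get_orders())  # [{'id': 'o1', 'price': 100}]
```

Note that nothing writes to api_db except the event handler, which is the architectural guarantee discussed next.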
It’s worth an aside here: this part does not technically need to be circular. It would be possible, in both theory and practice, to have the API’s POST orders/ endpoint write directly to the database, and it would be more intuitive to do so rather than waiting for an event. However, a big part of the goal of this architecture is that there is no possible way to add data to the database except via the smart contracts. No side effects or back doors. Keeping that architectural purity is important.
The addition of standard web application features necessitated this architecture change, but it also created a positive feedback loop: the new architecture allowed significant redevelopment of the React dashboard application.
Most critically, it becomes possible to implement a data retrieval and caching strategy that is less complex, more standardised, and relies solely on industry standard protocols. Rather than JSON-RPC through often-arcane libraries, data can simply be requested as a standard HTTP request, using REST, gRPC or GraphQL.
In Water Ledger, that meant updating the dashboard to replace some of its most complex code (Redux application state management using asynchronous data calls) with single-line HTTP calls through a caching solution called React Query.
This resulted in a net removal of 295 application lines, including 311 lines of the system’s most complex and maintenance-intensive code.
By implementing an Event Sourcing approach, Water Ledger was able to reduce the code line count of its core smart contracts by 312 lines, reduce the React application by a net 295 lines, and reduce complex state management code by more than 314 lines.
However, these line reductions do not comprehensively cover the reduction in overall complexity and the improvements in pattern and approach readability.
Ethereum smart contracts are a powerful mechanism for authorizing and approving transactional data but provide a limited interface for complex data retrieval.
It becomes possible to create drastically simpler yet higher-performance sub-systems that combine the security and data transparency of smart contracts with the data richness and high performance of a traditional API. Additionally, there are important implications for interoperability, which is vital in emerging government, compliance and regulatory systems.