Technical deep dive #1: the Geora event sourcing architecture

Cadel Watson · Geora · Sep 28, 2020 · 7 min read

Background

Building a blockchain-based application in production comes with significant challenges when compared to traditional databases. As we continue to build out the Geora platform, we have encountered a number of hurdles when integrating Ethereum as our application’s “single source of truth”:

  • the immutability of a blockchain makes it hard or impossible to fix mistaken actions
  • read and write performance to the blockchain is significantly worse than a traditional database
  • Ethereum and Solidity lack expressiveness when querying complex data

Traditionally, applications have mitigated these issues with a hybrid model which combines a standard database and a blockchain. In this model, writes and reads are performed directly against the database, and periodic “snapshots” of the database state are propagated on the blockchain. This allows participants to verify that their actions were processed correctly, while maintaining acceptable performance in the application.
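For illustration, the snapshot half of this hybrid model can be sketched in a few lines. This is a hypothetical example, not Geora's code: the SNAPSHOT_ABI, publishSnapshot method, and snapshotAddress are stand-ins for whatever anchoring contract a given application uses.

import { createHash } from "crypto";
import { ethers } from "ethers";

// Hypothetical anchoring contract: stores a hash of the off-chain state.
const SNAPSHOT_ABI = ["function publishSnapshot(bytes32 stateHash)"];

async function anchorSnapshot(
  rows: string[], // a canonical serialization of the database state
  signer: ethers.Signer,
  snapshotAddress: string
): Promise<void> {
  // Hash the current database state so participants can later verify
  // their own copy against the on-chain anchor.
  const stateHash =
    "0x" + createHash("sha256").update(rows.join("\n")).digest("hex");

  const contract = new ethers.Contract(snapshotAddress, SNAPSHOT_ABI, signer);
  const tx = await contract.publishSnapshot(stateHash);
  await tx.wait();
}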

But this approach has a significant flaw: the “source of truth” for application data is split between the blockchain and database, leading to centralization and making it difficult to reason about transactional and atomic updates. In addition, participants can only verify that their updates were processed successfully after-the-fact, rather than lending their support as part of a processing network.

At Geora, we weren’t happy with this trade-off, and decided to find a better solution.

What is event sourcing?

Popularized by Martin Fowler, event sourcing is an approach to application state which models data changes as an event stream.

“The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.” (Martin Fowler)

From this event stream, the state of the application at any point in time is derived.

Consider a simple ERC-20 token. The state of the token is a map of user addresses to balances, like so:

0x84057185577707C5C6FE7F744fc31719ad50dC66 => 50
0xdd2c5e15B6A87FCE740E20C6dAF41B5cEeA7DbC9 => 100
0xC75F72F00ED7adbDE45B2a024572c714ca37D342 => 0

In a traditional database, this state is mutated when an update is made. In this example, a transfer of 50 tokens from 0x84057185577707C5C6FE7F744fc31719ad50dC66 to 0xC75F72F00ED7adbDE45B2a024572c714ca37D342 is performed by setting the new balance of the two addresses in question within a transaction, resulting in a new application state:

0x84057185577707C5C6FE7F744fc31719ad50dC66 => 0
0xdd2c5e15B6A87FCE740E20C6dAF41B5cEeA7DbC9 => 100
0xC75F72F00ED7adbDE45B2a024572c714ca37D342 => 50

In an event sourcing system, the state is never explicitly mutated — it can be considered a function of the events processed up to a particular point in time. Our ERC-20 token might instead be modeled by:

1 SetInitialBalance(0x84057185577707C5C6FE7F744fc31719ad50dC66, 50)
2 SetInitialBalance(0xdd2c5e15B6A87FCE740E20C6dAF41B5cEeA7DbC9, 100)
3 SetInitialBalance(0xC75F72F00ED7adbDE45B2a024572c714ca37D342, 0)
4 Transfer(0x84057185577707C5C6FE7F744fc31719ad50dC66, 0xC75F72F00ED7adbDE45B2a024572c714ca37D342, 50)

To obtain a snapshot of the derived state at any point in time, we replay the events up to that point. So at t = 2, the state has the first two events applied, and so on.
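In code, “state as a function of events” is just a fold over the stream. Here is a minimal TypeScript sketch of the idea; the event and state types are illustrative, not Geora’s actual model:

// Illustrative event types mirroring the stream above.
type Event =
  | { kind: "SetInitialBalance"; account: string; amount: number }
  | { kind: "Transfer"; from: string; to: string; amount: number };

type State = Map<string, number>; // address => balance

// Apply a single event to the current state, returning the next state.
function apply(state: State, event: Event): State {
  const next = new Map(state);
  switch (event.kind) {
    case "SetInitialBalance":
      next.set(event.account, event.amount);
      break;
    case "Transfer":
      next.set(event.from, (next.get(event.from) ?? 0) - event.amount);
      next.set(event.to, (next.get(event.to) ?? 0) + event.amount);
      break;
  }
  return next;
}

// The state at time t is a fold over the first t events.
function stateAt(events: Event[], t: number): State {
  return events.slice(0, t).reduce(apply, new Map());
}

With the four events above, stateAt(events, 3) reproduces the first balance map, and stateAt(events, 4) the second.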

Event sourcing and blockchain

Event sourcing is a good fit for a blockchain because the event stream should be immutable (at least for stricter definitions of event sourcing)¹. If the blockchain is responsible for producing the event stream, other systems can consume the stream and use it to derive data.

This allows for a clean separation of concerns. We can use the blockchain for its strengths (trustless transaction processing), and traditional databases for theirs (expressive and fast queries).

At Geora, our architecture looks like this:

[Diagram: simplified Geora architecture]

When a user writes to the application, they call our API to send a transaction straight to our private Ethereum network. Our smart contracts encode all business logic of the platform, meaning that these transactions are verified by all network participants before the write is confirmed — ensuring that trust in the application is decentralized.

Based on the result of the transaction, the network emits zero or more events to an event stream. Our PostgreSQL database processes the event stream into materialized tables prepared for efficient queries.

When a user reads from the application, the API queries against the PostgreSQL database, allowing it to use the expressiveness of SQL to its full effect. Importantly, this database is a function of the blockchain’s event stream, and isn’t directly mutated by the API. This means that it’s technically possible for any participant in the network to run their own query database that consumes blockchain events².
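To make the write path concrete, the transaction our API sends looks roughly like the following ethers.js sketch. The contract, its recordAction method, and the argument encoding are hypothetical stand-ins; the real business logic lives in Geora’s own contracts:

import { ethers } from "ethers";

// Hypothetical stand-in for a Geora business contract.
const ASSET_ABI = ["function recordAction(bytes32 assetId, bytes32 action)"];

async function writeAction(
  rpcUrl: string,
  privateKey: string,
  contractAddress: string
): Promise<void> {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);
  const signer = new ethers.Wallet(privateKey, provider);
  const contract = new ethers.Contract(contractAddress, ASSET_ABI, signer);

  // The smart contract enforces the business rules; the write is only
  // confirmed once the network has validated and mined the transaction.
  const tx = await contract.recordAction(
    ethers.utils.formatBytes32String("asset-1"),
    ethers.utils.formatBytes32String("harvested")
  );
  const receipt = await tx.wait();

  // Any events emitted here become part of the stream that downstream
  // consumers, like the PostgreSQL projection, will process.
  console.log(`${receipt.logs.length} events emitted`);
}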

Implementation

This is all great in theory, but implementing the event sourcing system came with its own difficulties.

Event stream reliability

The first of these is reliably emitting events from the blockchain as they are processed. Our business events are all modeled as Solidity events. Consider this simple key-value storage contract:

pragma solidity ^0.6.7;

contract KV {
    event ValueChanged(bytes32 indexed key, bytes32 indexed value, address indexed by);

    mapping(bytes32 => bytes32) internal store;

    function set(bytes32 k, bytes32 v) public {
        store[k] = v;

        emit ValueChanged(k, v, msg.sender);
    }

    function get(bytes32 k) public view returns (bytes32) {
        return store[k];
    }
}

When a user sets the value of a key, a ValueChanged event is emitted. This will give us a stream that looks like:

ValueChanged("foo", "bar", 0x84057185577707C5C6FE7F744fc31719ad50dC66)
ValueChanged("foo", "baz", 0x84057185577707C5C6FE7F744fc31719ad50dC66)
ValueChanged("red", "green", 0xC75F72F00ED7adbDE45B2a024572c714ca37D342)

To actually get these events from an Ethereum node, there are two existing options:

  • use the eth_getLogs RPC method to get logs emitted within a certain block range
  • use the Hyperledger Besu pub/sub API to listen for events over a websocket

Both have disadvantages. eth_getLogs requires polling, which is prone to performance issues and missed events, while the pub/sub API works well but requires tricky connection logic and extremely high availability to ensure no event is missed.
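For context, a typical eth_getLogs polling loop looks something like the sketch below (ethers.js; the contract address is a placeholder). All of the fragility lives in the lastBlock bookkeeping: lose it across a restart, reorg, or node failover, and events are silently skipped or duplicated.

import { ethers } from "ethers";

// A naive polling loop over eth_getLogs.
async function pollLogs(
  provider: ethers.providers.JsonRpcProvider,
  address: string
): Promise<void> {
  let lastBlock = await provider.getBlockNumber();

  setInterval(async () => {
    const latest = await provider.getBlockNumber();
    if (latest <= lastBlock) return;

    // Fetch logs for the block range we haven't seen yet.
    const logs = await provider.getLogs({
      address,
      fromBlock: lastBlock + 1,
      toBlock: latest,
    });

    for (const log of logs) {
      // Hand each log off to the event consumer here.
      console.log(log.blockNumber, log.topics);
    }

    lastBlock = latest;
  }, 5_000);
}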

We needed a method that treated these events with the importance they deserve in our system. Since they represent the entire state of our application, we decided to go so far as to prevent blocks from being processed at all if their events failed to be emitted.

With that requirement in mind, we ended up developing a plugin for our Hyperledger Besu nodes which adds event streaming to the internal block processing mechanism. For each new block, the plugin decodes the event logs (handling things like indexed and hashed arguments) and performs a database transaction to insert them. For high availability, the database logic uses ON CONFLICT constraints to allow any number of nodes to write identical logs at once, so we can run multiple watchers in case some nodes fail.
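The idempotent write at the heart of that scheme is conceptually simple. Here is a sketch using node-postgres against a hypothetical demo.events table; the real logic runs inside the Besu plugin, and the (block, log_index) uniqueness key is an assumption for illustration:

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the environment

// Insert a decoded log idempotently: if another watcher node already
// wrote the identical row, the unique constraint turns this into a
// no-op, so any number of nodes can stream the same events safely.
async function insertEvent(
  block: number,
  logIndex: number,
  event: string,
  params: string[]
): Promise<void> {
  await pool.query(
    `INSERT INTO demo.events (block, log_index, event, params)
     VALUES ($1, $2, $3, $4)
     ON CONFLICT (block, log_index) DO NOTHING`,
    [block, logIndex, event, JSON.stringify(params)]
  );
}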

Finally, we had a reliable system to stream events to our database.

Efficient querying

Getting back to the key-value store contract above, we now had our logs in PostgreSQL, in a table that looks something like this²:

block | event        | params
------+--------------+--------------------------------------------------------------
    1 | ValueChanged | ["foo", "bar", "0x84057185577707C5C6FE7F744fc31719ad50dC66"]
    2 | ValueChanged | ["foo", "baz", "0x84057185577707C5C6FE7F744fc31719ad50dC66"]
    3 | ValueChanged | ["red", "green", "0xC75F72F00ED7adbDE45B2a024572c714ca37D342"]

The next challenge was allowing users and our API to efficiently query this table. Say that we want to query the contents of the K-V store at each block. We could do that in a single view using SQL window functions and other complex queries, but as the number of recorded events grows into the hundreds of thousands and millions, the view will slow to a crawl. In addition, PostgreSQL’s query planner struggles with the many nested views required to transform event logs into a more queryable structure.

A better way to handle the event data is to materialize it into tables which are optimized for the query you want to perform.

CREATE TABLE demo.kv_at_t (
    block INT NOT NULL,
    key   TEXT NOT NULL,
    value TEXT NOT NULL,

    CONSTRAINT block_key_uniqueness UNIQUE (block, key)
);

Then, use a trigger function to incrementally update this table when relevant events are inserted into the database:

CREATE OR REPLACE FUNCTION demo.event__value_changed() RETURNS TRIGGER AS
$f$
BEGIN
    IF NEW.event != 'ValueChanged' THEN
        RETURN NEW;
    END IF;

    -- Copy the previous block's K-V pairs
    IF NOT EXISTS (SELECT * FROM demo.kv_at_t WHERE block = NEW.block) THEN
        INSERT INTO demo.kv_at_t (block, key, value)
        SELECT NEW.block, key, value FROM demo.kv_at_t WHERE block = NEW.block - 1;
    END IF;

    -- Update based on this event
    INSERT INTO demo.kv_at_t (block, key, value)
    VALUES (NEW.block, NEW.params->>0, NEW.params->>1)
    ON CONFLICT ON CONSTRAINT block_key_uniqueness DO UPDATE SET value = NEW.params->>1;

    RETURN NEW;
END;
$f$ LANGUAGE plpgsql;

CREATE TRIGGER event__value_changed_tgr
    AFTER INSERT ON demo.events
    FOR EACH ROW EXECUTE PROCEDURE demo.event__value_changed();
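With the table in place, “the K-V store as of block N” becomes a cheap indexed lookup instead of a window over raw events. A usage sketch, again with node-postgres:

import { Pool } from "pg";

const pool = new Pool();

// Read the whole K-V store as of a given block.
async function kvAt(block: number): Promise<Map<string, string>> {
  // kv_at_t only has rows for blocks that contained events, so take the
  // latest materialized block at or before the requested one.
  const { rows } = await pool.query(
    `SELECT key, value FROM demo.kv_at_t
     WHERE block = (SELECT max(block) FROM demo.kv_at_t WHERE block <= $1)`,
    [block]
  );
  return new Map(rows.map((r): [string, string] => [r.key, r.value]));
}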

After applying this concept across our application, we dramatically improved the speed and efficiency of reads. It’s important to remember that materializing the tables is just the incremental application of the event-stream-to-state function: if we were to lose the entire database, we could easily rerun the event stream through the trigger functions to rebuild our query tables.

Conclusion

Despite the many challenges, at Geora we have successfully launched an in-production blockchain application that delivers fast writes and reads in a secure, decentralized way. The key to our platform is the application of event sourcing, which enables us to perform all business logic on-chain while at the same time keeping reads and queries efficient. This design allows us to use the blockchain for what it’s great at: as a single, verifiable, and trustless source of truth for supply chain data.

To get started developing on Geora, sign up for the developer portal for free at app.geora.io/developer.

Want to keep up to date with everything happening at Geora? Sign up for our newsletter.

[1] It’s important to take into account the consensus mechanism for the blockchain when considering the immutability of the event stream. For a consensus algorithm that allows multiple chain heads, like proof-of-work as implemented by Ethereum and Bitcoin, there may be multiple competing event streams at any one time. However, Geora uses the IBFT2.0 algorithm which guarantees finality of each block, ensuring there is only one consensus ordering of events.

[2] Note that this is very simplified for the sake of explanation. In reality, you would want to consider things like normalized event IDs, transaction hashes, indexing, etc.
