Designing A Simple Event Source

Ranjana Jha
7 min read · Aug 12, 2021

In this post, I will put forward my approach to designing an event source using PostgreSQL and/or DynamoDB. We can select either one depending on our use case (assuming here that we know when to use a SQL DB and when to use a NoSQL DB).

Definition:

We can query an application’s state to find out the current state of the world, and this answers many questions. However there are times when we don’t just want to see where we are, we also want to know how we got there.

Event Sourcing ensures that all changes to the application state are stored as a sequence of events. Not only can we query these events, we can also use the event log to reconstruct past states. The fundamental idea of Event Sourcing is to ensure that every change to the state of an application is captured in an event object, and that these event objects are themselves stored, in the sequence they were applied, for the same lifetime as the application state itself.

Schema:

event source schema

The above schema should suffice for a typical event source. Let's understand what each column means:

events: the table name

eventId: unique id identifying an event (a UUID)

entityId: id identifying an entity (Example: ORDER#0-0000028)

type: event type (Example: OrderCreated, OrderApproved, OrderShipped)

status: event status

createdTime: time the event was published

payload: payload of the event (a JSON value)
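To make the columns concrete, here is a minimal sketch of how a single row of this schema could be modelled in Java; the class name and field types are illustrative assumptions, not a prescribed mapping:

```java
import java.time.Instant;
import java.util.UUID;

// One row of the "events" table, modelled as a simple immutable value.
public class Event {
    private final UUID eventId;        // unique id of the event
    private final String entityId;     // e.g. "ORDER#0-0000028"
    private final String type;         // e.g. "OrderCreated", "OrderApproved", "OrderShipped"
    private final String status;       // event status
    private final Instant createdTime; // time the event was published
    private final String payload;      // event payload, serialized as JSON

    public Event(UUID eventId, String entityId, String type, String status,
                 Instant createdTime, String payload) {
        this.eventId = eventId;
        this.entityId = entityId;
        this.type = type;
        this.status = status;
        this.createdTime = createdTime;
        this.payload = payload;
    }
    // getters omitted for brevity
}
```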

Also, this schema keeps things simple, and systems work best when they are kept simple.

PS: It's very difficult to keep things simple, so keep it simple, silly. :)

Now that our schema is finalized, it's time to take a deep dive into the DynamoDB and PostgreSQL sides of the event source.

1. The DynamoDB Approach:

Amazon DynamoDB is a fully managed, proprietary NoSQL database service offered by AWS that supports key-value and document data structures.

Key Concepts:

  • Table
  • Items
  • Primary Key: Simple (partition key), Composite (partition key + sort key)
  • Attributes

The following represents the above key concepts:

DynamoDB representation of the event source schema
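As a rough sketch using the AWS SDK for Java v2, the events table could be created with entityId as the partition key and createdTime as the sort key so that the events of one entity sort naturally by time; this key choice is an assumption, and your access patterns might call for a different sort key such as eventId:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class CreateEventsTable {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        // Composite primary key: partition key = entityId, sort key = createdTime.
        dynamo.createTable(CreateTableRequest.builder()
                .tableName("events")
                .keySchema(
                        KeySchemaElement.builder().attributeName("entityId").keyType(KeyType.HASH).build(),
                        KeySchemaElement.builder().attributeName("createdTime").keyType(KeyType.RANGE).build())
                .attributeDefinitions(
                        AttributeDefinition.builder().attributeName("entityId").attributeType(ScalarAttributeType.S).build(),
                        AttributeDefinition.builder().attributeName("createdTime").attributeType(ScalarAttributeType.S).build())
                .billingMode(BillingMode.PAY_PER_REQUEST) // on-demand capacity
                .build());
    }
}
```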

API Actions:

  • item-based: For reading, writing, or deleting an item in a table, we have to specify the entire primary key (pk + sk).
  • query: For querying, we must provide the partition key and may optionally provide a sort key condition.
  • scan: Scans the complete table, so we should avoid it as much as possible, especially if the table holds a lot of data, since it can incur a lot of cost. The sketch below shows an item write and a query.
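A hedged sketch of the item-based write and the query action, again with the AWS SDK for Java v2 and the key schema assumed above:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class EventStoreDynamoDb {
    private final DynamoDbClient dynamo = DynamoDbClient.create();

    // Item-based write: the full item, including the complete primary key, is supplied.
    public void putEvent(String entityId, String createdTime, String eventId,
                         String type, String status, String payloadJson) {
        dynamo.putItem(PutItemRequest.builder()
                .tableName("events")
                .item(Map.of(
                        "entityId",    AttributeValue.builder().s(entityId).build(),
                        "createdTime", AttributeValue.builder().s(createdTime).build(),
                        "eventId",     AttributeValue.builder().s(eventId).build(),
                        "type",        AttributeValue.builder().s(type).build(),
                        "status",      AttributeValue.builder().s(status).build(),
                        "payload",     AttributeValue.builder().s(payloadJson).build()))
                .build());
    }

    // Query: the partition key is mandatory, the sort key condition is optional.
    public QueryResponse eventsForEntity(String entityId) {
        return dynamo.query(QueryRequest.builder()
                .tableName("events")
                .keyConditionExpression("entityId = :eid")
                .expressionAttributeValues(Map.of(
                        ":eid", AttributeValue.builder().s(entityId).build()))
                .build());
    }
}
```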

Secondary Indexes:

The main challenge with DynamoDB, compared to an RDBMS, comes when querying. Since we can query only by the primary key, if we want to query by any non-key attribute we have to create a secondary index for it; otherwise we will need to perform a scan, which consumes RCUs (read capacity units) and increases cost. There are two types of secondary indexes:

Local Secondary Index (LSI): an index that has the same partition key as the table but a different sort key. We can project non-key attributes into the index as well. By default, the keys of the table and the keys defined in the index are projected. An LSI can only be created at table creation time. If we query for attributes that are not projected into the index, their values are fetched from the main table. An LSI can incur the following costs:

  • Storage Cost
  • Read Capacity Units (RCUs): in case we query attributes that are not projected into the index.
  • Write Capacity Units (WCUs): DynamoDB automatically synchronizes the index with the values from the table, so if our use case has frequent writes, the synchronization will incur WCU cost.

Local secondary indexes inherit the read/write capacity mode from the base table. Hence, when defining the WCUs and RCUs we have to consider the LSI WCUs and RCUs as well. There is also a 10 GB size limit per partition key value for tables with an LSI.
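Since an LSI can only be declared at table creation time, a sketch of doing so might look like this; the index on type (sharing the entityId partition key) and the projection choice are assumptions for illustration:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class CreateEventsTableWithLsi {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        // LSIs must be part of the CreateTableRequest; they cannot be added later.
        dynamo.createTable(CreateTableRequest.builder()
                .tableName("events")
                .keySchema(
                        KeySchemaElement.builder().attributeName("entityId").keyType(KeyType.HASH).build(),
                        KeySchemaElement.builder().attributeName("createdTime").keyType(KeyType.RANGE).build())
                .attributeDefinitions(
                        AttributeDefinition.builder().attributeName("entityId").attributeType(ScalarAttributeType.S).build(),
                        AttributeDefinition.builder().attributeName("createdTime").attributeType(ScalarAttributeType.S).build(),
                        AttributeDefinition.builder().attributeName("type").attributeType(ScalarAttributeType.S).build())
                .localSecondaryIndexes(LocalSecondaryIndex.builder()
                        .indexName("entityId-type-index")
                        // Same partition key as the table, different sort key.
                        .keySchema(
                                KeySchemaElement.builder().attributeName("entityId").keyType(KeyType.HASH).build(),
                                KeySchemaElement.builder().attributeName("type").keyType(KeyType.RANGE).build())
                        // Project only the keys; non-projected attributes are fetched from the base table.
                        .projection(Projection.builder().projectionType(ProjectionType.KEYS_ONLY).build())
                        .build())
                .billingMode(BillingMode.PAY_PER_REQUEST)
                .build());
    }
}
```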

Global Secondary Index (GSI): such an index can have either a simple primary key (pk) or a composite primary key (pk + sk). The data type of the key schema attributes can only be a scalar type as defined by DynamoDB; we cannot use document types, lists or maps as keys for an index. GSI queries cannot fetch attributes from the base table, hence we have to decide beforehand and project those attributes into the index. By default, the keys from the base table and the keys from the index are projected. A GSI can incur the following costs:

  • Storage Cost
  • Write Capacity Units (WCUs): DynamoDB automatically synchronizes the index with the values from the table, so if our use case has frequent writes, the synchronization will incur WCU cost.
  • Read Capacity Units (RCUs): for querying data from the index.

GSIs have their own defined WCUs and RCUs and don't have any size limitations. Such indexes can be created at any time, not necessarily during table creation as with an LSI.
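Because a GSI can be added to an existing table, a sketch of adding one keyed on the type attribute (the index name type-index is an assumption) could use an UpdateTableRequest:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class AddTypeGsi {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        dynamo.updateTable(UpdateTableRequest.builder()
                .tableName("events")
                // Any attribute used as a GSI key must be declared here (scalar types only).
                .attributeDefinitions(
                        AttributeDefinition.builder().attributeName("type").attributeType(ScalarAttributeType.S).build())
                .globalSecondaryIndexUpdates(GlobalSecondaryIndexUpdate.builder()
                        .create(CreateGlobalSecondaryIndexAction.builder()
                                .indexName("type-index")
                                .keySchema(KeySchemaElement.builder().attributeName("type").keyType(KeyType.HASH).build())
                                // Project only the keys; non-key attributes would have to be projected explicitly.
                                .projection(Projection.builder().projectionType(ProjectionType.KEYS_ONLY).build())
                                .build())
                        .build())
                .build());
    }
}
```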

Access Patterns:

The following table shows some of the possible access patterns and their secondary index mappings.

DynamoDB Access Patterns

**None: "none" for projected attributes means that only the default key columns are projected. In such a case, when we query using the LSI, whatever non-key attributes we need are fetched at that time and we incur RCUs. We will also incur WCUs if we project too many columns and have frequent writes. So whether to project all or only a few columns into the indexes varies on a case-by-case basis; we have to weigh the trade-off for our use case, considering storage costs, RCUs and WCUs.
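For example, an access pattern such as "fetch all events of a given type" could be served by querying the assumed type-index GSI instead of scanning the base table:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class EventsByType {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        // Query the GSI by its partition key; "type" is a DynamoDB reserved word, so alias it.
        QueryResponse response = dynamo.query(QueryRequest.builder()
                .tableName("events")
                .indexName("type-index")
                .keyConditionExpression("#t = :type")
                .expressionAttributeNames(Map.of("#t", "type"))
                .expressionAttributeValues(Map.of(":type", AttributeValue.builder().s("OrderCreated").build()))
                .build());

        response.items().forEach(System.out::println);
    }
}
```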

2. The PostgreSQL Approach:

PostgreSQL is a powerful, open-source object-relational database system, and with the introduction of JSON types it now supports storing JSON documents as well. Basically, an RDBMS with NoSQL capabilities makes it super awesome. Because of its support for JSON types, it was considered as a candidate for the event source.

JSON Types:

PostgreSQL offers two types for storing JSON data: json and jsonb. The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution, while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

In our approach we will use jsonb types.
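A minimal sketch, assuming the payload column is declared as jsonb and the remaining columns as text/timestamp, of appending an event with Spring's JdbcTemplate; the cast to jsonb happens inside the SQL:

```java
import org.springframework.jdbc.core.JdbcTemplate;

public class PostgresEventWriter {
    private final JdbcTemplate jdbcTemplate;

    public PostgresEventWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // The payload arrives as a JSON string and is cast to jsonb by PostgreSQL on insert.
    public void append(String eventId, String entityId, String type, String status, String payloadJson) {
        jdbcTemplate.update(
                "INSERT INTO events (eventid, entityid, type, status, createdtime, payload) "
                        + "VALUES (?, ?, ?, ?, now(), ?::jsonb)",
                eventId, entityId, type, status, payloadJson);
    }
}
```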

Indexes:

The following index types are supported in Postgres:

Hash, B-Tree, GIN, BRIN, GiST

Out of the above, GIN is the index type supported for jsonb columns; the rest operate on other column types. Without indexing, the system has to scan the entire table row by row to find matching entries, which is clearly inefficient. But if the system has been instructed to maintain an index on a column, it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree. As developers, we have to identify the columns that will be accessed frequently and create indexes accordingly. Since indexes also add overhead to the database system as a whole, they should be used wisely.
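A minimal sketch of adding a GIN index on the jsonb payload column, executed here through JdbcTemplate; the index name is illustrative:

```java
import org.springframework.jdbc.core.JdbcTemplate;

public class EventIndexSetup {
    // A GIN index lets containment queries on the jsonb payload avoid full table scans.
    public void createPayloadIndex(JdbcTemplate jdbcTemplate) {
        jdbcTemplate.execute(
                "CREATE INDEX IF NOT EXISTS idx_events_payload_gin ON events USING GIN (payload)");
    }
}
```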

Aurora PostgreSQL Pricing:

Aurora PostgreSQL is a managed version of PostgreSQL provided by AWS. The pricing includes the following:

  • Storage costs
  • Write IO: write I/Os are counted in 4 KB units.
  • Read IO: I/Os are input/output operations performed by the Aurora database engine against its SSD-based virtualized storage layer. Every database page read operation counts as one IO. The Aurora database engine issues reads against the storage layer in order to fetch database pages not present in the buffer cache. Each database page is 8 KB in Aurora PostgreSQL.

Access Patterns:

The following table shows the possible access patterns. We will be using native queries via Spring Data JPA's @Query annotation for executing queries on jsonb columns.

PostgreSQL Access Patterns

JPA does not support the json or jsonb types directly, so the changes needed in order to store and retrieve data for such column types will be covered in a follow-up post.
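In the meantime, a native query that filters on a key inside the jsonb payload can already be sketched with Spring Data JPA's @Query; the repository, the Event entity, and the orderId payload key are assumptions:

```java
import java.util.List;
import java.util.UUID;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface EventRepository extends JpaRepository<Event, UUID> {

    // Native query: the ->> operator extracts a payload field as text.
    @Query(value = "SELECT e.eventid FROM events e WHERE e.payload ->> 'orderId' = :orderId",
           nativeQuery = true)
    List<String> findEventIdsByOrderId(@Param("orderId") String orderId);
}
```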

This was my two cents on designing an event source!

________________________________________________________________


Ranjana Jha

Lead Software Engineer | Java8 | SpringBoot | Microservices | Cloud |Aws