In the beginning, was the monolith (part 2) / database

Vadym Barylo
Published in CodeX
Oct 11, 2021


The database does not forgive mistakes. It is the single point of f̵r̵u̵s̵t̵r̵a̵t̵i̵o̵n̵ failure, especially when a complex monolith solution is bound to a single RDS instance.

It does not matter how experienced you are — your users will always find use cases or specific usage patterns that reveal SQL inefficiencies, and any SQL inefficiency will affect overall system performance.

After redesigning the main application as a set of independent macro-services with the root application as a mediator, we achieved only the illusion of independence between domain boundaries. We still had shared state between them all.

This obviously impacts the reliability of the system: an inefficiency in one part affects the performance of the whole, so users can experience a slowdown in business process execution.

Investing time in redesigning the system to be more cloud-friendly, we improved cloud characteristics like availability and maintainability, but we were still at the same grade in user satisfaction (reliability).

So our next milestone in macro-services adoption was the implementation of the “database per service” design pattern, where each business service manages its own subset of data. Sounds easy enough, no? NO!

When design doesn't represent hidden complexity

You need to be very lucky to have a domain model that can be easily split into independent, non-intersecting blocks, each managed by a service that has no idea the other domain services exist.

We weren't lucky here.

I can’t reveal the details of our domain model, so I will use a well-known e-commerce analogy.

Let’s imagine we have “product” and “user” vertically sliced domain contexts that live independently and are managed by independent services. UserDB doesn’t store order info; it is used primarily to collect all user-related information: profile, permissions, capabilities, visiting history, etc. ProductDB focuses on inventory, logistics, and orders. But products need to store user links when orders are created, and users need links to their created orders and wishlists while browsing. Foreign key constraints are no longer available in this design, but some analog is required to ensure data consistency across different databases in different RDS instances.

Struggling to find good enough solutions to cover these requirements, we started reviewing our use cases where data intersection was present. All of them fell into the following core categories:

  • input validation — when creating/updating EntityA, you need to ensure that the linked EntityB from a different domain exists and is valid, e.g.
POST api/orders
{
  productVariantId: 25,
  userId: 2 // ensure user exists and is active
}
  • data enrichment — for internal data processing, some additional info is needed from a separate domain, e.g. calculating the total order price taking the user’s personal discount into account
GET api/order-book/1
...
{
  userId: 2,
  totalPrice: 200.0, // can differ between users with the same items
  orders: [...]
}

Input validation

The idea (and requirement) is to keep the whole system in a consistent state when data manipulation happens in independent domain contexts.

As we are still centralized by nature (and from a user perspective) and use a high-level centralized mediator to propagate business intent to the corresponding sub-domain services, we can use this upper layer to validate the whole input by utilizing the data services for its specific parts.

We implemented specific validators to ensure data from the presentation layer can’t be propagated to the next layer if validation is not passed.

As the presentation layer has full access to all domain context proxies, even heterogeneous requests (that utilize data from different domains) can be fully analyzed first.

We can skip further processing if an error is found, and, at the same time, continue processing with the sub-domain data services if all applied validators responded with success.

Example of validators:

class CreateOrderEvent {

    @ValidUser // implemented as REST API validation against service1
    private int userId;

    @ValidProductVariant // service2 REST API validation implementation
    private int productVariantId;

    // ... other properties with basic validation like @Required ...
}
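
For illustration, here is a minimal sketch of how such an annotation could be wired with Bean Validation; the UserServiceClient proxy, its findById call, and the DTO are assumptions made for the example, not our exact implementation:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Optional;

import javax.validation.Constraint;
import javax.validation.ConstraintValidator;
import javax.validation.ConstraintValidatorContext;
import javax.validation.Payload;

// Field-level annotation that delegates the check to a custom validator.
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = UserExistsValidator.class)
@interface ValidUser {
    String message() default "user does not exist or is not active";
    Class<?>[] groups() default {};
    Class<? extends Payload>[] payload() default {};
}

// Hypothetical proxy over the user service REST API (e.g. GET api/users/{id}).
interface UserServiceClient {
    Optional<UserDto> findById(int userId);
}

// Hypothetical DTO returned by the user service.
class UserDto {
    boolean active;
}

class UserExistsValidator implements ConstraintValidator<ValidUser, Integer> {

    // In a real application the client is injected via the framework's
    // ConstraintValidatorFactory (e.g. as a Spring bean).
    private final UserServiceClient userService;

    UserExistsValidator(UserServiceClient userService) {
        this.userService = userService;
    }

    @Override
    public boolean isValid(Integer userId, ConstraintValidatorContext context) {
        if (userId == null) {
            return true; // presence itself is the job of @Required / @NotNull
        }
        return userService.findById(userId)
                .map(user -> user.active)
                .orElse(false);
    }
}

With something like this in place, the mediator only needs to run standard Bean Validation on the incoming event before handing it to the lower layers.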

This solution was good enough from a short-term perspective, as it allows the lower layers to work with the proxy data services on already trusted data: the solution ensures data is valid at the time of its use. And as the business services haven’t been exposed outside, no one except the presentation layer can use them.

Other, more complicated scenarios (like the delete-user scenario) can also be covered by this approach — you just need more validation at the level above to reject execution under the appropriate conditions at the level below (so a user probably can’t be deleted if there are orders created by this user).

@ValidDeleteUserContext // validator when the execution context matters
class DeleteUserEvent {

    @ValidUser
    private int userId;
}
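
A sketch of this class-level variant could look like the following; the OrderServiceClient proxy and its countOrdersByUser method are assumptions used only to illustrate the “reject delete while orders exist” rule:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import javax.validation.Constraint;
import javax.validation.ConstraintValidator;
import javax.validation.ConstraintValidatorContext;
import javax.validation.Payload;

// Class-level annotation: the whole event is the validation subject,
// so the validator can inspect the execution context, not a single field.
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = DeleteUserContextValidator.class)
@interface ValidDeleteUserContext {
    String message() default "user still has orders and cannot be deleted";
    Class<?>[] groups() default {};
    Class<? extends Payload>[] payload() default {};
}

// Hypothetical proxy over the order service REST API.
interface OrderServiceClient {
    long countOrdersByUser(int userId);
}

class DeleteUserContextValidator
        implements ConstraintValidator<ValidDeleteUserContext, DeleteUserEvent> {

    private final OrderServiceClient orderService; // injected by the framework in a real app

    DeleteUserContextValidator(OrderServiceClient orderService) {
        this.orderService = orderService;
    }

    @Override
    public boolean isValid(DeleteUserEvent event, ConstraintValidatorContext context) {
        // Reject the delete intent while any order still references the user
        // (assumes DeleteUserEvent exposes a getUserId() accessor).
        return orderService.countOrdersByUser(event.getUserId()) == 0;
    }
}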

This approach was simple from an implementation perspective and strong enough to ensure the system is in a consistent state during data manipulation when multiple data sources are used.

Data enrichment

The idea (and requirement) is to support the concept of “shared data” and access it from many domain contexts.

This case is more complicated, and its root cause is domain context intersection: there always exists shared data that must be included in many contexts. An example is lookup data (e.g. countries) that is used as reference data in many domain objects.

As we had a separate data store per domain context, we obviously needed a solution for accessing shared data that was stored in its own separate storage.

The traditional challenge here is that shared data is not static and can be managed by its own business actions (e.g. the list of countries can vary as the business expands globally).

Reviewing a number of use cases, we found out that the only acceptable solution is to keep a copy of the shared data in each context.

We reviewed a few options to achieve consistency between the data sources:

  • scheduled data sync — based on data variability, a CRON job downloads the shared data from the source and replaces it at the destination
  • make the data source observable to data changes (event streaming) — as we want fully dedicated and loosely coupled business domains, it is good to represent the state mutation of each business domain object as a sequence of business CRUD events available to consumers
  • implement the outbox pattern by utilizing the CDC (change data capture) database feature — similar to the solution above, but the events are actual database row changes rather than changes to the corresponding business abstractions.

Scheduled data sync was rejected as an option because it was too aggressive in its use of database and network resources — you always work with the full data set, as you don’t know your diff. Also, the data consistency window was rather large because the solution was unaware of actual data use (the CRON schedule was built based on average data modification per period, so anomalies in modification activity, like weekday vs. weekend traffic, were not considered).

The second reviewed approach is about making the business services observable.

Once a change happens, any interested party (even outside of the current domain) can react to it. In our example — react to the CRUD event to make a copy of the entity and sync the current state.

This approach was accepted as a candidate because it also allowed the services to become reactive by nature and inform about state changes proactively. This design has only one serious hidden complexity — processing data against the internal data source and sending updates to the message broker should be handled in a single distributed transaction; if only one of the actions completes while processing a user intent (e.g. stored in the DB, but the message not sent), the whole cluster is at risk of falling into an inconsistent state.

So we rejected it in favor of the third option — database-level state change observability.

The estimated work was also a bit lower than for the second approach, as the solution becomes simpler and solves the same problem without implementing a handmade pub-sub infrastructure and without distributed transactions in its design. It is based on Kafka Streams and the Debezium connector on top of the RDS support for transmitting data changes. The idea is to react to native database change events and use the connectors to properly serialize these changes into appropriate Kafka events and transmit them over the broker to the destination database.

As a result, the changed data is delivered silently to the destination without involving the services that own these data sources. Also, most importantly, the sink and source connectors allow some level of control over these changes and support some business actions on top of them (e.g. using only a subset of fields at the destination or filtering events by destination purpose).
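
As a rough illustration of that kind of control, below is a minimal Kafka Streams sketch, assuming Debezium publishes change events for a shared countries table as JSON; the topic names and the projected field set are invented for the example:

import java.util.Properties;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class CountrySyncTopology {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Debezium publishes one topic per captured table, typically
        // <server>.<schema>.<table>; the name here is an assumption.
        builder.stream("userdb.public.countries",
                        Consumed.with(Serdes.String(), Serdes.String()))
                .filter((key, value) -> value != null)          // skip tombstones
                .mapValues(CountrySyncTopology::project)        // keep only the needed fields
                .filter((key, value) -> value != null)          // drop deletes/bad events
                // Topic read by the sink connector of the destination database (assumed name).
                .to("productdb.countries.replica",
                        Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "country-sync");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }

    // Extract the "after" image of the row and project the subset of fields
    // that the destination context actually needs.
    private static String project(String changeEvent) {
        try {
            JsonNode root = MAPPER.readTree(changeEvent);
            // The envelope may or may not be wrapped in "payload", depending on converter settings.
            JsonNode after = root.has("payload")
                    ? root.path("payload").path("after")
                    : root.path("after");
            if (after.isMissingNode() || after.isNull()) {
                return null; // delete event: not replicated in this sketch
            }
            ObjectNode out = MAPPER.createObjectNode();
            out.set("id", after.get("id"));
            out.set("name", after.get("name"));
            return MAPPER.writeValueAsString(out);
        } catch (Exception e) {
            return null; // malformed event: dropped in this sketch
        }
    }
}

On the other side, a sink connector (or a small consumer) would then upsert these records into the read-only copy inside the destination context.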

Decoupling the services and their storages gave us more freedom to control each individual service more precisely. Performance issues also became possible to isolate in specific components without harming the whole system.

At the end of this exercise, we were very close to finally splitting our solution vertically (including the presentation layer) to become fully distributed, independently manageable, and with an independent release life cycle per part. So we started adopting micro-frontend architecture for the complete vertical split of our solution. I will share the details in the next post.

Lessons learned:

  • using a single database for all microservices is an anti-pattern — your performance issues or SQL inefficiencies just move to a lower layer but still affect the whole solution
  • domain thinking is good, but shared data exists anyway — the preferred solution is to keep a copy of it close to where it is normally used
  • use a single place for managing common data
  • use common data as a “read replica” in particular domains; don’t allow them to modify it
  • CDC is a good alternative to a manual data-sync process — easy to implement and easy to extend/support, as the market provides a good range of open-source solutions
