(Natural) Evolution of Software Architecture

Miroslav Kudlac
Multitude IT Labs
Jan 17, 2024

As a software engineer, you must have heard the buzz surrounding microservices, the ongoing debate on monolithic architecture, and the ever-changing trends in system development. However, the reality of engineering is quite different from the idealized picture of seamless greenfield projects. Organizations evolve, and their architectures must adapt to changing needs.

In this article, I’ll delve into the natural evolution of systems, using examples from the banking industry. So, let’s embark on a journey into the dynamic world of financial technology, keeping in mind that choosing the right path is never easy. I’ll discuss the issues with synchronous architecture, the usage of queues and streams, and their advantages.

The Beginnings…

Imagine this:

You enter a company where the architecture is made up of monolithic services interconnected in a synchronous dance. This architecture poses challenges such as poor scalability, complex change requests, and limited resilience.

Now, let’s dive into a real-life scenario straight from our client’s experience. The task at hand appears to be simple: apply for a loan. It starts with a web request to the Loan service, but the truth is that it’s far from easy. The Loan service needs to collaborate with the Client service to create a client, followed by the intricate dance of transferring money and performing accounting.

Synchronous Architecture

Though it may seem straightforward, we know that accounting in the financial industry is as challenging as solving a Rubik’s Cube blindfolded, with a multitude of intricate operations. The accounting team may mention a comfortable 60-second timeout for some of these operations, and if even one of the services fails, your dream of loan sales disappears faster than a paper boat in a monsoon. Not to mention the frustration of your clients waiting around 60 seconds after clicking the submit button. So, how do we solve the accounting dilemma? One might be tempted to implement a polling mechanism, such as periodic checks of transaction completion, possibly every 5 seconds. But here’s the twist: don’t even consider it.

You certainly don’t want those polling clients to accidentally turn into a self-inflicted DDoS attack on your own systems.

First steps with queues

Instead of relying on a direct approach, a better option is to introduce a queue. By using a reliable message broker such as Apache Kafka, you can notify clients about their confirmed payment status: they can either check back later or receive an email notification once everything runs smoothly. This approach may seem simple, but it significantly improves the resilience of your architecture. Note, however, that this is still neither an event-driven nor a push-based architecture.
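To make the idea concrete, here is a minimal, hypothetical sketch of this decoupling. An in-memory `queue.Queue` stands in for a broker like Kafka, and the service and worker names (`submit_loan_application`, `accounting_worker`) are illustrative, not from any real system:

```python
import queue
import threading

# In-memory queue standing in for a message broker such as Kafka.
payment_events = queue.Queue()
notifications = []

def submit_loan_application(client_id: str) -> str:
    """Web request returns immediately; the slow accounting runs later."""
    payment_events.put({"client_id": client_id, "status": "PENDING"})
    return "application accepted"  # the client is not blocked for 60 seconds

def accounting_worker():
    """Consumer: processes events at its own pace, then notifies the client."""
    event = payment_events.get()
    event["status"] = "CONFIRMED"
    notifications.append(f"email to {event['client_id']}: payment {event['status']}")
    payment_events.task_done()

result = submit_loan_application("client-42")
worker = threading.Thread(target=accounting_worker)
worker.start()
worker.join()
```

The key property is that the web request finishes before accounting even starts; a crash of the consumer delays the notification but never fails the submission.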

Solution for a slow service using a message queue

Business people are always looking for ways to increase their sales. This usually involves adding more sales channels, features, and services to the architecture. However, this can make the service logic more complex, increase traffic, and lead to transitive dependencies. Eventually, you will face similar issues with your Loan and Client services to those you experienced with the Accounting service. In the pursuit of better architecture, you might consider implementing the queue pattern everywhere. But beware, there is such a thing as “too much of a good thing.” Sooner or later, you will realize that your enthusiasm has led you down a path of complexity and confusion.

Imagine this scenario:

You are surrounded by queues, using them extensively. Everything seems fine until a change request comes in, such as the addition of a new accounting fee. That is when the bubble bursts and you are faced with the daunting reality. You will need to make changes to each queue and ensure backward compatibility. Suddenly, your once-sleek architecture becomes a maze of complexity.

It is a situation that makes you stop and ask yourself, “Isn’t there a better way?”

Queues everywhere :(

Entity-oriented vs customer-oriented systems

In the eyes of the clients, the organization of your systems is irrelevant. What truly matters to them is a hassle-free payment experience, a smooth onboarding process, and a clear view of their transactions.

Instead of introducing queues everywhere, a better approach is to add a layer that covers customer features. In IT architecture, any problem can be solved by adding a new layer. Right? RIGHT?

Let’s welcome new services in our new layer:

  1. Onboarding — orchestrates onboarding and does nothing else. Thanks to that, you have strangled the onboarding logic out of your Loan service.
  2. Transaction Preview — is responsible for displaying the main information about clients and their transactions in the web account.
  3. Payments — handles payments between the bank and the clients.

As a Software Engineer, you can see many benefits like:

  • The Loan, Client, and Accounting services no longer contain customer-facing business logic.
  • Endpoints of the Loan service perform changes only within the Loan service itself, without any other dependencies.
  • The Onboarding service is a small, scalable microservice that can handle bigger traffic. If it is coded right, you don’t need to bother with legacy code.
  • The separation of application logic makes much more sense.
  • When your clients just want to see their transactions, which is the most used feature in the banking industry, you can have a maintenance window on your Onboarding and Payments services.

However, this is where the complexity starts to unravel. The product owner might look at you and say: “Hold on, now I have to coordinate four teams instead of three when I need to make any change?”, and they will be right. With this approach, we have built an upstream dependency: one team is waiting for another team to finish their work before it can even start. This leads to a ripple effect, where a simple change disrupts functionality in multiple systems. The REST APIs are still in play, and dependencies among components are widespread. It becomes evident that we must address the question of data ownership.

Customer Feature Services Layer

This challenge demands a more profound solution. Managing customer features without a storage system is like trying to bake a cake without an oven — it simply won’t work. The answer is to bid farewell to the entity services and consolidate those entities within the customer-feature services. A crucial first step in this transformation is to assign clear ownership of the data. For the sake of simplicity, let’s imagine that customer and loan data fall under the ownership of the Onboarding service, while the Payments service owns the transactions.

Change of Data Ownership

Reactivity begins

Maintaining synchronization across multiple services is a difficult task, and trying to achieve it with REST APIs introduces several challenges. Ensuring data consistency across all services becomes even harder when updates occur in a single service: propagating those changes to the other services can be a cumbersome and error-prone process.

To tackle these challenges, several strategies can be applied:

Outbox Pattern: This approach involves creating an outbox table, enabling you to write the business record and the corresponding outbox record within a single transaction. A Change Data Capture (CDC) mechanism then publishes the outbox changes to a stream. This pattern ensures that the stream contains complete and consistent data, preventing inconsistencies.

Outbox Pattern
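A minimal sketch of the outbox pattern, using SQLite for the transactional write. The `relay_outbox` function is a hypothetical stand-in for a CDC tool (Debezium, for instance), and the table and payload shapes are illustrative assumptions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE loans (id TEXT PRIMARY KEY, amount REAL)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

def create_loan(loan_id: str, amount: float):
    # Business row and outbox row are written in ONE transaction,
    # so the stream can never see a loan that was rolled back.
    with db:
        db.execute("INSERT INTO loans VALUES (?, ?)", (loan_id, amount))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (f"loan_created:{loan_id}:{amount}",))

stream = []  # stands in for a Kafka topic

def relay_outbox():
    # CDC relay: reads committed outbox rows and publishes them to the stream.
    for (payload,) in db.execute("SELECT payload FROM outbox ORDER BY id").fetchall():
        stream.append(payload)
    db.execute("DELETE FROM outbox")

create_loan("loan-1", 5000.0)
relay_outbox()
```

If the transaction in `create_loan` rolls back, neither row exists and nothing reaches the stream; that atomicity is the whole point of the pattern.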

User Update Event (Listen to Yourself Pattern): In this pattern, updates are published directly to the stream, and all services, including the writer itself, subscribe to that same stream, guaranteeing that any changes are synchronized across all relevant components. It’s an invaluable approach for maintaining data consistency and achieving cross-service synchronization.

Listen to yourself Pattern
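Here is a minimal, hypothetical sketch of "listen to yourself": the writing service publishes the update to the stream first, and every consumer, including the writer, applies it from the stream. The service names mirror the ones introduced above; the list and dicts stand in for a real topic and real databases:

```python
stream = []          # stands in for a shared topic
onboarding_db = {}   # local state of the Onboarding service
preview_db = {}      # local state of the Transaction Preview service

def publish_user_update(user_id: str, name: str):
    # The writer does NOT touch its own database directly;
    # it only appends the change to the shared stream.
    stream.append({"user_id": user_id, "name": name})

def consume(db: dict):
    # Each service replays the same stream into its own store,
    # so all local copies converge on identical data.
    for event in stream:
        db[event["user_id"]] = event["name"]

publish_user_update("u1", "Alice")
consume(onboarding_db)
consume(preview_db)
```

Because every service, writer included, reads from the same ordered stream, there is a single source of truth for what happened and in what order.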

With either of these two patterns, you achieve a push-based event architecture. You can also implement event sourcing, but it is considerably more complex. Let’s briefly look at it, since it is useful but by no means mandatory.

Event Sourcing: Event sourcing is a more intricate approach that merits deeper exploration. It involves storing every transaction as a log or a series of events, offering a detailed history of data changes.

Global Event Sourcing: In this scenario, events are logged across the entire system, providing a comprehensive view of data changes. Global event sourcing is often employed when a unified history of events across all services is needed, making it a powerful tool for auditing and comprehensive data analysis.

Local Event Sourcing: Local event sourcing captures events specific to a particular service or component. It’s valuable when a local history of events is required to support the functioning of an individual service.
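A tiny sketch of the core idea, in the local flavor: the current state (here, an account balance) is never stored directly but is derived by folding over the append-only event log. The event types and function names are illustrative assumptions, not a prescribed schema:

```python
# Append-only history of everything that happened to the account.
event_log = []

def record(event_type: str, amount: float):
    event_log.append({"type": event_type, "amount": amount})

def current_balance() -> float:
    # Replaying the full log reconstructs the state at any point in time,
    # which is what makes event sourcing so valuable for auditing.
    balance = 0.0
    for e in event_log:
        balance += e["amount"] if e["type"] == "deposit" else -e["amount"]
    return balance

record("deposit", 100.0)
record("withdrawal", 30.0)
record("deposit", 5.0)
```

Global event sourcing applies the same fold across events from the whole system; local event sourcing keeps a log like this per service.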

Entities are pushed to stream

By implementing one of these patterns, your architecture empowers clients to make payments and complete onboarding without relying on other systems. Moreover, these patterns ensure data consistency and synchronization, even in a distributed and complex system. The flexibility of event sourcing, whether global or local, allows you to choose the most appropriate approach for your specific requirements. This setup ultimately enables autonomous services that can be efficiently and horizontally scaled.

In this journey, the architecture now has just one pull-based connection left: the link between our web frontend and the Internet Banking backend. However, technologies like WebSockets (WSS), Server-Sent Events (SSE), and GraphQL provide further options. GraphQL defines subscriptions as first-class citizens, offering a means to trigger subscriptions through mutations and essentially achieving a push-based architecture without constant polling. Thanks to these technologies (WSS, SSE, GraphQL), we can achieve our goal of a reactive architecture across all our services.

Push Based Web
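Stripped of transport details, push-based delivery to the web boils down to a subscription registry: the backend invokes each subscriber's callback when an event occurs, instead of clients polling. This hypothetical sketch models what WebSockets, SSE, or GraphQL subscriptions provide over a real connection:

```python
subscribers = []

def subscribe(callback):
    # A connected client registers interest once.
    subscribers.append(callback)

def publish(event: str):
    # Push: every connected client is notified immediately, no polling loop.
    for callback in subscribers:
        callback(event)

browser_feed = []          # stands in for one browser's open connection
subscribe(browser_feed.append)
publish("transaction: -30 EUR coffee")
```

The inversion is the point: the server decides when data flows, so clients neither hammer the backend with periodic checks nor wait on long timeouts.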

Conclusion

When it comes to software systems, businesses often start with a monolithic structure. However, as their needs change, it becomes increasingly important to refactor and decompose that structure into more modular components. To achieve this, it’s essential to learn key techniques such as Change Data Capture (CDC) and streams.

It is important to avoid splitting the monolith into synchronously coupled services. A modular approach should be taken instead, and where splitting is necessary, it should be done asynchronously. This requires careful consideration of how shared data is handled and often leads to the adoption of a reactive architecture. However, it’s important to weigh the benefits of reactivity against its costs. Not every part of the system requires streaming and reactivity, so think about the extent of reactivity you actually need. Going too far in this direction can increase the complexity of your DevOps and infrastructure, so it’s important to strike a balance.

Natural Evolution from Monolith

Summary

1. To achieve better scalability and massive throughput, always move from left to right in the evolution shown above (from synchronous monolith toward reactive services), never the opposite.

2. If you decide to strangle your monolith, it is crucial to do it asynchronously.

3. It is important to avoid overusing patterns that are suited for something else; there is no silver bullet for every case.

4. When starting with streams, it is crucial to establish clear data ownership.

In the ever-evolving world of IT architecture, the journey of adaptation and evolution never truly ends. With the right strategies and thoughtful implementation, however, your systems can thrive, your teams can become independent and high-performing, and you can provide seamless experiences to your clients while achieving your business goals.
